Ticket 14736

Summary: 22.05/21.08 srun running against 20.11 slurmd - "io_init_msg_unpack: unpack error"
Product: Slurm
Reporter: Tim Wickberg <tim>
Component: User Commands
Assignee: Tim Wickberg <tim>
Status: RESOLVED WONTFIX
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
CC: bart
Version: 21.08.8
Hardware: Linux
OS: Linux
Site: SchedMD

Description Tim Wickberg 2022-08-11 10:15:15 MDT
21.08/22.05 srun fails against a 20.11 slurmd with:

srun: job 16426 queued and waiting for resources
srun: job 16426 has been allocated resources
srun: error: io_init_msg_unpack: unpack error
srun: error: io_init_msg_read_from_fd: io_init_msg_unpack failed: rc=-1
srun: error: failed reading io init message

Comment 3 Tim Wickberg 2022-08-11 11:30:02 MDT
Unfortunately, there is no ready way to restore this cross-version communication when dealing with 20.11. Structural changes to the I/O initialization going into 21.08 caused the breakage.

Going forward, however, this type of cross-version support works correctly and will be maintained. We've verified that a 22.05 srun works correctly against a 21.08 slurmd/slurmstepd, and we will work to ensure this is not broken again in the future.
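
For sites that want to catch this pairing before users hit the unpack error, the following is a minimal sketch of a pre-flight check. It is not part of Slurm: the function names and the lookup table are hypothetical, and the table encodes only the three srun/slurmd pairings reported in this ticket. Any other combination is reported as "unknown" rather than guessed.

```python
import re

# Pairings taken from this ticket only (hypothetical lookup table):
# (srun release, slurmd release) -> True if reported working, False if failing.
KNOWN_PAIRINGS = {
    ("21.08", "20.11"): False,  # io_init_msg_unpack: unpack error
    ("22.05", "20.11"): False,  # io_init_msg_unpack: unpack error
    ("22.05", "21.08"): True,   # verified working per Comment 3
}


def release_of(version):
    """Return the major release (e.g. '21.08') from a string such as '21.08.8'."""
    match = re.match(r"^(\d+\.\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(1)


def srun_slurmd_status(srun_version, slurmd_version):
    """Classify an srun/slurmd pairing using only what this ticket reports."""
    srun_rel = release_of(srun_version)
    slurmd_rel = release_of(slurmd_version)
    if srun_rel == slurmd_rel:
        return "same-release"
    result = KNOWN_PAIRINGS.get((srun_rel, slurmd_rel))
    if result is None:
        return "unknown"  # not covered by this ticket; do not guess
    return "works" if result else "fails"


if __name__ == "__main__":
    for srun, slurmd in [("21.08.8", "20.11.9"), ("22.05.2", "21.08.8")]:
        print(f"srun {srun} vs slurmd {slurmd}: {srun_slurmd_status(srun, slurmd)}")
```

The version strings would have to be collected separately (for example from each node's installed packages); how to gather them is left out here because it depends on the site.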