Ticket 14736 - 22.05/21.08 srun running against 20.11 slurmd - "io_init_msg_unpack: unpack error"
Status: RESOLVED WONTFIX
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 21.08.8
Hardware: Linux
OS: Linux
Severity: 3 - Medium Impact
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-08-11 10:15 MDT by Tim Wickberg
Modified: 2022-08-11 11:30 MDT

See Also:
Site: SchedMD


Description Tim Wickberg 2022-08-11 10:15:15 MDT
A 21.08 or 22.05 srun fails against a 20.11 slurmd with:

srun: job 16426 queued and waiting for resources
srun: job 16426 has been allocated resources
srun: error: io_init_msg_unpack: unpack error
srun: error: io_init_msg_read_from_fd: io_init_msg_unpack failed: rc=-1
srun: error: failed reading io init message
Comment 3 Tim Wickberg 2022-08-11 11:30:02 MDT
Unfortunately, there's no practical way to restore this cross-version communication with a 20.11 slurmd. Structural changes to the I/O initialization in 21.08 caused the incompatibility.

Going forward, however, this type of cross-version support works and will be maintained. We've verified that a 22.05 srun works correctly against a 21.08 slurmd/slurmstepd, and will work to ensure this is not broken again in the future.
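A minimal sketch for spotting the combination described in this ticket before launching jobs. It assumes you have captured the output of `srun --version` and `slurmd -V`, both of which print a line such as `slurm 21.08.8`. The helper names are hypothetical, and the rule encodes only what this ticket states (a 21.08-or-newer srun against a 20.11 slurmd fails, 22.05 against 21.08 works). It is not Slurm's full compatibility policy.

```python
import re

# Hypothetical helper: matches version lines like "slurm 21.08.8".
_VERSION_RE = re.compile(r"slurm\s+(\d+)\.(\d+)")


def parse_release(version_line: str) -> tuple[int, int]:
    """Return the (major, minor) release, e.g. "slurm 21.08.8" -> (21, 8)."""
    m = _VERSION_RE.search(version_line)
    if m is None:
        raise ValueError(f"unrecognized version string: {version_line!r}")
    return int(m.group(1)), int(m.group(2))


def io_init_incompatible(srun_version: str, slurmd_version: str) -> bool:
    """Hypothetical check: True only for the pairing reported in this
    ticket, a 21.08-or-newer srun talking to a 20.11 slurmd."""
    client = parse_release(srun_version)
    daemon = parse_release(slurmd_version)
    return client >= (21, 8) and daemon == (20, 11)


if __name__ == "__main__":
    print(io_init_incompatible("slurm 22.05.2", "slurm 20.11.9"))  # True
    print(io_init_incompatible("slurm 22.05.2", "slurm 21.08.8"))  # False
```

Comparing `(major, minor)` tuples keeps the release ordering correct without string comparison; the patch level is ignored because the ticket describes the break at the release-series level.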