| Summary: | Cosmetic change to Cray-contributed Slurm config generation file | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Kevin Buckley <kevin.buckley> |
| Component: | Configuration | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | ||
| Version: | 20.02.2 | ||
| Hardware: | Cray XC | ||
| OS: | Linux | ||
| Site: | Pawsey | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | 6 UP07 | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Attachments: | Ignore empty slots on XC service blades | ||
Hey Kevin - Thanks for the submission, but unfortunately, since the XC series is EOL, I don't have a good contact at Cray/HPE to validate this type of change - they've all shifted their focus to Shasta - and I'm quite hesitant to roll it out globally on the chance that it could cause issues. You're certainly welcome to keep using this as a site-local patch, though.

cheers,
- Tim
Created attachment 14063 [details] Ignore empty slots on XC service blades

Given this output:

```
smw# xtcli status s0 | grep service
c0-0c0s0n0: service | empty [noflags|]
c0-0c0s0n1: service SB08 X86| ready [noflags|]
c0-0c0s0n2: service SB08 X86| ready [noflags|]
c0-0c0s0n3: service | empty [noflags|]
c0-0c0s1n0: service | empty [noflags|]
c0-0c0s1n1: service SB08 X86| ready [noflags|]
c0-0c0s1n2: service SB08 X86| ready [noflags|]
c0-0c0s1n3: service | empty [noflags|]
c0-0c0s2n0: service | empty [noflags|]
c0-0c0s2n1: service SB08 X86| ready [noflags|]
c0-0c0s2n2: service SB08 X86| ready [noflags|]
c0-0c0s2n3: service | empty [noflags|]
c0-0c0s3n0: service IV20 X86| ready [noflags|]
```

it has always "bugged me" that

```
<slurm_source>/contribs/cray/csm/slurmconfgen_smw.py \
    -t $SLURM_DIR/contribs/cray/csm/ -o $SLURM_CONF_DIR \
    sdb p0
```

returns, for example:

```
Getting list of service nodes...
    Found 13 service nodes.
Gathering hardware inventory...
    Found 19 compute nodes.
Compacting node configuration...
    Compacted into 5 group(s).
Writing Slurm configuration to /root/20200501/slurm-20.02.1/slurm.conf...
Writing gres configuration to /root/20200501/slurm-20.02.1/gres.conf...
Done
```

when there are clearly only SEVEN service nodes, not 13.

The attached patch

```
     if cname:
-        service.append(cname.group(1))
+        empty = re.search(
+            r'\|\s+empty\s+\[',
+            line)
+        if empty is None:
+            service.append(cname.group(1))
```

ignores empty slots on the service blades and so returns the correct number of service nodes, viz:

```
<slurm_source>/contribs/cray/csm/slurmconfgen_smw_kmb.py \
    -t $SLURM_DIR/contribs/cray/csm/ -o $SLURM_CONF_DIR \
    sdb p0

Getting list of service nodes...
    Found 7 service nodes.
Gathering hardware inventory...
    Found 19 compute nodes.
Compacting node configuration...
```

You might want to run this one past someone at Cray, just in case they really do want to "configure" those empty slots.

BTW, this was run on our Test&Dev system, hence the small-number statistics.

Kevin M. Buckley
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
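For illustration, the patch's filtering logic can be sketched in isolation. This is a minimal sketch only: the function name `parse_service_nodes` and the cname-matching pattern are assumptions for demonstration, not the actual code in `contribs/cray/csm/slurmconfgen_smw.py`; only the `r'\|\s+empty\s+\['` regex comes from the patch itself.

```python
import re

def parse_service_nodes(xtcli_output):
    """Return cnames of populated service nodes, skipping empty slots.

    Hypothetical reconstruction of the parsing loop the patch modifies.
    """
    service = []
    for line in xtcli_output.splitlines():
        # Assumed pattern: capture the cname at the start of a service line.
        cname = re.search(r'^(c\d+-\d+c\d+s\d+n\d+):\s+service', line)
        if cname:
            # The patch's check: skip slots reported as "| empty [".
            empty = re.search(r'\|\s+empty\s+\[', line)
            if empty is None:
                service.append(cname.group(1))
    return service

sample = """\
c0-0c0s0n0: service | empty [noflags|]
c0-0c0s0n1: service SB08 X86| ready [noflags|]
c0-0c0s0n2: service SB08 X86| ready [noflags|]
c0-0c0s0n3: service | empty [noflags|]
"""

print(parse_service_nodes(sample))  # → ['c0-0c0s0n1', 'c0-0c0s0n2']
```

Run against the full `xtcli status` output above, this style of filter counts 7 ready service nodes rather than 13 slots.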