| Summary: | Core specialization doesn't work in interactive job | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Jason Repik <jjrepik> |
| Component: | User Commands | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 17.02.7 | | |
| Hardware: | Cray XC | | |
| OS: | Linux | | |
| Site: | Sandia National Laboratories | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave Sites: | --- | Cray Sites: | --- |
| DS9 Clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC Sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf | | |
|
Description
Jason Repik
2017-11-13 11:56:42 MST
job_set_corespec(20673, 1) is the Cray API call and its arguments. The first argument is the container ID and the second is the core count. Since a core count of 1 really should be fine, I'll guess the problem is with the container ID. What is your configured ProctrackType value? It should be "proctrack/cray". If that is what you already have, then please attach your slurm.conf file.

```
sdb:/etc/opt/slurm # grep -i proc slurm.conf
ProctrackType=proctrack/cray
sdb:/etc/opt/slurm #
```

Created attachment 5553 [details]
slurm.conf
This is the same error as seen in Bug 4008. The following patch was added to 17.02.8 to address the issue: https://github.com/SchedMD/slurm/commit/525cde12e8d4ea771ca73aec01102a924bc369ca. This is most likely why you aren't seeing the issue on LANL's systems -- they are running 17.02.9, which has the patch. Can you apply the patch or upgrade and confirm that it fixes the problem for you?

We will be installing 17.02.9 next week. I'll run the test case again after the update. Thanks.

Did the upgrade happen, and were you able to test?

Yes, the upgrade did happen last week, and I apologize for not updating the case.

```
mutrino:~/yaml/yaml-cpp-master/build> salloc -N1 --time=00:20:00 -S 1
salloc: Granted job allocation 3272739
nid00109:~/yaml/yaml-cpp-master/build> srun -n 1 hostname
nid00109
nid00109:~/yaml/yaml-cpp-master/build>
```

Everything seems to work as expected.

No problem. Good to hear. I'll close the bug. Thanks, Brian

*** This ticket has been marked as a duplicate of ticket 4003 ***
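For context, the `-S`/`--core-spec` option used in the test above asks Slurm to set aside cores for system use, and whether users may request it depends on the cluster configuration. The following is a minimal, hedged sketch of the slurm.conf settings relevant to this ticket on a Cray XC system; the exact values beyond ProctrackType (which the grep output above confirms) are illustrative assumptions, not the reporter's actual configuration:

```
# Required for core specialization on Cray systems (confirmed in this ticket):
ProctrackType=proctrack/cray

# Illustrative: permit users to override the node's CoreSpecCount
# with salloc/srun -S/--core-spec (assumed setting, not from the ticket).
AllowSpecResourcesUsage=YES
```

With a configuration along these lines, `salloc -N1 -S 1` should reserve one specialized core per node, as shown working in the transcript after the 17.02.9 upgrade.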