| Summary: | srun --exclusive not working on system with ThreadsPerCore>1 | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Josko Plazonic <plazonic> |
| Component: | slurmd | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 19.05.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10290 | | |
| Site: | Princeton (PICSciE) | | |
| Version Fixed: | 19.05.7 20.02.3 20.11.0pre1 | Target Release: | --- |
| Attachments: | Result of test task 0; Result of test task 1; Result of test task 2; Result of test task 3; fix _pick_step_cores for tasks_per_core > 1 for 19.05 (v1) | | |
Description
Josko Plazonic 2020-02-19 14:34:33 MST
Hi Josko,
I'm not 100% sure I'm able to reproduce it. Could you please rerun your tests with a script like the one below executed as a step?
Please note that you'll have to replace '/sys/fs/cgroup/cpuset' with the location of your cgroup cpuset filesystem.
```bash
# cat /tmp/set.sh
#!/bin/bash
#exec > ./slurm-${SLURM_STEPID}
date
sleep 5
echo "===SHOW MY JOB==="
scontrol show job ${SLURM_JOBID}
echo "===SHOW MY STEPS==="
scontrol show step ${SLURM_JOB_ID}
echo "===SHOW MY CGROUP==="
/bin/cat /sys/fs/cgroup/cpuset/slurm_${SLURMD_NODENAME}/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/step_${SLURM_STEPID}/cpuset.cpus
echo '===SHOW MY TASKSET==='
taskset -cp $$
date
echo '===DONE==='
```
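As an aside (not from the thread): the cpuset.cpus value the script prints uses range syntax such as 0-7,32-39, which can be tedious to compare across steps by eye. A small helper to count the CPUs in such a list makes the outputs easier to check; `count_cpus` is a hypothetical name, not part of Slurm or this bug report.

```shell
#!/bin/bash
# Count the CPUs in a cpuset list like "0-7,32-39" (the format found in
# cpuset.cpus). Illustrative helper only; not part of the thread.
count_cpus() {
    local total=0 part lo hi
    local -a parts
    IFS=',' read -ra parts <<< "$1"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            # A range "lo-hi" contributes hi-lo+1 CPUs
            lo=${part%-*}; hi=${part#*-}
            total=$(( total + hi - lo + 1 ))
        else
            # A bare number contributes a single CPU
            total=$(( total + 1 ))
        fi
    done
    echo "$total"
}

count_cpus "0-7,32-39"   # → 16
```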
Additionally, please add the -o option to your srun calls in cpuset.slurm, like here:
>srun -o slurm-%J.out -l --exclusive -n1 /tmp/testStep &
Here -l will label each line with the task ID (in this case always 0), and -o will create a separate output file per job step.
cheers,
Marcin
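For context, the cpuset.slurm batch script under discussion is never shown in this thread. A minimal sketch consistent with the srun lines quoted here might look like the following; the node count, task count, and the /tmp/testStep path are assumptions, not confirmed details from the report.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# Hypothetical reconstruction of cpuset.slurm; counts and paths are
# assumptions based on the srun lines quoted in this thread.
for i in $(seq 0 3); do
    # -o gives each step its own output file, -l labels lines with the task ID
    srun -o slurm-%J.out -l --exclusive -n1 /tmp/testStep &
done
wait
```

With the bug present, the cpuset.cpus and taskset output of all four steps would show the same cores instead of disjoint sets.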
Created attachment 13127 [details]
Result of test task 0
Created attachment 13128 [details]
Result of test task 1
Created attachment 13129 [details]
Result of test task 2
Created attachment 13130 [details]
Result of test task 3
Just attached the results. The process affinity list is still only 8 CPUs and does not change.

Josko, the result is quite surprising. Did you run the same commands as in the initial comment, or did you change --ntasks-per-node from 4 to 8? I think that is what happened, but I'd like to be 100% sure. cheers, Marcin

Oh, sorry - I had -c 8 there... It doesn't invalidate the test, though; the same CPUs are still being allocated to all exclusive tasks.

Josko, I can reproduce it, but to be sure that we're on the same page in terms of the trace in the code, could you please share the TaskPluginParam configuration parameter used on both clusters? cheers, Marcin

Hi there,
good one:
[root@adroit4 ~]# scontrol show config | grep TaskPlugin
TaskPlugin = affinity,cgroup
TaskPluginParam = (null type)
"bad" one:
[root@traverse ~]# scontrol show config | grep TaskPlugin
TaskPlugin = affinity,cgroup
TaskPluginParam = (null type)
Thanks, Josko

Josko, I have a patch that should fix the issue; however, it hasn't passed our QA process yet. Would you be interested in applying it locally before QA is completed? An alternative workaround that should work pretty well is to add the --cpu-bind=none option to your srun commands. This disables the task affinity that is limiting your steps to the same core and lets the operating system assign resources; for compute-intensive processes this should work quite well. Let me know how you'd like to continue. cheers, Marcin

If it is not too complex I should be able to add it to our build of Slurm and test it. Thanks.

Created attachment 13340 [details]
fix _pick_step_cores for tasks_per_core > 1 for 19.05 (v1)
Josko,
The attached patch should apply cleanly on top of 19.05. As mentioned before, it hasn't passed SchedMD QA and is not yet scheduled for release, but it passes our automated regression tests without issue.
Your feedback will be very much appreciated.
cheers,
Marcin
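The --cpu-bind=none workaround Marcin suggested earlier can be sketched as below; the /tmp/testStep path is carried over from the thread, and the exact placement in cpuset.slurm is an assumption.

```shell
# Interim workaround from this thread: disable task affinity so the
# --exclusive steps are not all pinned to the same cores; the OS
# scheduler then places the processes across the allocated CPUs.
srun --cpu-bind=none -o slurm-%J.out -l --exclusive -n1 /tmp/testStep &
```

The trade-off is that processes are no longer pinned, so cache locality may suffer slightly, but for compute-bound steps the kernel scheduler generally spreads them well.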
Josko, Were you able to apply the patch and verify that it works for you? cheers, Marcin

Josko, Did you have a chance to apply the patch and verify that it works for you? cheers, Marcin

Comment on attachment 13340 [details]
fix _pick_step_cores for tasks_per_core > 1 for 19.05 (v1)
Josko,
The patch is undergoing review. Please don't apply it for now; we'll get back to you with a final solution soon.
cheers,
Marcin
Josko, The fix for the bug has been merged and will be available in slurm-19.05.7[1]. cheers, Marcin

[1] https://github.com/SchedMD/slurm/commit/9028d1d49d551ff26e92e3039274bdfab4fc5c80
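After upgrading to one of the fixed releases, a quick sanity check (a sketch, not from the thread) is to launch several exclusive steps and confirm that each reports a different affinity list rather than the same cores.

```shell
# Hypothetical post-upgrade check, run inside a job allocation: with the
# fix, each --exclusive step should print a distinct CPU list.
# (taskset runs inside the step, so $$ is the step's own shell PID.)
for i in $(seq 0 3); do
    srun -l --exclusive -n1 bash -c 'taskset -cp $$' &
done
wait
```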