Summary: The CPU affinity detected by the oneapi GPU plugin is wrong. The CPU affinity of the GPU is "0-23,48-71", but it is reported as "0-23,4-71".

Details: We configured Slurm to use an Intel GPU in gres.conf, and slurmd.log shows the wrong CPU affinity for the GPU. It should be "0-23,48-71", but it is reported as "0-23,4-71":

slurmd: debug2: gpu/oneapi: _oneapi_get_device_name: Device name is: card1
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: Read file: /sys/class/drm/card1/device/local_cpulist
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: line is: 0-23,48-71
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: tok is :0-23
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: cpu range is: 0~23
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: tok is :48-71
slurmd: debug2: gpu/oneapi: _oneapi_read_cpu_affinity_list: cpu range is: 4~71

The bug is caused by the strlcpy() change in https://github.com/SchedMD/slurm/blob/f504d215ab9324e1dfd04879a3be286dc1afd7bb/src/plugins/gpu/oneapi/gpu_oneapi.c#L811:

strlcpy(buf, tok, pos); // should be strlcpy(buf, tok, pos + 1);
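The off-by-one follows from strlcpy() semantics: strlcpy(dst, src, size) copies at most size - 1 bytes and then writes the terminating NUL. If pos is the offset of '-' in "48-71" (pos = 2), a size of pos leaves room for only one digit, so the start value becomes "4". The sketch below reproduces this in isolation. It is not the plugin's code: demo_strlcpy() is a hypothetical local stand-in with BSD strlcpy() semantics (older glibc releases do not provide strlcpy()), and copy_range_start() with its use_fix flag is a hypothetical helper for comparing the two size arguments.

```c
#include <string.h>

/* Hypothetical stand-in with BSD strlcpy() semantics: copies at most
 * size - 1 bytes of src into dst, always NUL-terminates when size > 0,
 * and returns strlen(src). */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len < size - 1) ? len : size - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Hypothetical helper: copy the start of a cpulist range ("48" from
 * "48-71") into buf. pos is the offset of '-' in tok, which equals the
 * number of digits in the start value. */
static void copy_range_start(char *buf, size_t bufsz, const char *tok,
			     int use_fix)
{
	const char *dash = strchr(tok, '-');
	size_t pos = dash ? (size_t)(dash - tok) : strlen(tok);
	/* Buggy size (pos): room for only pos - 1 digits plus the NUL.
	 * Fixed size (pos + 1): room for all pos digits plus the NUL. */
	size_t size = use_fix ? pos + 1 : pos;

	if (size > bufsz)
		size = bufsz;
	demo_strlcpy(buf, tok, size);
}
```

With use_fix = 0, "48-71" yields "4", matching the "4~71" in the log; with use_fix = 1 it yields "48". In this sketch the single-digit start of "0-23" loses its only digit under the buggy size and becomes an empty string, which atoi() would still turn into 0. That may explain why the first range looked correct, but it depends on how the original code converts buf.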
commit 2d8014d976404e453bc127d362cffe5c2e73289d
Author:     Marcin Stolarek <cinek@schedmd.com>
AuthorDate: Wed Apr 19 10:28:58 2023 +0000

    gpu_oneapi - Fix CPU range parsing

    A regression from ceae922dc3 caused the string to be truncated by one
    character. Rather than correcting the strlcpy() math, just pivot to
    using atoi() directly on the existing strings.

    Bug 16547
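The commit's approach avoids the temporary buffer: atoi() stops at the first non-digit, so it can be called directly on "48-71" to get 48 and on the text after '-' to get 71. The following is a minimal sketch of that idea under stated assumptions, not the code from commit 2d8014d9; parse_cpu_range() and count_cpus() are hypothetical names, and the input is assumed to be a local_cpulist line such as "0-23,48-71", possibly with a trailing newline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: parse the cpulist segment starting at seg (it ends at ','
 * or at the end of the string) into [*start, *end] without copying it. */
static void parse_cpu_range(const char *seg, int *start, int *end)
{
	size_t len = strcspn(seg, ",");           /* length of this segment */
	const char *dash = memchr(seg, '-', len); /* '-' within this segment only */

	*start = atoi(seg);                       /* atoi() stops at '-' or ',' */
	*end = dash ? atoi(dash + 1) : *start;    /* single CPU if there is no '-' */
}

/* Hypothetical: count the CPUs listed in a line such as "0-23,48-71\n". */
static int count_cpus(const char *line)
{
	int total = 0;
	const char *p = line;

	while (*p != '\0' && *p != '\n') {
		int start, end;

		parse_cpu_range(p, &start, &end);
		total += end - start + 1;
		p += strcspn(p, ",");             /* skip to the next ',' or NUL */
		if (*p == ',')
			p++;
	}
	return total;
}
```

For "0-23,48-71" this yields the ranges 0~23 and 48~71, or 48 CPUs in total. One trade-off of atoi() is that it reports no errors (malformed text silently becomes 0); strtol() with an end pointer would be the stricter alternative.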
The reported issue is fixed by the commit above. It will be part of the Slurm 23.02.2 release.

Cheers,
Marcin