Currently the slurm --cpu-freq= option depends on the ACPI cpufreq driver. However, with current Linux distros and x86 hardware, the out-of-the-box cpufreq driver is intel_pstate, which offers a number of benefits over the old acpi-cpufreq. This means that setting the CPU frequency policy is not available in Slurm, and slurmd spams the logs with messages that it can't find the scaling_cur_freq file in /sys.

While intel_pstate does not support setting an exact frequency like acpi-cpufreq does, a subset of the functionality could, I think, be usefully supported. For instance, a useful out-of-the-box behavior could be something like:

- When a job starts, set the governor to "performance" on the CPUs allocated to the job.
- When the job ends, set the governor to "powersave".

Jobs could then be allowed to override these defaults with the --cpu-freq= option, although intel_pstate supports only "powersave" and "performance", so the usefulness of this is perhaps limited.

Similarly, while intel_pstate does not support setting the frequency, one can set the maximum and minimum p-states in /sys/devices/system/cpu/intel_pstate/. However, setting the governor to "performance" already sets /sys/devices/system/cpu/intel_pstate/min_perf_pct to 100, so again, exposing this is perhaps not that useful.

For more information about intel_pstate, see:
https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
https://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
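To make the proposed default behavior concrete, here is a minimal sketch in Python (not Slurm code). It assumes the per-CPU governor is set by writing to cpuN/cpufreq/scaling_governor under /sys/devices/system/cpu. The function names set_governor, on_job_start and on_job_end, and the sysfs_root parameter, are hypothetical and exist only for this illustration.

```python
import os

# Governors offered by intel_pstate, per the description above.
INTEL_PSTATE_GOVERNORS = ("powersave", "performance")


def set_governor(cpus, governor, sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to scaling_governor for every CPU id in `cpus`.

    `sysfs_root` is a hypothetical parameter so the sketch can be tried
    against a scratch directory instead of the real /sys tree.
    """
    if governor not in INTEL_PSTATE_GOVERNORS:
        raise ValueError(f"governor not supported by intel_pstate: {governor}")
    for cpu in cpus:
        path = os.path.join(sysfs_root, f"cpu{cpu}", "cpufreq", "scaling_governor")
        with open(path, "w") as f:
            f.write(governor)


def on_job_start(cpus, sysfs_root="/sys/devices/system/cpu"):
    # Proposed default: run the job's CPUs at full performance.
    set_governor(cpus, "performance", sysfs_root)


def on_job_end(cpus, sysfs_root="/sys/devices/system/cpu"):
    # Proposed default: drop back to powersave once the job is done.
    set_governor(cpus, "powersave", sysfs_root)
```

Writing to the real sysfs files requires root privileges, and only the CPUs allocated to the job are touched, so other jobs on the same node keep their own governor.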
Updating this would certainly be useful, although we don't have a plan to handle it just yet. Marking as a potential feature enhancement request, and updating assignee to match.
Created attachment 3013: Support the intel_pstate scaling driver

Hi, here's a patch that implements support for intel_pstate. I noticed that the CpuFreqDef config option was only partially implemented: the value was parsed but never used. So I took the liberty of repurposing it to mean roughly the opposite, namely the frequency governor to use when running a job step if the job doesn't explicitly provide a --cpu-freq option. I also changed the default of the CpuFreqGovernors option to "ondemand,performance", since ondemand isn't available with the intel_pstate driver. Otherwise the patch should be relatively straightforward and only changes a few minor things here and there.
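The repurposed CpuFreqDef semantics can be illustrated with a short Python sketch. This is not the logic of the attached patch: the function pick_governor and its parameters are hypothetical, and rejecting a governor that is not listed in CpuFreqGovernors is an assumption made for the example.

```python
def pick_governor(requested, cpu_freq_def, allowed=("ondemand", "performance")):
    """Choose the governor for a job step.

    requested    -- governor from --cpu-freq, or None if the job gave none
    cpu_freq_def -- value of CpuFreqDef, used as the fallback
    allowed      -- governors permitted by CpuFreqGovernors
    """
    if requested is None:
        # No --cpu-freq on the job: fall back to the configured default.
        return cpu_freq_def
    governor = requested.lower()
    if governor not in allowed:
        # Assumption for this sketch: disallowed governors are rejected.
        raise ValueError(f"governor not allowed: {requested}")
    return governor
```

For example, with CpuFreqDef set to "performance", a job step without --cpu-freq would get "performance", while a step that asks for "ondemand" would get "ondemand" as long as it is in the allowed list.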
Thank you for your contribution. Your patch is committed here: https://github.com/SchedMD/slurm/commit/a4f35c45eddf54d9305e5a16352dabdab3ad97b3