The acct_gather_energy/IPMI documentation[1] specifies how EnergyIPMIPowerSensors=Node=DCMI may be configured. The corresponding freeipmi command displays compute node DCMI power readings as expected, for example:

$ ipmi-dcmi --get-system-power-statistics
Current Power                        : 2429 Watts
Minimum Power over sampling duration : 342 watts
Maximum Power over sampling duration : 2925 watts
Average Power over sampling duration : 1752 watts
Time Stamp                           : 09/16/2023 - 09:33:15
Statistics reporting time period     : 1926185000 milliseconds
Power Measurement                    : Active

However, not all BMCs support the IPMI DCMI extensions. We have some Huawei/Xfusion nodes which report an error instead:

$ ipmi-dcmi --get-system-power-statistics
ipmi_cmd_dcmi_get_power_reading: command invalid or unsupported

Before enabling EnergyIPMIPowerSensors=Node=DCMI into our production system, I would like to ask if slurmd on such nodes is going to handle DCMI errors gracefully, or potentially exit with a fatal error? I don't fully understand the source code in src/plugins/acct_gather_energy/ipmi/acct_gather_energy_ipmi.c

Next, will it be possible to have a fallback acct_gather_energy plugin such as the RAPL plugin, so that at least CPU+DIMM power readings will be reported instead of any invalid zero values from DCMI? I guess the question is whether a list of AcctGatherEnergyType plugins is possible?

Thanks,
Ole

[1] https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI
Ole,

> Before enabling EnergyIPMIPowerSensors=Node=DCMI into our production system,
> I would like to ask if slurmd on such nodes is going to handle DCMI errors
> gracefully, or potentially exit with a fatal error?

Looking through the code, the only fatals I found are related to misconfiguration of this plugin. All of them will fatal (terminating slurmd) during initialization. These are:

1 - Trying to load ipmi (AcctGatherEnergyType=acct_gather_energy/ipmi) without the library compiled (acct_gather_energy_ipmi.so).
2 - Having a malformed EnergyIPMIPowerSensors in etc/acct_gather.conf
3 - Having a negative value for EnergyIPMIFrequency in etc/acct_gather.conf

All other errors print their corresponding error log and result in readings with a 0 or n/s value. Running it on my local machine, for example:

[2023-09-19T17:15:42.179] error: _get_dcmi_power_reading: get DCMI power reading failed

And the node displayed:

CurrentWatts=0 AveWatts=0

> Next, will it be possible to have a fallback acct_gather_energy plugin such
> as the RAPL plugin so at least CPU+DIMM power readings will be reported
> instead of any invalid zero values from DCMI? I guess the question is whether
> a list of AcctGatherEnergyType plugins is possible?

This is not currently supported. Only one plugin should be configured for AcctGatherEnergyType.

I have tested setting them both, in a comma separated list. Slurm will not complain and will simply load both at the same time. But I expect this to have unpredictable results, with both plugins overwriting the same data.

The alternative I see to handle your case would be to set a different etc/acct_gather.conf file for each node. This way, you will be able to configure EnergyIPMIPowerSensors to match each node's capabilities. To use RAPL though, you would also need to have a different slurm.conf for each node.

You might already be using that, but in case you have all configs centralized under a shared directory, you could do a symlink like:

/nfs/etc/slurm/acct_gather.conf -> /local/etc/slurm/acct_gather.conf

But that would imply maintaining a copy of acct_gather.conf on each node (it is not something that should change frequently).

Let me know your thoughts about it,

Kind regards,
Oscar
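To decide which nodes can use Node=DCMI before writing per-node configuration, the ipmi-dcmi check shown at the top of this ticket could be scripted. The following is only a rough sketch: the function names are hypothetical, and it assumes that a node supporting DCMI prints a "Current Power : N Watts" line as in the sample output above, while an unsupported node prints no such line.

```python
import re
import subprocess
from typing import Optional

# Matches the "Current Power : 2429 Watts" line printed by ipmi-dcmi.
_CURRENT_POWER = re.compile(
    r"^\s*Current Power\s*:\s*(\d+)\s*Watts", re.IGNORECASE | re.MULTILINE
)


def parse_current_watts(output: str) -> Optional[int]:
    """Return the current power in watts, or None if no reading is present."""
    match = _CURRENT_POWER.search(output)
    return int(match.group(1)) if match else None


def node_supports_dcmi() -> bool:
    """Hypothetical helper: run ipmi-dcmi on the local node (usually as root)."""
    try:
        proc = subprocess.run(
            ["ipmi-dcmi", "--get-system-power-statistics"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or the BMC did not answer in time.
        return False
    return parse_current_watts(proc.stdout) is not None
```

The parsing is kept separate from the command invocation so it can be tried against captured output without access to a BMC.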
Hi Oscar,

Thanks a lot for your detailed answer! I have a first comment (more later):

(In reply to Oscar Hernández from comment #2)
> > Before enabling EnergyIPMIPowerSensors=Node=DCMI into our production system,
> > I would like to ask if slurmd on such nodes is going to handle DCMI errors
> > gracefully, or potentially exit with a fatal error?
> Looking through the code, the only fatals I found are related to
> misconfiguration of this plugin. All of them will fatal (terminating
> slurmd) during initialization. These are:
>
> 1 - Trying to load ipmi (AcctGatherEnergyType=acct_gather_energy/ipmi)
> without the library compiled (acct_gather_energy_ipmi.so).

It seems that if the file acct_gather.conf exists (we use configless), and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the slurmd's will read acct_gather.conf and crash on our system :-(

The reason is that the slurmd's from our RPM build host didn't yet have the precondition freeipmi-devel RPM package installed - please refer to bug 17706. We're going to install updated Slurm RPM packages on all compute nodes next Tuesday, so this issue should go away.

My testing discovered that even though we have set AcctGatherEnergyType=acct_gather_energy/rapl (i.e., RAPL and NOT IPMI) in slurm.conf, this caused all slurmd's to crash with error messages:

[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMIPowerSensors
[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMIFrequency
[2023-09-21T11:52:50.010] error: _parse_next_key: Parsing error at unrecognized key: EnergyIPMICalcAdjustment
[2023-09-21T11:52:50.010] fatal: Could not open/read/parse acct_gather.conf file /var/spool/slurmd/conf-cache/acct_gather.conf. Many times this is because you have defined options for plugins that are not loaded. Please check your slurm.conf file and make sure the plugins for the options listed are loaded.
My /etc/slurm/acct_gather.conf file has this content:

EnergyIPMIPowerSensors=Node=DCMI
EnergyIPMIFrequency=60
EnergyIPMICalcAdjustment=yes

So the Slurm build host REALLY must have the freeipmi-devel RPM package installed!

Furthermore, I'm guessing that the slurmd nodes must also have the freeipmi-devel RPM package installed. What I'm seeing is that /usr/lib64/libfreeipmi.so (a soft-link) is only installed by the freeipmi-devel RPM, whereas the actual library file is part of the freeipmi RPM:

$ rpm -ql freeipmi | grep lib64/libfree
/usr/lib64/libfreeipmi.so.17
/usr/lib64/libfreeipmi.so.17.1.4
$ rpm -ql freeipmi-devel | grep lib64/libfree
/usr/lib64/libfreeipmi.so
$ ls -l /usr/lib64/libfreeipmi.so
lrwxrwxrwx. 1 root root 21 Sep 18 13:34 /usr/lib64/libfreeipmi.so -> libfreeipmi.so.17.1.4

When I want to "yum update" the slurm* RPM packages, only the freeipmi and not the freeipmi-devel RPM dependency gets installed:

================================================================================
 Package          Arch    Version            Repository                              Size
================================================================================
Updating:
 auto_tmpdir      x86_64  1.0.2-23.02.5.el7  /auto_tmpdir-1.0.2-23.02.5.el7.x86_64   35 k
 slurm            x86_64  23.02.5-1.el7      /slurm-23.02.5-1.el7.x86_64             75 M
 slurm-contribs   x86_64  23.02.5-1.el7      /slurm-contribs-23.02.5-1.el7.x86_64    32 k
 slurm-devel      x86_64  23.02.5-1.el7      /slurm-devel-23.02.5-1.el7.x86_64      372 k
 slurm-pam_slurm  x86_64  23.02.5-1.el7      /slurm-pam_slurm-23.02.5-1.el7.x86_64  470 k
 slurm-perlapi    x86_64  23.02.5-1.el7      /slurm-perlapi-23.02.5-1.el7.x86_64    3.1 M
 slurm-slurmd     x86_64  23.02.5-1.el7      /slurm-slurmd-23.02.5-1.el7.x86_64     2.4 M
 slurm-torque     x86_64  23.02.5-1.el7      /slurm-torque-23.02.5-1.el7.x86_64     390 k
Installing for dependencies:
 freeipmi         x86_64  1.5.7-3.el7        base-niflheim                          2.0 M
 libjwt           x86_64  1.12.1-7.el7       epel                                    24 k
 libyaml          x86_64  0.1.4-11.el7_0     base-niflheim                           55 k

Question: Can you verify whether slurmd will actually require the link /usr/lib64/libfreeipmi.so to exist? I'm afraid that slurmd's may crash without it. If so, the RPM dependency freeipmi-devel will have to be added to the slurm.spec or configure files so that it will get installed automatically.

I realize that these issues are relevant only for RHEL/CentOS systems which are RPM based, but many Slurm sites use this family of OSes.

Thanks a lot for your help,
Ole
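The crash described above (EnergyIPMI* options present while a different energy plugin is selected) could be caught before distributing the files. Below is a minimal sketch of such a pre-flight check. The function name is hypothetical, it treats every key starting with "EnergyIPMI" as belonging to the ipmi plugin (an assumption based only on the three keys seen in this ticket), and it compares the plugin name case-insensitively purely for convenience. It does not detect the separate case where the ipmi plugin was never built.

```python
from typing import List


def ipmi_keys_without_ipmi_plugin(energy_type: str, acct_gather_text: str) -> List[str]:
    """List EnergyIPMI* keys that would be unrecognized for the chosen plugin."""
    if energy_type.strip().lower() == "acct_gather_energy/ipmi":
        return []  # the ipmi plugin is selected, so its options are expected
    offending = []
    for raw in acct_gather_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.lower().startswith("energyipmi"):
            offending.append(key)
    return offending
```

With the acct_gather.conf content quoted above and AcctGatherEnergyType=acct_gather_energy/rapl, the function returns the same three keys that slurmd reported as unrecognized.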
Ole,

I am sorry for that...

>and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the
>slurmd's will read acct_gather.conf and crash on our system :-(

Yes, I agree that this is pretty inconvenient. The acct_gather.conf file can only be used for the ipmi energy plugin option. However, there are other plugins that may have their configuration options set in that file (e.g. acct_gather_profile/HDF5).

That is the reason we always validate acct_gather.conf, to make sure it makes sense. I will check further, but I am not sure we can improve the behavior much here.

With regard to the library, I was doing my tests on Ubuntu, but also had to install libfreeipmi-dev (and libipmimonitoring-dev) to successfully build the library. So I will try to properly document this, as you mentioned in the other bug.

I understand your concerns with the libraries. I would not expect the dev packages to be needed on compute nodes, since dev packages are mainly needed to get the header files for compilation; once the plugin is compiled, I see no point in having them. However, I'll check about the symlink mentioned.

For the moment, I am testing on my Ubuntu system (where I have a similar symlink). I tested starting Slurm with the symlink removed, and Slurm did not complain in any way; things seemed to work as expected.

Afterwards, I tested directly removing the libs:

libfreeipmi.so.17
libfreeipmi.so.17.2.8

And got the following error:

[2023-09-21T17:44:02.936] error: plugin_load_from_file: dlopen(/home/oscar/Projects/sandbox/17704/install/lib/slurm/acct_gather_energy_ipmi.so): libfreeipmi.so.17: cannot open shared object file: No such file or directory
[2023-09-21T17:44:02.936] error: Couldn't load specified plugin name for acct_gather_energy/ipmi: Dlopen of plugin file failed
[2023-09-21T17:44:02.936] error: cannot create acct_gather_energy context for acct_gather_energy/ipmi
[2023-09-21T17:44:02.936] fatal: can not open the (null) plugin

So, as you can see, it looks for the versioned library, not the basic symlink. The dev package does not seem to be needed.

In any case, tomorrow I will test it out on CentOS, but I am expecting similar behavior there.

Cheers,
Oscar
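The dlopen failure above names the versioned library that could not be found. The same information can be obtained ahead of time with the standard ldd tool, which prints "not found" for unresolved dependencies. The sketch below parses that output; the function names are hypothetical, and the plugin path passed to check_plugin is whatever applies on the node being checked.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the shared objects that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_plugin(path: str) -> List[str]:
    """Run ldd on a plugin file (e.g. acct_gather_energy_ipmi.so) and parse it."""
    proc = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(proc.stdout)
```

An empty list from check_plugin suggests that the runtime packages are sufficient on that node, without the -devel symlink.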
Hi Oscar,

Thanks for a very detailed analysis!

(In reply to Oscar Hernández from comment #4)
> >and even if we do NOT yet use AcctGatherEnergyType=acct_gather_energy/ipmi, the
> >slurmd's will read acct_gather.conf and crash on our system :-(
> Yes, I agree that this is pretty inconvenient. The acct_gather.conf file can
> only be used for the ipmi energy plugin option. However, there are other
> plugins that may have their configuration options set in that file (e.g.
> acct_gather_profile/HDF5).
>
> That is the reason we always validate acct_gather.conf, to make sure it
> makes sense. I will check further, but I am not sure we can improve the
> behavior much here.

I think it's fine, now I understand this.

> With regard to the library, I was doing my tests on Ubuntu, but also had to
> install libfreeipmi-dev (and libipmimonitoring-dev) to successfully build
> the library. So I will try to properly document this, as you mentioned in
> the other bug.
>
> I understand your concerns with the libraries. I would not expect the dev
> packages to be needed on compute nodes, since dev packages are mainly
> needed to get the header files for compilation; once the plugin is compiled,
> I see no point in having them. However, I'll check about the symlink mentioned.
...
> So, as you can see, it looks for the versioned library, not the basic
> symlink. The dev package does not seem to be needed.

OK, this wasn't obvious. I'm very glad that you tested without the symlink /usr/lib64/libfreeipmi.so and showed that it actually works.

> In any case, tomorrow I will test it out on CentOS, but I am expecting
> similar behavior there.

I agree, but it's better to test also CentOS to be 100% sure. Then we will finally be sure that the required freeipmi libraries are installed as dependencies with the Slurm RPMs, and all should be good :-)

Thanks a lot,
Ole
Hi Oscar,

(In reply to Oscar Hernández from comment #2)
> > Next, will it be possible to have a fallback acct_gather_energy plugin such
> > as the RAPL plugin so at least CPU+DIMM power readings will be reported
> > instead of any invalid zero values from DCMI? I guess the question is whether
> > a list of AcctGatherEnergyType plugins is possible?
> This is not currently supported. Only one plugin should be configured for
> AcctGatherEnergyType.
>
> I have tested setting them both, in a comma separated list. Slurm will
> not complain and will simply load both at the same time. But I expect this
> to have unpredictable results, with both plugins overwriting the same data.

Thanks for testing a comma separated list of plugins. It's surprising that this undocumented list even works :-) I understand that the results may become unpredictable, so this setup should not be used at present.

In the future, it might be good to develop a prioritized list of plugins: If IPMI fails to work, then try RAPL, then try ...

> The alternative I see to handle your case would be to set a different
> etc/acct_gather.conf file for each node. This way, you will be able to
> configure EnergyIPMIPowerSensors to match each node's capabilities. To use
> RAPL though, you would also need to have a different slurm.conf for each node.

Thanks for the suggestion. We're using configless and are extremely happy with it, so having a different slurm.conf and acct_gather.conf on each node would be a step backward :-(

> You might already be using that, but in case you have all configs
> centralized under a shared directory, you could do a symlink like:
>
> /nfs/etc/slurm/acct_gather.conf -> /local/etc/slurm/acct_gather.conf
>
> But that would imply maintaining a copy of acct_gather.conf on each node (it
> is not something that should change frequently).

I see that this should work, but we would like to stay with configless.

Another idea: Since slurm.conf now accepts INCLUDE MODIFIERS, we could have a node local file defining the relevant parameter AcctGatherEnergyType=acct_gather_energy/ipmi or AcctGatherEnergyType=acct_gather_energy/rapl. We could then have a global acct_gather.conf defining the IPMI parameters.

Do you think it is a good idea to have in slurm.conf a line like this pointing to a local config file:

include /local/etc/slurm/AcctGatherEnergyType.conf

This file would have to exist on the slurmctld server as well as on all slurmd nodes. It is, however, not obvious to me if the include file gets resolved by slurmctld and passed to all slurmd's using configless, or if slurmd will read the file locally? Can you resolve this question?

Thanks,
Ole
Hi Ole,

>I agree, but it's better to test also CentOS to be 100% sure. Then we will
>finally be sure that the required freeipmi libraries are installed as
>dependencies with the Slurm RPMs, and all should be good :-)

Tested on an Alma8 system I had at hand (which uses the same package system). The packages include the same files:

$ rpm -ql freeipmi | grep lib64/libfree
/usr/lib64/libfreeipmi.so.17
/usr/lib64/libfreeipmi.so.17.2.7
$ rpm -ql freeipmi-devel | grep lib64/libfree
/usr/lib64/libfreeipmi.so

After running:

$ yum remove freeipmi-devel

Slurm still initializes and loads the plugin. However, after running:

$ yum remove freeipmi

>[2023-09-22T09:54:49.691] error: plugin_load_from_file: dlopen(/home/vagrant/slurm-23/install-23.11/lib/slurm/acct_gather_energy_ipmi.so): libipmimonitoring.so.6: cannot open shared object file: No such file or directory
>[2023-09-22T09:54:49.691] error: Couldn't load specified plugin name for acct_gather_energy/ipmi: Dlopen of plugin file failed
>[2023-09-22T09:54:49.691] error: cannot create acct_gather_energy context for acct_gather_energy/ipmi
>[2023-09-22T09:54:49.691] fatal: can not open the (null) plugin

You could also check whether it is linked to the versioned lib with ldd:

$ ldd /home/test/slurm-23/install/lib/slurm/acct_gather_energy_ipmi.so
linux-vdso.so.1 (0x00007ffcabd21000)
libipmimonitoring.so.6 => not found <-
libfreeipmi.so.17 => not found <-
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f080ef00000)
...

I can confirm that having only libfreeipmi (without the devel package) won't fatal.

>In the future, it might be good to develop a prioritized list of plugins: If
>IPMI fails to work, then try RAPL, then try ...

I see the idea here. It seems this would also require some rework of acct_gather.conf, since its options may crash slurmd if they do not match the plugin picked from the list. I am also thinking that it could be ambiguous which plugin was finally used on each node.

Anyway, these are just some thoughts that came to my mind; the suggestion is much appreciated.

>This file would have to exist on the slurmctld server as well as on all slurmd
>nodes. It is, however, not obvious to me if the include file gets resolved by
>slurmctld and passed to all slurmd's using configless, or if slurmd will read
>the file locally? Can you resolve this question?

When configless is enabled, the slurm controller will share the trackable configuration files located in the same folder. This includes any "include" that might be in the same folder as slurm.conf. But you can have includes pointing to files in different local folders, and these will not be sent over, which is convenient for our case.

I have done some testing, and I believe your idea using includes will work. This is the scenario I tested:

Do not set AcctGatherEnergyType in the controller's slurm.conf, since this will be spread to all nodes. Instead, as you suggested, add a line with (just using /etc/custom as an example path here; this should be a node-local path, different from the path where slurm.conf is placed on the controller):

include /etc/custom/AcctGatherEnergyType.conf

Then, in acct_gather.conf, do not set any option referring to Energy plugins, since, as you already saw, having options for "ipmi" when "rapl" is loaded will cause the daemon to fatal on startup. Instead of the options for the energy plugin, add:

include /etc/custom/EnergyOptions.conf

On all your slurmds/slurmctld, you will need to have a couple of local files (the content of these files will be permanent, only defining the energy gather options):

/etc/custom/AcctGatherEnergyType.conf
/etc/custom/EnergyOptions.conf

The contents of those files should be (depending on the node):

####/etc/custom/AcctGatherEnergyType.conf
AcctGatherEnergyType=acct_gather_energy/ipmi
or
AcctGatherEnergyType=acct_gather_energy/rapl

####/etc/custom/EnergyOptions.conf (depending on the plugin loaded)
#for ipmi, these options can be tuned for each node.
EnergyIPMIPowerSensors=Node=DCMI
EnergyIPMIFrequency=5
#in case of rapl, should be an empty file.

That way, you will be able to have a general configuration distributed via configless, while also having some custom energy gather configuration for specific nodes.

I have tested it, but would suggest trying it out on a small subset of nodes first, to make sure things run as expected.

Let me know if you have any doubt/question about that suggestion, or if there is something inconvenient with it.

Kind regards,
Oscar
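The two node-local files described above could be generated per node by a small provisioning script. This is a sketch only: the function name and the way the plugin is chosen are hypothetical, the file names follow the /etc/custom example from this comment, and the IPMI options are the ones shown above (EnergyIPMIFrequency=5 is just the example value).

```python
from typing import Dict

IPMI = "acct_gather_energy/ipmi"
RAPL = "acct_gather_energy/rapl"


def render_node_local_files(plugin: str, ipmi_frequency: int = 5) -> Dict[str, str]:
    """Return file name -> content for the two node-local include files."""
    if plugin not in (IPMI, RAPL):
        raise ValueError(f"unsupported plugin for this sketch: {plugin}")
    files = {"AcctGatherEnergyType.conf": f"AcctGatherEnergyType={plugin}\n"}
    if plugin == IPMI:
        files["EnergyOptions.conf"] = (
            "EnergyIPMIPowerSensors=Node=DCMI\n"
            f"EnergyIPMIFrequency={ipmi_frequency}\n"
        )
    else:
        # For rapl the options file must exist but stay empty.
        files["EnergyOptions.conf"] = ""
    return files
```

A caller would write each returned entry under the node-local directory (for example /etc/custom) on every slurmd node and on the slurmctld host, choosing the plugin according to whether the node's BMC supports DCMI.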
Hi Oscar,

Thanks for testing the libraries, so I think we're in a good situation with libfreeipmi and it ought to work.

Your detailed description in comment 7 of exactly how to use include files makes a lot of sense. I will consider this once I have configured AcctGatherEnergyType=acct_gather_energy/ipmi instead of our current RAPL. I will put this on my ToDo list and look at it later.

At this time I have all the necessary information, so you are welcome to close this case.

Thanks for your excellent support!
Ole
Great! Closing then. Just re-open if you have any doubt/issue. Oscar
Hi Ole,

Talking with some colleagues today, they brought to my attention a bug we recently found when using DCMI: Bug 17639.

In brief, thanks to what Marshall was able to get from the backtrace in Bug 17639 comment 30, it seems that there is a current limitation in the freeipmi lib: it is using select(), which is limited to 1024 file descriptors. So when a greater device_fd is assigned, as is happening in the bug, it crashes the slurmd process.

Since this is part of an external library, all we can do now is suggest avoiding it (only DCMI is affected). But we are currently looking for alternatives to handle the crash.

Apologies for bringing up this news now, but I was not aware of it last week, and I consider it relevant given your current intention of switching AcctGatherEnergyType plugins.

Kind regards,
Oscar
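To illustrate the select() limit mentioned above: select() operates on fixed-size fd_set bitmaps, so a descriptor number at or above FD_SETSIZE cannot be represented, and in C, setting such a bit is undefined behavior. The sketch below is not the freeipmi code. It assumes FD_SETSIZE is 1024 (the usual glibc value), and the helper names are made up for illustration. Python's select module refuses out-of-range descriptors with an exception rather than corrupting memory, which makes the boundary easy to observe.

```python
import select

FD_SETSIZE = 1024  # assumption: the usual value on Linux/glibc


def usable_with_select(fd: int, fd_setsize: int = FD_SETSIZE) -> bool:
    """True if the descriptor number fits in a select() fd_set."""
    return 0 <= fd < fd_setsize


def probe(fd: int) -> str:
    """Ask Python's select() about a descriptor number and report the result."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return "out of range for select()"
    except OSError:
        return "in range, but not a usable descriptor"
    return "ok"
```

A long-running daemon with many open files can easily be handed descriptor numbers above this limit, which is consistent with the crash only showing up on some nodes.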
Hi Oscar,

Thanks so much for the important information:

(In reply to Oscar Hernández from comment #10)
> Talking with some colleagues today, they brought to my attention a bug we
> recently found when using DCMI: Bug 17639.
>
> In brief, thanks to what Marshall was able to get from the backtrace in Bug
> 17639 comment 30, it seems that there is a current limitation in the
> freeipmi lib: it is using select(), which is limited to 1024 file
> descriptors. So when a greater device_fd is assigned, as is happening
> in the bug, it crashes the slurmd process.
>
> Since this is part of an external library, all we can do now is suggest
> avoiding it (only DCMI is affected). But we are currently looking for
> alternatives to handle the crash.

I will put my work on using FreeIPMI power monitoring on hold for the time being. IMHO, it would be good if SchedMD can find some workaround for the problem. I may also try to alert the FreeIPMI developer to this issue using their mailing list.

Thanks,
Ole
Ole,

> I will put my work on using FreeIPMI power monitoring on hold for the time
> being. IMHO, it would be good if SchedMD can find some workaround for the
> problem.

Thanks for your understanding.

> I may also try to alert the FreeIPMI developer to this issue using
> their mailing list.

We are also looking into some patch proposal for FreeIPMI.

Cheers,
Oscar