| Summary: | Wrong "core file size" limit on slurmd | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Misha Ahmadian <misha.ahmadian> |
| Component: | Limits | Assignee: | Carlos Tripiana Montes <tripiana> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 20.11.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | TTU | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | slurm.conf | ||
|
Description
Misha Ahmadian
2022-02-28 13:11:15 MST
Hi Misha,

I want to take a look at the output of:

    $ pstree -alnpst $(pidof slurmd)

But most probably, this is what is happening. As stated in [1]:

> Resource limits not configured explicitly for a unit default to the value configured in the various DefaultLimitCPU=, DefaultLimitFSIZE=, … options available in systemd-system.conf(5), and – if not configured there – the kernel or per-user defaults, as defined by the OS (the latter only for user services, see below).

As no LimitCORE seems to be defined for this systemd service, the value is taken from DefaultLimitCORE. From [2]:

> DefaultLimitCORE= does not have a default but it is worth mentioning that RLIMIT_CORE is set to "infinity" by PID 1 which is inherited by its children.

Next, limits.conf and limits.d [3] relate only to pam_limits.so, which means they apply *only to user login sessions*. For this reason, the limits set there don't apply to systemd services, which are affected by everything in [4] (with [1] being part of it).

In the end, I think any service will have an unlimited core dump size, not just slurmd, unless an explicit LimitCORE is added to the service file or similar. That should be easy to check.

In conclusion: this doesn't seem to be a Slurm bug that propagates an unlimited core size from logins. Rather, limits.conf is not used for services, and you need to add either DefaultLimitCORE in systemd/system.conf or LimitCORE to the slurmd service file.

Cheers,
Carlos.

[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%20Properties
[2] https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultLimitCPU=
[3] https://linux.die.net/man/5/limits.conf
[4] https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html

Misha,

Assuming my last answer covers the issue, I'm going to close this as info given for now. Please reopen if needed.

Cheers,
Carlos.

Hi Carlos,

Sorry for the delay in my reply. I was busy with other stuff. Thank you very much. I'll let you know if I have further questions.

Best,
Misha
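For anyone landing on this ticket later, the check and the fix discussed above could be sketched roughly as follows. This is a minimal sketch, not something taken from the thread: the helper names `core_limit_of` and `limitcore_dropin` are hypothetical, the unit name `slurmd.service` and the drop-in location are assumptions about a typical install, and the value `0` is only an example.

```shell
#!/bin/sh
# Hypothetical helper: print the core-size limit the kernel applies to a
# running process, read from /proc/<pid>/limits (Linux only).
core_limit_of() {
    grep '^Max core file size' "/proc/$1/limits"
}

# Hypothetical helper: emit the text of a systemd drop-in that sets an
# explicit LimitCORE. The value (for example 0 or infinity) is up to the site.
limitcore_dropin() {
    printf '[Service]\nLimitCORE=%s\n' "$1"
}

# Sanity check against the current shell. For slurmd one would pass its PID
# instead, e.g. core_limit_of "$(pidof slurmd)".
core_limit_of "$$"

# Preview the drop-in text only; nothing is written to disk here.
# Assumption: on a typical install it would be saved as a *.conf file under
# /etc/systemd/system/slurmd.service.d/, followed by
#   systemctl daemon-reload && systemctl restart slurmd
limitcore_dropin 0
```

Comparing the `/proc` output for slurmd with that of a login shell should show whether the difference comes from systemd rather than from limits.conf, which is the point made in the reply above.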