| Summary: | sacct cputime documentation | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Ryan Cox <ryan_cox> |
| Component: | Accounting | Assignee: | David Bigagli <david> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | da |
| Version: | 14.03.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | BYU - Brigham Young University | Slinky Site: | --- |
| CLE Version: | | Version Fixed: | 14.03.7 |
| Attachments: | sacct_cputime_clarification.diff | ||
Description
Ryan Cox
2014-07-31 09:15:12 MDT
Okay, so it appears I messed up in my understanding of the code in one way. I said that the charge is (timelimit * cpus). It is actually (elapsed * cpus). Some of the similarly named variables (cpu_run_delta vs run_delta vs run_decay), all in close proximity, seem to have gotten combined in my brain... However, that doesn't affect the need for the manpage clarification. The patch is still valid.

In Slurm the CPU time always refers to elapsed time. I think that "allocated" is correct because it is the cputime that has been allocated to the job/step, times the number of CPUs. This is also true for GrpCPURunMins and GrpCPUMins, which always refer to elapsed time. I think this concept comes from parallel job computing, where times and speedup are always considered as elapsed/wall-clock times. We should explain what the concept is... somewhere.

David

Okay. It's just that "elapsed" and "allocated" are different in my mind. You may be allocated 4 CPUs but only use 1 (not threading properly). Likewise, you may be allocated 1 hour (timelimit) but only run for 5 minutes. If you don't think it needs changing, that's fine too. It just seemed ambiguous.

I agree with you that the concept is not explained right. The Slurm way of thinking is that you have been allocated x resources, and whatever you do on them, run or not, is your business, so every second that passes Slurm counts as CPU time/allocated time. Those parameters, in my opinion, should not be called cputime and cputimeraw; they should be called elapsed/wall-clock time, or time since the allocation started, to match the universally accepted meaning of cputime as the time actually consumed by using the CPU. I think that your patch clarifies things a bit.

Committed d1b0dfd6d5 with minor change.

David
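
To make the distinction discussed in this thread concrete, here is a minimal sketch of the arithmetic, using the example from the comments (4 CPUs allocated, 1 hour time limit, 5 minutes actually run). The function names are hypothetical and not part of Slurm; the formula is the (elapsed * cpus) relationship described above, and the duration formatting only approximates what sacct prints.

```python
def allocated_cpu_seconds(elapsed_seconds: int, alloc_cpus: int) -> int:
    """CPU time as described in this thread: elapsed wall-clock time times allocated CPUs.

    Hypothetical helper; it does not look at how busy the CPUs actually were.
    """
    return elapsed_seconds * alloc_cpus


def format_duration(seconds: int) -> str:
    """Render seconds as [D-]HH:MM:SS (an approximation of sacct-style output)."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}-{hms}" if days else hms


# Example from the thread: 4 CPUs allocated, 1 hour time limit, job ran 5 minutes.
alloc_cpus = 4
timelimit_seconds = 60 * 60
elapsed_seconds = 5 * 60

charged = allocated_cpu_seconds(elapsed_seconds, alloc_cpus)
not_charged = timelimit_seconds * alloc_cpus  # the (timelimit * cpus) misreading

print("charged (elapsed * cpus):      ", charged, format_duration(charged))
print("not charged (timelimit * cpus):", not_charged, format_duration(not_charged))
```

Note that the result is the same whether the job kept all 4 CPUs busy or only 1: under this accounting, the value depends only on how long the allocation lasted and how many CPUs it held.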