| Summary: | GPU accounting for AllocTRES is missing GPU subtype for some jobs | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Trey Dockendorf <tdockendorf> |
| Component: | Accounting | Assignee: | Scott Hilton <scott> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Ohio State OSC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: |
gres.conf
slurm.conf |
||
|
Description
Trey Dockendorf
2020-08-07 14:40:31 MDT
Created attachment 15358
gres.conf
Created attachment 15359
slurm.conf
After reading the AccountingStorageTRES documentation more closely, maybe this is expected behavior. I suppose the exclusive job only got the subtype because all GRES got assigned to the job. If I removed "gres/gpu" from AccountingStorageTRES and only had the items with subtype, would that ensure a job like my example would be assigned the subtype in accounting? The docs make it seem like there are some issues with that.
Trey,
This is a known issue, and I happen to be working on a fix for the 20.11 release.
This is only an accounting issue; the GPUs are being allocated to the jobs just fine.
Until then, if you really want to track GRES types, users would need to specify the type they are asking for in every request, like this:
>sbatch --gres=gpu:v100:1 hostname.sbatch
Thanks,
-Scott
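One way to spot affected jobs is to look for a typed GPU entry in the job's AllocTRES string. The helper below is only a sketch: `has_gpu_subtype` is a hypothetical name, and it assumes that a typed allocation appears in AllocTRES as an entry of the form `gres/gpu:<type>=<count>` (for example `gres/gpu:v100=1`), while an untyped one appears only as `gres/gpu=<count>`. The sample strings are illustrative, not output from this site.

```shell
# Hypothetical helper: succeeds (exit 0) if a TRES string contains a
# typed GPU entry such as "gres/gpu:v100=1", and fails (exit 1) otherwise.
# Assumes a comma-separated TRES string as printed in the AllocTRES field.
has_gpu_subtype() {
    # Wrap the string in commas so every entry is preceded by a comma.
    case ",$1," in
        *,gres/gpu:*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Illustrative strings (made up for this sketch):
typed="billing=1,cpu=1,gres/gpu=1,gres/gpu:v100=1,mem=4G,node=1"
untyped="billing=1,cpu=1,gres/gpu=1,mem=4G,node=1"

if has_gpu_subtype "$typed"; then echo "typed: subtype recorded"; fi
if ! has_gpu_subtype "$untyped"; then echo "untyped: subtype missing"; fi

# Against a real job (needs a Slurm cluster; <jobid> is a placeholder):
#   has_gpu_subtype "$(sacct -X -n -P -j <jobid> -o AllocTRES)"
```

Jobs for which the check fails are candidates for resubmission with an explicit type, as in the `--gres=gpu:v100:1` request above.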
Trey,
As an update, this is unlikely to be finished in 20.11 but is being actively worked on. It may take a while due to the refactoring needed to address this issue.
This was also brought up in 8024. I am marking this as a duplicate. If you personally have a question about this, feel free to reopen this ticket.
--Scott
*** This ticket has been marked as a duplicate of ticket 8024 *** |