Ticket 24994 - *_nodes_alloc metrics reports inflated counts
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 26.11.x
Hardware: Linux Linux
: 4 - Minor Issue
Assignee: Michael Steed
Reported: 2026-04-08 13:57 MDT by Charlie Getzen
Modified: 2026-04-20 07:06 MDT

Attachments
node_bitmap solution (3.01 KB, patch)
2026-04-08 13:57 MDT, Charlie Getzen

Description Charlie Getzen 2026-04-08 13:57:31 MDT
Created attachment 44909
node_bitmap solution

## Problem

`slurm_user_jobs_nodes_alloc` and `slurm_jobs_nodes_alloc` report inflated node counts when a user runs multiple sub-node-sized jobs that share physical nodes.

## Root cause

`nodes_alloc` is computed by summing `total_nodes` across each individual running job for the user. When multiple jobs fit on the same physical node (e.g. jobs requesting only a fraction of a node's CPUs), each job contributes its own `total_nodes = 1` to the sum, even though they all share the same node. A node running N such jobs is therefore counted N times.

## Solution

Track node allocation with a bitmap: OR each running job's `node_bitmap` into a single accumulator. After all jobs have been processed, count the set bits in the accumulator to get the actual number of unique nodes in use.
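
The difference between the two counting strategies can be illustrated with a minimal, standalone C sketch. It is not the attached patch and does not use Slurm's bitstring helpers; `node_bitmap_t`, `MAX_NODES`, and every function name below are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size bitmap: one bit per node index (not Slurm's type). */
#define MAX_NODES 256
#define WORDS (MAX_NODES / 64)

typedef struct {
	uint64_t words[WORDS];
} node_bitmap_t;

/* Mark node `idx` as allocated in `bm`. */
static void bitmap_set(node_bitmap_t *bm, size_t idx)
{
	assert(idx < MAX_NODES);
	bm->words[idx / 64] |= (uint64_t)1 << (idx % 64);
}

/* OR `src` into `dst`, so that `dst` becomes the union of both. */
static void bitmap_or(node_bitmap_t *dst, const node_bitmap_t *src)
{
	for (size_t i = 0; i < WORDS; i++)
		dst->words[i] |= src->words[i];
}

/* Count the set bits, i.e. the number of nodes marked in `bm`. */
static size_t bitmap_count(const node_bitmap_t *bm)
{
	size_t n = 0;
	for (size_t i = 0; i < WORDS; i++) {
		uint64_t w = bm->words[i];
		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			n++;
		}
	}
	return n;
}

/* Inflated count: sums per-job node counts, so shared nodes count repeatedly. */
static size_t nodes_alloc_sum(const node_bitmap_t *jobs, size_t njobs)
{
	size_t total = 0;
	for (size_t i = 0; i < njobs; i++)
		total += bitmap_count(&jobs[i]);
	return total;
}

/* Unique count: union all job bitmaps first, then count the set bits once. */
static size_t nodes_alloc_unique(const node_bitmap_t *jobs, size_t njobs)
{
	node_bitmap_t all = {{0}};
	for (size_t i = 0; i < njobs; i++)
		bitmap_or(&all, &jobs[i]);
	return bitmap_count(&all);
}
```

With three jobs where two sit on node 5 and the third spans nodes 5 and 70, `nodes_alloc_sum` yields 4 while `nodes_alloc_unique` yields 2, which matches the number of physical nodes actually in use.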
Comment 1 Michael Steed 2026-04-14 14:07:43 MDT
https://support.schedmd.com/show_bug.cgi?id=24999
Comment 2 Felip Moll 2026-04-20 07:01:18 MDT
Hi Charlie,

Thanks for reporting. We are aware of this issue and a new fix is being implemented.
We'll let you know once it is done.

Thanks!