Ticket 21766 - Jobs oversubscribing when resources should be allocated
Summary: Jobs oversubscribing when resources should be allocated
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 23.02.7
Hardware: Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2025-01-08 07:55 MST by sdavis2
Modified: 2025-01-08 07:56 MST

See Also:
Site: -Other-
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
slurm.conf (4.34 KB, text/plain)
2025-01-08 07:55 MST, sdavis2
oversubscribe test submit (633 bytes, text/x-sh)
2025-01-08 07:56 MST, sdavis2

Description sdavis2 2025-01-08 07:55:47 MST
Created attachment 40309 [details]
slurm.conf

I searched around for a similar issue and haven't been able to find one, so apologies if this has been discussed before.
We have a small cluster (14 nodes) and are running into an oversubscription issue that seems like it shouldn't be happening.
The nodes in the partition I'm testing on each have 256 GB of RAM and 80 cores.
It's configured this way:
PartitionName="phyq" MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=FORCE:4 PreemptMode=OFF MaxMemPerNode=240000 DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL Nodes=phygrid[01-04]
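
(Editor's note: a minimal sketch, not part of the original report, for confirming the partition and node limits the controller is actually using; phygrid01 is just one of the nodes listed above.)

# Show the OverSubscribe, MaxMemPerNode, and DefMemPerCPU values in effect for the partition
scontrol show partition phyq
# Show the CPUs and RealMemory configured for one of the nodes
scontrol show node phygrid01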

Our slurm.conf is set like this:
SelectType=select/linear
SelectTypeParameters=CR_Memory
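
(Editor's note: a quick sanity check, not from the original report, that the running controller loaded these select plugin settings.)

# List the SelectType and SelectTypeParameters the running slurmctld is using
scontrol show config | grep -i select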

The job submitted is simply this:
#!/bin/bash
#SBATCH --job-name=test_oversubscription    # Job name
#SBATCH --output=test_oversubscription%j.out # Output file
#SBATCH --error=test_oversubscription.err  # Error file
#SBATCH --mem=150G                         # Request 150 GB memory
#SBATCH --ntasks=1                         # Number of tasks
#SBATCH --cpus-per-task=60                  # CPUs per task
#SBATCH --time=00:05:00                    # Run for 5 minutes
#SBATCH --partition=phyq       # Replace with your partition name

# Display allocated resources
echo "Job running on node(s): $SLURM_NODELIST"
echo "Requested CPUs: $SLURM_CPUS_ON_NODE"
echo "Requested memory: $SLURM_MEM_PER_NODE MB"

# Simulate workload
sleep 300

In my mind this should submit to nodes 1, 2, 3, and 4, and then when I submit a 5th job it should sit in Pending and start once the first job ends. Instead, when I send the 5th job it goes straight to node 1. When a real job does this, performance drops sharply because jobs are sharing resources even though those resources were explicitly requested.
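
(Editor's note: a minimal reproduction sketch, assuming the batch script above is saved as test_oversubscription.sh; the filename is illustrative, and the expected vs. observed behavior in the comments restates the reporter's description.)

# Submit five copies of the test job to the 4-node phyq partition
for i in 1 2 3 4 5; do
    sbatch test_oversubscription.sh
done
# Expected: the 5th job shows PENDING; observed: it starts alongside job 1 on phygrid01
squeue -p phyq -o "%.10i %.9T %.6D %R"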

Am I missing something painfully obvious? 

Thanks for any help/advice.
Steve Davis
Comment 1 sdavis2 2025-01-08 07:56:18 MST
Created attachment 40310 [details]
oversubscribe test submit