Ticket 191

Summary: salloc -n behavior
Product: Slurm Reporter: Don Lipari <lipari1>
Component: Bluegene select plugin    Assignee: Danny Auble <da>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact    
Priority: ---    
Version: 2.4.x   
Hardware: IBM BlueGene   
OS: Linux   
Site: LLNL

Description Don Lipari 2012-12-18 10:13:02 MST
While working with you (bugs 157 and 166) to obtain the correct behavior for srun task options, we found an inconsistency in how salloc works.

While a straight srun -n64 will correctly allocate 4 nodes, salloc -n64 allocates 64 nodes.

While a straight srun -N1 -n64 will correctly complain "This isn't a valid request without --overcommit", salloc -N1 -n64 succeeds and allocates 64 nodes.
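For reference, the expected behavior above follows from ceiling division of tasks over tasks-per-node. A minimal sketch of that arithmetic, assuming 16 tasks per node (a BlueGene/Q core count; the per-node task count is not stated in this ticket):

```python
import math

def nodes_needed(ntasks, tasks_per_node=16):
    """Minimum node count for ntasks, at tasks_per_node tasks each.
    tasks_per_node=16 is an assumption, not taken from the ticket."""
    return math.ceil(ntasks / tasks_per_node)

def valid_request(nnodes, ntasks, tasks_per_node=16, overcommit=False):
    """A request like -N1 -n64 only fits if --overcommit is given,
    since 64 tasks exceed 1 node * 16 tasks."""
    return overcommit or ntasks <= nnodes * tasks_per_node

print(nodes_needed(64))          # srun -n64 should yield 4 nodes, not 64
print(valid_request(1, 64))      # srun -N1 -n64 without --overcommit: invalid
```

So salloc -n64 should allocate 4 nodes, matching srun, and salloc -N1 -n64 should be rejected without --overcommit.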

Is there a rationale for this discrepancy or is it a bug?
Comment 1 Danny Auble 2012-12-18 10:28:52 MST
You shouldn't get 64 nodes there.  I'll see what I can find.  I am guessing this was always the case with salloc and not directly related to anything we did with the 2 bugs you mention here.
Comment 2 Danny Auble 2012-12-18 10:52:01 MST
This is fixed in 2.5.  It was referencing code that only applied to an L or P system.  sbatch was affected in the same way.

If you want to backport it to 2.4, the patch is here:

https://github.com/SchedMD/slurm/commit/3e89da1164312ab8a0d049cb70931347942340fa