Ticket 191

Summary: salloc -n behavior
Product: Slurm Reporter: Don Lipari <lipari1>
Component: Bluegene select plugin    Assignee: Danny Auble <da>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact    
Priority: ---    
Version: 2.4.x   
Hardware: IBM BlueGene   
OS: Linux   
Site: LLNL

Description Don Lipari 2012-12-18 10:13:02 MST
While working with you (bugs 157 and 166) to obtain the correct behavior for srun task options, we found an inconsistency in how salloc works.

While a straight srun -n64 will correctly allocate 4 nodes, salloc -n64 allocates 64 nodes.

While a straight srun -N1 -n64 will correctly complain "This isn't a valid request without --overcommit", salloc -N1 -n64 succeeds and allocates 64 nodes.
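For reference, the expected behavior above follows from ceiling division of tasks over tasks-per-node. A minimal sketch of that arithmetic, assuming 16 tasks per node (a BlueGene/Q core count; the per-node task count is not stated in this ticket):

```python
import math

def nodes_needed(ntasks, tasks_per_node=16):
    """Minimum node count for ntasks, at tasks_per_node tasks each.
    tasks_per_node=16 is an assumption, not taken from the ticket."""
    return math.ceil(ntasks / tasks_per_node)

def valid_request(nnodes, ntasks, tasks_per_node=16, overcommit=False):
    """A request like -N1 -n64 only fits if --overcommit is given,
    since 64 tasks exceed 1 node * 16 tasks."""
    return overcommit or ntasks <= nnodes * tasks_per_node

print(nodes_needed(64))          # srun -n64 should yield 4 nodes, not 64
print(valid_request(1, 64))      # srun -N1 -n64 without --overcommit: invalid
```

So salloc -n64 should allocate 4 nodes, matching srun, and salloc -N1 -n64 should be rejected without --overcommit.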

Is there a rationale for this discrepancy or is it a bug?
Comment 1 Danny Auble 2012-12-18 10:28:52 MST
You shouldn't get 64 nodes there.  I'll see what I can find.  I am guessing this was always the case with salloc and not directly related to anything we did with the 2 bugs you mention here.
Comment 2 Danny Auble 2012-12-18 10:52:01 MST
This is fixed in 2.5.  It was referencing code that only applied to an L or P system.  sbatch was affected in the same way.

If you want to backport it to 2.4, the patch is here:

https://github.com/SchedMD/slurm/commit/3e89da1164312ab8a0d049cb70931347942340fa