Ticket 6146

Summary: minor clarification to salloc / sbatch / srun / scontrol man page - singleton dependency option
Product: Slurm
Reporter: Deric Sullivan <deric.sullivan>
Component: Documentation
Assignee: Jacob Jenson <jacob>
Status: RESOLVED FIXED
QA Contact:
Severity: C - Contributions
Priority: ---
Version: 19.05.x
Hardware: Linux
OS: Linux
Site: -Other-
Version Fixed: 18.08.4

Description Deric Sullivan 2018-11-30 12:59:27 MST
Hi,
The wording of the singleton dependency option in the salloc, sbatch, srun and scontrol man pages is ambiguous. As written, it is not clear whether the job can begin execution once "any one" or once "all" of the jobs fitting the criteria have terminated. The actual behavior is "all". Using "submitted jobs" instead of "launched jobs" also seems to clarify the wording.
(Note that even this is not 100% accurate: the source code in ./src/slurmctld/job_scheduler.c compares job IDs (qjob_ptr->job_id < job_ptr->job_id), which will not work as expected in the edge case where the job IDs wrap. I would consider that a bug, but reporting a bug in the C code is not the intent of this bug report.)
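
To make the "if all" reading and the wrap edge case concrete, here is a minimal C sketch. It is not the Slurm code: the struct, its fields, and singleton_may_start() are hypothetical names chosen for illustration, and only the job_id comparison mirrors the expression quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified job record; not Slurm's real job structure. */
struct job {
    uint32_t job_id;
    uint32_t user_id;
    const char *name;
    bool terminated;
};

/*
 * Return true when `job` may start under a singleton dependency:
 * every job with the same name and user and a lower job ID must
 * already have terminated ("if all", not "if any one").
 */
bool singleton_may_start(const struct job *queue, size_t n,
                         const struct job *job)
{
    for (size_t i = 0; i < n; i++) {
        const struct job *q = &queue[i];

        /* Jobs of another user or with another name are ignored. */
        if (q->user_id != job->user_id || strcmp(q->name, job->name) != 0)
            continue;

        /* Same form as the quoted comparison: "earlier" means lower ID. */
        if (q->job_id < job->job_id && !q->terminated)
            return false;   /* an earlier matching job is still active */
    }
    return true;
}
```

With jobs 100 (terminated), 101 and 102 (both still active) under the same name and user, the check passes for 101 and fails for 102. If the IDs wrap, a newer job with ID 5 does not treat an older, still active job with ID 4000 as a predecessor, so the check lets it start.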


In case you want a patch for each of these man pages:
$ diff -Naur doc/man/man1/salloc.1 doc/man/man1/salloc.1.new
--- doc/man/man1/salloc.1	2018-11-29 22:33:46.226503000 +0000
+++ doc/man/man1/salloc.1.new	2018-11-30 18:22:48.549241000 +0000
@@ -468,7 +468,7 @@
 Gang scheduling of resources in the partition is also not supported.
 .TP
 \fBsingleton\fR
-This job can begin execution after any previously launched jobs
+This job can begin execution after all previously submitted jobs
 sharing the same job name and user have terminated.
 .RE
 
$
$ diff -Naur doc/man/man1/sbatch.1 doc/man/man1/sbatch.1.new
--- doc/man/man1/sbatch.1	2018-11-29 22:33:46.231483000 +0000
+++ doc/man/man1/sbatch.1.new	2018-11-30 19:21:59.828174000 +0000
@@ -513,7 +513,7 @@
 Gang scheduling of resources in the partition is also not supported.
 .TP
 \fBsingleton\fR
-This job can begin execution after any previously launched jobs
+This job can begin execution after all previously submitted jobs
 sharing the same job name and user have terminated.
 .RE
 
$
$ diff -Naur doc/man/man1/srun.1 doc/man/man1/srun.1.new
--- doc/man/man1/srun.1	2018-11-29 22:33:46.263486000 +0000
+++ doc/man/man1/srun.1.new	2018-11-30 19:22:37.106708000 +0000
@@ -694,7 +694,7 @@
 Gang scheduling of resources in the partition is also not supported.
 .TP
 \fBsingleton\fR
-This job can begin execution after any previously launched jobs
+This job can begin execution after all previously submitted jobs
 sharing the same job name and user have terminated.
 .RE
 
$
$ diff -Naur doc/man/man1/scontrol.1 doc/man/man1/scontrol.1.new
--- doc/man/man1/scontrol.1	2018-11-29 22:33:46.240471000 +0000
+++ doc/man/man1/scontrol.1.new	2018-11-30 19:23:25.713113000 +0000
@@ -657,7 +657,7 @@
 executed (ran to completion with an exit code of zero).
 .TP
 \fBsingleton\fR
-This job can begin execution after any previously launched jobs
+This job can begin execution after all previously submitted jobs
 sharing the same job name and user have terminated.
 .RE
 .TP
$



Thanks,
Deric
Comment 2 Moe Jette 2018-12-05 08:09:57 MST
I believe the current wording is correct, but I did expand the explanation a bit in this commit:

https://github.com/SchedMD/slurm/commit/9ef8dfd8ad2c03f8bff3cb27069853b7f2199e84

In the future if you have a patch to submit, please set the "Importance" field to "C-Contributions" rather than "6-No support contract". That would make it far more likely to get looked at by one of the developers or support staff rather than sales staff.

Regarding the job-id ordering, changing to submit time rather than job-id would not work. Jobs that get requeued have their submit time updated, so I think we'll just leave it as is for now.
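
The trade-off can be sketched in C as follows. This is an illustration only: job_rec, requeue() and the two comparison helpers are hypothetical names, not Slurm's, and the only behavior taken from the comment above is that a requeue resets the submit time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified job record; not Slurm's real job structure. */
struct job_rec {
    uint32_t job_id;
    time_t submit_time;
};

/* Assumed requeue behavior: the job keeps its ID, the submit time is reset. */
void requeue(struct job_rec *job, time_t now)
{
    job->submit_time = now;
}

/* Ordering by job ID: unaffected by a requeue. */
bool earlier_by_id(const struct job_rec *a, const struct job_rec *b)
{
    return a->job_id < b->job_id;
}

/* Ordering by submit time: changes once a job has been requeued. */
bool earlier_by_submit_time(const struct job_rec *a, const struct job_rec *b)
{
    return a->submit_time < b->submit_time;
}
```

For a job A (ID 10) submitted before a job B (ID 11), both helpers order A first. Once A is requeued at a later time, earlier_by_submit_time() orders B first, while earlier_by_id() still orders A first.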
Comment 3 Deric Sullivan 2018-12-05 12:31:44 MST
Thank you Moe.
Deric Sullivan