| Summary: | job_submit filter rejects batch scripts named 'batch' | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | pbisbal |
| Component: | Other | Assignee: | Nate Rini <nate> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | brian.gilmer |
| Version: | 18.08.4 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=5848, https://bugs.schedmd.com/show_bug.cgi?id=6278 | | |
| Site: | Princeton Plasma Physics Laboratory (PPPL) | | |
| Version Fixed: | 18.08.5, 19.05 | Target Release: | --- |
| Attachments: | job_submit.lua file, slurm.conf file, my mpihello.sbatch file | | |
(In reply to pbisbal from comment #0)
> This appears to be specific to 18.08.4. This issue did not occur with
> 18.08.3. The same user who initially reported this problem used the same
> batch script to submit a job to 18.08.3 less than 24 hours before the
> upgrade. We upgraded to 18.08.3 on 11/20/18, so we've been using 18.08.3 for
> about 1 month with no issues.

A batch script named 'batch' works on my test setup with your job_submit.lua:

> $ sbatch batch
> sbatch: NOTICE: Job assigned to ellis partition due to low number of nodes or CPUs requested.
> Submitted batch job 7
> $ sbatch -V
> slurm 18.08.4

I trimmed the attachment to the cat command. Can you please attach your slurm.conf?

--Nate

Created attachment 8714 [details]
slurm.conf file
Slurm.conf attached.

(In reply to pbisbal from comment #4)
> Slurm.conf attached.

On an unrelated topic while reviewing your config:
> TaskPlugin=task/cgroup

From https://slurm.schedmd.com/slurm.conf.html:
> NOTE: It is recommended to stack task/affinity,task/cgroup together when configuring TaskPlugin, and setting TaskAffinity=no and ConstrainCores=yes in cgroup.conf.

(In reply to pbisbal from comment #4)
> Slurm.conf attached.

I have an error now:

> $ sbatch batch
> sbatch: error: NOTICE: Job assigned to ellis partition due to low number of nodes or CPUs requested.
> sbatch: error: Batch job submission failed: Invalid qos specification

Can you please provide the output of the following:
> $ sacctmgr -p show qos

--Nate

(In reply to Nate Rini from comment #6)
> (In reply to pbisbal from comment #4)

I may have replicated the issue:

$ sbatch batch
sbatch: error: ERROR: A time limit must be specified
sbatch: error: Batch job submission failed: Time limit specification required, but not provided
$ cat batch
#!/bin/bash
#SBATCH -n 4
#SBATCH --mem=100M
#SBATCH -p debug
#SBATCH -J mpihello
#SBATCH -o mpihello-%j.out
#SBATCH -e mpihello-%j.err
srun echo test
#module load gcc/7.3.0
#module load openmpi/3.0.0
#mpiexec ./mpihello

Can you please try this:
> sbatch -t 00:01:00 batch

Can you please attach your batch script too? Maybe I'm missing some whitespace issue.

--Nate

(In reply to Nate Rini from comment #5)
> (In reply to pbisbal from comment #4)
> > Slurm.conf attached.
>
> On an unrelated topic while reviewing your config:
> > TaskPlugin=task/cgroup
>
> From https://slurm.schedmd.com/slurm.conf.html:
> > NOTE: It is recommended to stack task/affinity,task/cgroup together when configuring TaskPlugin, and setting TaskAffinity=no and ConstrainCores=yes in cgroup.conf.

Thanks for that unrelated tip. I assume that to "stack" them together, I list them both like this in slurm.conf:

TaskPlugin=task/affinity,task/cgroup

Is that correct?

Prentice

Created attachment 8716 [details]
my mpihello.sbatch file
This is the exact sbatch file I've been using for my testing.
(In reply to pbisbal from comment #8)
> (In reply to Nate Rini from comment #5)
> > (In reply to pbisbal from comment #4)
> > > Slurm.conf attached.
> >
> > On an unrelated topic while reviewing your config:
> > > TaskPlugin=task/cgroup
> >
> > From https://slurm.schedmd.com/slurm.conf.html:
> > > NOTE: It is recommended to stack task/affinity,task/cgroup together when configuring TaskPlugin, and setting TaskAffinity=no and ConstrainCores=yes in cgroup.conf.
>
> Thanks for that unrelated tip. I assume that to "stack" them together, I list
> them both like this in slurm.conf:
>
> TaskPlugin=task/affinity,task/cgroup
>
> Is that correct?

Yes, that is correct.

(In reply to pbisbal from comment #9)
> Created attachment 8716 [details]
> my mpihello.sbatch file
>
> This is the exact sbatch file I've been using for my testing.

Using the same file:

> $ ls -la batch
> lrwxrwxrwx 1 nate nate 15 Dec 19 14:53 batch -> mpihello.sbatch
> $ md5sum batch
> d915e0d5a5eac4b702b6f11777c30bb5 batch
> $ sbatch mpihello.sbatch
> Submitted batch job 11
> $ sbatch batch
> Submitted batch job 12
> $ md5sum etc/job_submit.lua
> 6a5d57f4fb01c6c37e2dbffc65864fe8 etc/job_submit.lua

I added the QOSs listed in job_submit.lua to avoid the invalid QOS error. Can you please call:
> sbatch -v batch

--Nate

My sbatch file does specify a time limit in the file itself:
$ cat mpihello.sbatch
#!/bin/bash
#SBATCH -n 4
#SBATCH --mem=63000M
#SBATCH -p ellis
#SBATCH -t 00:01:00
#SBATCH -J mpihello
#SBATCH -o mpihello-%j.out
#SBATCH -e mpihello-%j.err
#SBATCH --mail-type=ALL
module load gcc/7.3.0
module load openmpi/3.0.0
#echo "SLURM_JOB_NODELIST=$SLURM_JOB_NODELIST"
#srun --mpi=pmi2 ./mpihello
mpiexec ./mpihello
But specifying -t on the command-line definitely fixes the problem:
$ ls -l batch
lrwxrwxrwx 1 pbisbal users 15 Dec 19 16:52 batch -> mpihello.sbatch
$ sbatch batch
sbatch: error: ERROR: A time limit must be specified
sbatch: error: Batch job submission failed: Time limit specification required, but not provided
$ sbatch -t 00:01:00 batch
sbatch: NOTICE: Job assigned to ellis partition due to low number of nodes or CPUs requested.
Submitted batch job 398549
That notice about the job being assigned to the ellis partition comes from my job_submit.lua script. In this case, since I'm already requesting the ellis partition, the message should not be printed: per my job_submit.lua script, it should appear only if a different partition was requested or no partition was specified at all. Here's the code that does this:
-- Time limits up to 8-08:00:00 are allowed on ellis, so move logic for ellis to here.
-- If a job requests <= 15 CPUs and 1 node, it goes to ellis.
if ( job_desc.min_cpus <= 15 and job_desc.time_limit <= 12000 and
     (job_desc.min_nodes == 0xfffffffe or job_desc.min_nodes == 1) ) then
    if job_desc.partition ~= 'ellis' then
        slurm.user_msg("NOTICE: Job assigned to ellis partition due to low number of nodes or CPUs requested.")
    end
    job_desc.partition = 'ellis'
    job_desc.qos = 'ellis'
    return slurm.SUCCESS
elseif ( job_desc.min_cpus <= 15 and job_desc.time_limit >= 12000 ) then
    slurm.user_msg("Job rejected. Max. time limit for jobs in ellis partition is 8-08:00:00 (12000 minutes)")
    return 2051 -- ESLURM_INVALID_TIME_LIMIT
end
This logic worked the last time I tested it, but I think that was on 17.11.4 or 17.11.5. I don't think I've personally tested it since upgrading to 18.08.3.
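To make the branch boundaries of this routing rule easy to check (for instance, a request of exactly 12000 minutes still matches the first branch), here is a minimal Python sketch of the same logic. The function name `route` and its tuple return value are illustrative only, not part of the actual plugin; the constant 0xfffffffe is Slurm's NO_VAL sentinel for an unset field.

```python
NO_VAL = 0xFFFFFFFE                  # Slurm's "field not set" sentinel
ESLURM_INVALID_TIME_LIMIT = 2051

def route(min_cpus, time_limit, min_nodes, partition):
    """Return (partition, qos, rc) mirroring the ellis routing rule above."""
    if min_cpus <= 15 and time_limit <= 12000 and min_nodes in (NO_VAL, 1):
        return ('ellis', 'ellis', 0)                        # SUCCESS
    elif min_cpus <= 15 and time_limit >= 12000:
        return (partition, None, ESLURM_INVALID_TIME_LIMIT)  # rejected
    return (partition, None, 0)                             # fall through

# A 4-CPU job with no node count set and a 1-minute limit lands on ellis:
assert route(4, 1, NO_VAL, None) == ('ellis', 'ellis', 0)
# A small job over the 12000-minute cap is rejected:
assert route(4, 20000, 1, 'ellis')[2] == ESLURM_INVALID_TIME_LIMIT
```

Note that the two conditions overlap at exactly 12000 minutes, where the first branch wins, so such a job is routed rather than rejected.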
In response to your earlier request:

$ sacctmgr -p show qos
Name|Priority|GraceTime|Preempt|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES|
normal|0|00:00:00||cluster|||1.000000||||||||||||||||||
dawson|100|00:00:00||cluster|||1.000000|||||||cpu=1024||||cpu=1024|30|||||cpu=16|
ellis|100|00:00:00||cluster|||1.000000|||||||cpu=15,node=1||||cpu=80|45||||||
kruskal|100|00:00:00||cluster|||1.000000|||||||cpu=512||||cpu=512|8|||||cpu=16|
mque|100|00:00:00||cluster|||1.000000|||||||cpu=128,node=4|||||20||||||
default|0|00:00:00||cluster|||1.000000||||||||||||40||||||
mccune|100|00:00:00||cluster|||1.000000|||||||cpu=256||||cpu=256|||||||
sque|100|00:00:00||cluster|||1.000000|||||||cpu=512,node=100|||||||||||
fenx|100|00:00:00||cluster|||1.000000|||||||cpu=40,node=16||||cpu=40,node=16|||||||
fielder|100|00:00:00||cluster|||1.000000|||||||cpu=512,node=96||||cpu=96,node=12|||||||
gque|100|00:00:00||cluster|||1.000000|||||||cpu=32,node=1|||||||||||
jassby|100|00:00:00||cluster|||1.000000|||||||cpu=96,node=6||||cpu=96,node=6|||||||
greene|100|00:00:00||cluster|||1.000000|||||||cpu=512,node=32||||cpu=512,node=32|||||||
pswift|100|00:00:00||cluster|||1.000000||||||||||||||||||
interactive|100|00:00:00||cluster|||1.000000||||||||||12:00:00||||||||
beast|100|00:00:00||cluster|||1.000000|||||||cpu=32|||||||||||
general|100|00:00:00||cluster|||1.000000||||||||||||||||||
interruptible|1|00:00:00||cluster|||1.000000||||||||||||||||||

$ sbatch -v batch
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user : `pbisbal'
sbatch: uid : 41266
sbatch: gid : 589
sbatch: cwd : /u/pbisbal/testing/mpihello
sbatch: ntasks : 1 (default)
sbatch: nodes : 1 (default)
sbatch: jobid : 4294967294 (default)
sbatch: partition : default
sbatch: profile : `NotSet'
sbatch: job name : `batch'
sbatch: reservation : `(null)'
sbatch: wckey : `(null)'
sbatch: distribution : unknown
sbatch: verbose : 1
sbatch: overcommit : false
sbatch: nice : -2
sbatch: account : (null)
sbatch: comment : (null)
sbatch: dependency : (null)
sbatch: qos : (null)
sbatch: constraints :
sbatch: reboot : yes
sbatch: network : (null)
sbatch: array : N/A
sbatch: cpu_freq_min : 4294967294
sbatch: cpu_freq_max : 4294967294
sbatch: cpu_freq_gov : 4294967294
sbatch: mail_type : NONE
sbatch: mail_user : (null)
sbatch: sockets-per-node : -2
sbatch: cores-per-socket : -2
sbatch: threads-per-core : -2
sbatch: ntasks-per-node : 0
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core : -2
sbatch: mem-bind : default
sbatch: plane_size : 4294967294
sbatch: propagate : NONE
sbatch: switches : -1
sbatch: wait-for-switches : -1
sbatch: core-spec : NA
sbatch: burst_buffer : `(null)'
sbatch: burst_buffer_file : `(null)'
sbatch: remote command : `/usr/bin/batch'
sbatch: power :
sbatch: wait : yes
sbatch: cpus-per-gpu : 0
sbatch: gpus : (null)
sbatch: gpu-bind : (null)
sbatch: gpu-freq : (null)
sbatch: gpus-per-node : (null)
sbatch: gpus-per-socket : (null)
sbatch: gpus-per-task : (null)
sbatch: mem-per-gpu : 0
sbatch: Cray node selection plugin loaded
sbatch: Serial Job Resource Selection plugin loaded with argument 17
sbatch: Linear node selection plugin loaded with argument 17
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 17
sbatch: error: ERROR: A time limit must be specified
sbatch: error: Batch job submission failed: Time limit specification required, but not provided

From the sbatch -v batch output, it looks like this could be part of the problem:

sbatch: remote command : `/usr/bin/batch'

This command, /usr/bin/batch, is part of the at package:

$ rpm -qf /usr/bin/batch
at-3.1.10-49.el6.x86_64

That package is probably installed on just about every RHEL-based system, and has definitely been on my systems for as long as I can tell.
When I do sbatch ./batch, this problem goes away:

$ sbatch -v ./batch
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user : `pbisbal'
sbatch: uid : 41266
sbatch: gid : 589
sbatch: cwd : /u/pbisbal/testing/mpihello
sbatch: ntasks : 4 (set)
sbatch: nodes : 1 (default)
sbatch: jobid : 4294967294 (default)
sbatch: partition : ellis
sbatch: profile : `NotSet'
sbatch: job name : `mpihello'
sbatch: reservation : `(null)'
sbatch: wckey : `(null)'
sbatch: distribution : unknown
sbatch: verbose : 1
sbatch: overcommit : false
sbatch: time_limit : 1
sbatch: nice : -2
sbatch: account : (null)
sbatch: comment : (null)
sbatch: dependency : (null)
sbatch: qos : (null)
sbatch: constraints : mem=63000M
sbatch: reboot : yes
sbatch: network : (null)
sbatch: array : N/A
sbatch: cpu_freq_min : 4294967294
sbatch: cpu_freq_max : 4294967294
sbatch: cpu_freq_gov : 4294967294
sbatch: mail_type : BEGIN,END,FAIL,REQUEUE,STAGE_OUT
sbatch: mail_user : (null)
sbatch: sockets-per-node : -2
sbatch: cores-per-socket : -2
sbatch: threads-per-core : -2
sbatch: ntasks-per-node : 0
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core : -2
sbatch: mem-bind : default
sbatch: plane_size : 4294967294
sbatch: propagate : NONE
sbatch: switches : -1
sbatch: wait-for-switches : -1
sbatch: core-spec : NA
sbatch: burst_buffer : `(null)'
sbatch: burst_buffer_file : `(null)'
sbatch: remote command : `./batch'
sbatch: power :
sbatch: wait : yes
sbatch: cpus-per-gpu : 0
sbatch: gpus : (null)
sbatch: gpu-bind : (null)
sbatch: gpu-freq : (null)
sbatch: gpus-per-node : (null)
sbatch: gpus-per-socket : (null)
sbatch: gpus-per-task : (null)
sbatch: mem-per-gpu : 0
sbatch: Cray node selection plugin loaded
sbatch: Serial Job Resource Selection plugin loaded with argument 17
sbatch: Linear node selection plugin loaded with argument 17
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 17
Submitted batch job 398552

I hope that helps.
(In reply to pbisbal from comment #15)
> From the sbatch -v batch output, it looks like this could be part of the
> problem:
>
> sbatch: remote command : `/usr/bin/batch'
>
> This command, /usr/bin/batch, is part of the at package:
>
> $ rpm -qf /usr/bin/batch
> at-3.1.10-49.el6.x86_64
>
> That package is probably installed on just about every RHEL-based system,
> and has definitely been on my systems for as long as I can tell.
>
> When I do sbatch ./batch, this problem goes away:

The behavior changed with this patch in 18.08.4:
https://github.com/SchedMD/slurm/commit/ccafaf7b60090155639edcbdbf4a3ab5e36967c6

Looking into what the correct behavior should be with sbatch as opposed to srun/salloc.

--Nate

Nate,

Thanks. That bugfix makes perfect sense, both as to why you added it and why it's causing problems for my user. I look forward to hearing whether there's any difference between sbatch and srun or salloc.

Prentice

The behavior change to sbatch has been corrected with this patch:
https://github.com/SchedMD/slurm/commit/aca40d167b21835b0eeb61b18a2eaaef68c33865

Closing this ticket; please reply to re-open it.

Thanks
--Nate

*** Ticket 6396 has been marked as a duplicate of this ticket. ***
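The root cause — a bare script name being resolved through the PATH search and matching /usr/bin/batch from the at package, while a name containing a slash bypasses that search — can be illustrated with a short Python sketch. The temporary directory and the fake `batch` executable are hypothetical stand-ins for /usr/bin/batch; this is not Slurm code.

```python
import os
import shutil
import tempfile

# Create a directory containing an executable named 'batch' and put it first
# on a simulated PATH, mimicking /usr/bin/batch from the 'at' package.
path_dir = tempfile.mkdtemp()
fake_batch = os.path.join(path_dir, "batch")
with open(fake_batch, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(fake_batch, 0o755)

search_path = os.pathsep.join([path_dir, os.defpath])

# A bare name is searched on PATH and finds the unrelated executable...
assert shutil.which("batch", path=search_path) == fake_batch
# ...while a name containing a slash skips the PATH search entirely, which
# is why 'sbatch ./batch' picked up the intended local script.
assert shutil.which("./batch", path=search_path) != fake_batch
```

This mirrors why `sbatch batch` went wrong only after 18.08.4 made the script argument PATH-resolved, and why prefixing the name with `./` sidesteps the collision.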
Created attachment 8711 [details]
job_submit.lua file.

I'm not sure whether this is a bug in sbatch, my job_submit.lua script, or the lua plug-in. Yesterday I upgraded from 18.08.3 to 18.08.4. Since that upgrade, it seems like sbatch or my job_submit.lua script will reject job scripts named 'batch'. For example:

$ cat mpihello.sbatch
#!/bin/bash
#SBATCH -n 4
#SBATCH --mem=63000M
#SBATCH -p ellis
#SBATCH -t 00:01:00
#SBATCH -J mpihello
#SBATCH -o mpihello-%j.out
#SBATCH -e mpihello-%j.err
#SBATCH --mail-type=ALL
module load gcc/7.3.0
module load openmpi/3.0.0
mpiexec ./mpihello

This script works when named "mpihello.sbatch":

$ sbatch mpihello.sbatch
Submitted batch job 398504

But it doesn't when renamed to "batch":

$ mv mpihello.sbatch batch
$ sbatch batch
sbatch: error: ERROR: A time limit must be specified
sbatch: error: Batch job submission failed: Time limit specification required, but not provided

It also fails if it's a symlink named "batch":

$ mv batch mpihello.sbatch
$ ln -s mpihello.sbatch batch
$ ls -l batch
lrwxrwxrwx 1 pbisbal users 15 Dec 19 13:27 batch -> mpihello.sbatch
$ sbatch batch
sbatch: error: ERROR: A time limit must be specified
sbatch: error: Batch job submission failed: Time limit specification required, but not provided

At the same time I did the upgrade, I added the following lines to my cgroup.conf file:

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
ConstrainKmemSpace=no
TaskAffinity=no

I don't expect that to be related to this bug; I'm just offering that info in the spirit of full disclosure.

This appears to be specific to 18.08.4. This issue did not occur with 18.08.3. The same user who initially reported this problem used the same batch script to submit a job to 18.08.3 less than 24 hours before the upgrade. We upgraded to 18.08.3 on 11/20/18, so we've been using 18.08.3 for about 1 month with no issues.

I've attached my job_submit.lua script. According to ls, it was last modified on 10/16/2018.