| Summary: | Batch job submission failed: Requested node configuration is not available | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Ole.H.Nielsen <Ole.H.Nielsen> |
| Component: | Scheduling | Assignee: | Marshall Garey <marshall> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 17.02.9 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | DTU Physics | | |
| Attachments: | slurm.conf; slurmctld.log with debug flags (gzipped); topology.conf file | | |
Description
Ole.H.Nielsen@fysik.dtu.dk
2018-01-22 08:05:32 MST
--- Comment #1 (Marshall Garey) ---

That should work. I'm not able to reproduce this.

Can you post the output of `scontrol show reservations Test1`? Can you also upload your slurm.conf? Thanks.

--- Comment (Ole.H.Nielsen) ---

(In reply to Marshall Garey from comment #1)
> That should work. I'm not able to reproduce this.
>
> Can you post the output of scontrol show reservations Test1?

```
# scontrol show reservations Test1
ReservationName=Test1 StartTime=Mon 15:39:54 EndTime=Wed 15:39:54 Duration=30-00:00:00
   Nodes=g[079,083] NodeCnt=2 CoreCnt=32 Features=(null) PartitionName=xeon16 Flags=SPEC_NODES
   TRES=cpu=32
   Users=ohni,mikst Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
```

> Can you also upload your slurm.conf?

Will do.

Created attachment 5978 [details]
slurm.conf
--- Comment (Ole.H.Nielsen) ---

Another observation: if I submit this script (with the -N, -n and reservation lines removed), it submits correctly to the normal queue, but gets rejected when submitted to the reservation:

```
$ sbatch -N 2-2 -n 2 --reservation=Test1 lolcow.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
$ sbatch -N 2-2 -n 2 lolcow.slurm
Submitted batch job 389825
```

Now I added a 3rd node to the reservation:

```
# scontrol update reserv=Test1 Nodes=g[021,079,083]
Reservation updated.
# scontrol show reservations Test1
ReservationName=Test1 StartTime=Mon 15:39:54 EndTime=Wed 15:39:54 Duration=30-00:00:00
   Nodes=g[021,079,083] NodeCnt=3 CoreCnt=48 Features=(null) PartitionName=xeon16 Flags=SPEC_NODES
   TRES=cpu=48
   Users=ohni,mikst Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
```

Lo and behold, my 2-node job can now be submitted successfully:

```
$ sbatch -N 2-2 -n 32 --reservation=Test1 lolcow.slurm
Submitted batch job 389828
```

But I can't submit the job to all 3 nodes:

```
$ sbatch -N 3-3 -n 48 --reservation=Test1 lolcow.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
```

So the error seems to be localized to the reservation, and to the batch job requesting the entire reservation.

--- Comment #6 (Marshall Garey) ---

I still can't find why. I'm still looking, but in the meantime can you set your slurmctld debug level to debug2 with

```
scontrol setdebug debug2
```

and turn on the reservation and selecttype debug flags:

```
scontrol setdebugflags +reservation
scontrol setdebugflags +selecttype
```

Then try to submit the batch job again. Can you upload the slurmctld log file after that? Then go ahead and turn off the flags and set the debug level back to whatever you want.

--- Comment (Ole.H.Nielsen) ---

(In reply to Marshall Garey from comment #6)
> I'm still looking, but in the meantime can you set your slurmctld debug
> level with scontrol setdebug debug2 and turn on the reservation and
> selecttype flags [...]

I ran these 3 commands, then tried to submit the job:

```
$ sbatch -N 3-3 -n 48 --reservation=Test1 lolcow.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
```

I reset the debug flags with "scontrol reconfigure". The slurmctld log file will be uploaded.

Created attachment 5981 [details]
slurmctld.log with debug flags (gzipped)
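When digging through a slurmctld log like the attached one for a single submission, it helps to pull out just the lines mentioning the job id. A minimal sketch (the sample lines mimic the log format quoted in this ticket; in practice the input would be the decompressed slurmctld.log):

```python
# Return only the log lines that mention a given job id.
def lines_for_job(lines, jobid):
    needle = str(jobid)
    return [line for line in lines if needle in line]

# Sample lines in the style of the slurmctld.log excerpts in this ticket.
log = [
    "[2018-01-23T08:59:36.354] debug: job 389877: best_fit topology failure",
    "[2018-01-23T08:59:36.354] _pick_best_nodes: job 389877 never runnable",
    "[2018-01-23T09:00:01.000] debug: job 389900: allocated resources",
]
for line in lines_for_job(log, 389877):
    print(line)
```

A plain `zgrep 389877 slurmctld.log.gz` achieves the same thing; the sketch just shows the idea in a scriptable form.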
--- Comment (Ole.H.Nielsen) ---

(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #8)
> Created attachment 5981 [details]
> slurmctld.log with debug flags (gzipped)

FYI: the rejected job submission has Jobid 389877. It would be very useful if sbatch could be modified to print the Jobid also for jobs that are rejected.

Status mail from Slurm:

```
SLURM Job_id=389877 Name=lolcow.slurm Failed, Run time 00:00:00, FAILED
Job ID: 389877
Cluster: niflheim
User/Group: ohni/camdvip
State: FAILED (exit code 1)
Cores: 1
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:00:00 core-walltime
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 4.00 GB (4.00 GB/node)
```

--- Comment #10 (Marshall Garey) ---

I found this in your log file:

```
[2018-01-23T08:59:36.354] debug: job 389877: best_fit topology failure: no switch currently has sufficient resource to satisfy the request
[2018-01-23T08:59:36.354] cons_res: cr_job_test: test 0 fail: insufficient resources
[2018-01-23T08:59:36.354] _pick_best_nodes: job 389877 never runnable in partition xeon16
```

I suspect there's something going on with your topology. Can you share your topology.conf file? I'd like to mimic it and try to reproduce this bug.

--- Comment #11 (Ole.H.Nielsen) ---

(In reply to Marshall Garey from comment #10)
> I suspect there's something going on with your topology. Can you share your
> topology.conf file? I'd like to mimic it and try to reproduce this bug.

It's true that our compute node fabric is divided into disjoint islands corresponding to several different generations of node and network hardware (Intel Omni-Path, InfiniBand, and plain Gigabit Ethernet islands). I'll attach the topology.conf file.

I agree with your analysis: node g079 is connected to this fabric island:

```
SwitchName=volt01234 Switches=volt0[1-4]
```

while node g083 is connected to a disjoint island:

```
SwitchName=mell01 Nodes=g[081-110],h[001-002]
```

Please update the case as solved.

The reason this case had me confounded is the lack of helpful error messages from Slurm. If the "best_fit topology failure" message had been printed to stderr by sbatch, or even logged to slurmctld.log at the default debug level, I would have understood the error much sooner, and no support case would have been required.

Questions:

1. Can SchedMD modify the sbatch and srun commands so that error messages like the above are printed or logged by default in the future?
2. Also, sbatch really ought to print out the Jobid of failed jobs, making it more user friendly to search for the Jobid in log files.

Thanks for your support,
Ole

Created attachment 5985 [details]
topology.conf file
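The failure mode diagnosed above can be reproduced in miniature: with topology-aware selection, the best-fit pass looks for a switch that can reach every node the job needs, and a reservation spanning disjoint fabric islands can never satisfy that for a job requesting all of its nodes. A toy model follows — the switch membership mirrors the disjoint islands described in this ticket (node lists abbreviated), but the selection logic is a deliberate simplification of Slurm's actual cons_res/topology algorithm:

```python
# Toy model of a topology best-fit check: a job can only be placed if
# some single switch (island) reaches all of its requested nodes.
# Membership abbreviated from the topology.conf fragments above.
switches = {
    "volt01234": {"g021", "g079"},   # Omni-Path island (abbreviated)
    "mell01": {"g081", "g083"},      # InfiniBand island (abbreviated)
}

def placeable_on_one_switch(requested):
    """True if any single switch contains every requested node."""
    return any(requested <= members for members in switches.values())

# Two nodes within one island: placeable.
print(placeable_on_one_switch({"g081", "g083"}))  # → True
# The reservation's g[079,083] spans both islands: never runnable,
# matching the "best_fit topology failure" seen for job 389877.
print(placeable_on_one_switch({"g079", "g083"}))  # → False
```

This also explains the earlier observations: a 2-node job could fit once the reservation contained two nodes on the same island (g021 and g079), while any job requiring nodes from both islands was rejected at submission.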
--- Comment (Marshall Garey) ---

(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #11)
> Questions:

Some discussion will probably need to happen internally on both of these. For now, I'm creating tickets to track that discussion.

> 1. Can SchedMD modify the sbatch and srun commands so that error messages
> like the above would be printed or logged by default in the future?

Bug 4687. I'll look into this. It might be the case that we simply say "increase the debug level temporarily", since the information is there. But there might be an elegant way to make the error message more helpful, since, as you point out, "Node configuration unavailable" isn't very helpful for knowing what is actually going on.

> 2. Also, sbatch really ought to print out the Jobid of failed jobs, making
> it more user friendly to search for the Jobid in log files.

Bug 4686. From what I understand, sbatch submissions that are immediately rejected don't even get a job record created. In this case, a job record was created and put in the database. It makes sense to me to print the job id because a job record was created.

Thanks. I'll close this as resolved.