I have found an issue though. I have a user who has the highest priority on Niagara, but his job doesn't run: it is stuck in the queue waiting for resources. I simplified the script, tested it on our TDS, and could reproduce the problem. For example, the following batch script:

>#!/bin/bash -l
>#SBATCH --ntasks=1
>#SBATCH --time=00-11:30
>#SBATCH --nodes=1
>#SBATCH --cpus-per-task=1
>#SBATCH --mem-per-cpu=14000
>#SBATCH -J memory_per_cpu_test
>#SBATCH -A scinet
>#SBATCH --mail-type=ALL
>#SBATCH --output=slurm-%j.out
>
>echo "#########################################"
>echo " SLURM submission batch script stdout "
>echo "#########################################"
>
>source ~/scripts/sbatch_job_envs.sh
>
>env > env_${SLURM_JOB_ID}
>
>srun -l hostname
>sleep 600

is very similar to the user's. However, it gets stuck in the queue:

>squeue
> JOBID PARTITION NAME USER ACCOUNT ST TIME_LIMIT TIME TIME_LEFT START_TIME CPUS PRIORITY NODES NODELIST(REASON)
> 5008 compute memory_per bmundim scinet PD 11:30:00 0:00 11:30:00 N/A 1 22739 1 (Resources)

even though the TDS cluster is free of jobs. If I comment out the following line:

>#SBATCH --mem-per-cpu=14000

the job runs normally. Do you know why? I will attach the slurm.conf and cgroup.conf in a minute.

Thanks,
Bruno.
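For the record, one way to rule out a genuine memory shortfall on the node before suspecting the scheduler is to compare the configured and free memory against the job's request. A sketch (the node name below is hypothetical, not from this cluster):

```shell
# Configured memory (%m) vs. free memory (%e) in MB, per node
sinfo -N -o '%n %m %e'

# Or the full record for a single node (name is just an example)
scontrol show node tds-node01
```

If `%m` is comfortably above the 14000 MB requested and the node is idle, a job pending on `(Resources)` points at scheduler accounting rather than an actual shortage.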
Hi Bruno,

This looks like a duplicate of bug 9724, fixed by commit 49a7d7f9fb, but only in 20.11. You can likely apply that commit to 20.02. Throw out the part for the NEWS file - it won't apply cleanly and you don't need it anyway. You can get the patch file here:

https://github.com/SchedMD/slurm/commit/49a7d7f9fb.patch

Can you apply this patch and let us know if it fixes the problem?
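Something like the following should work to apply it while skipping the NEWS hunk, using `git apply --exclude` (untested on your tree; the checkout directory name is just an example):

```shell
# Assumes a git checkout of the 20.02 source tree
cd slurm-20.02
curl -LO https://github.com/SchedMD/slurm/commit/49a7d7f9fb.patch

# --exclude=NEWS drops the NEWS hunk that won't apply on 20.02;
# --check is a dry run that touches nothing
git apply --check --exclude=NEWS 49a7d7f9fb.patch
git apply --exclude=NEWS 49a7d7f9fb.patch
```

Then rebuild and restart slurmctld as usual for your site.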
I'm closing this as a duplicate of bug 9724. Let us know if you have any issues with the patch.

*** This ticket has been marked as a duplicate of ticket 9724 ***