| Summary: | NCPUS/NumCPUs shows 2 even when using --cpus-per-task=1 with sbatch | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | George Hwa <george.hwa> |
| Component: | Scheduling | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 19.05.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | KLA-Tencor RAPID | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | x | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
```
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$ scontrol show config
Configuration data as of 2020-11-05T10:50:08
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe,wckeys
AccountingStorageHost = rocks7fe
AccountingStorageLoc = N/A
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 0
AuthInfo = (null)
AuthType = auth/munge
BatchStartTimeout = 10 sec
BOOT_TIME = 2020-10-06T18:36:13
BurstBufferType = (null)
CheckpointType = checkpoint/none
ClusterName = luminizer6
CommunicationParameters = (null)
CompleteWait = 0 sec
CoreSpecPlugin = core_spec/none
CpuFreqDef = Unknown
CpuFreqGovernors = Performance,OnDemand
CryptoType = crypto/munge
DebugFlags = Backfill,BackfillMap,CPU_Bind,Gres,NO_CONF_HASH,Priority,Steps
DefMemPerNode = UNLIMITED
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FastSchedule = 1
FederationParameters = (null)
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = gpu,scn,sln,swn,plx
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 30 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = NoOverMemoryKill
JobCheckpointDir = /var/spool/slurm.checkpoint
JobCompHost = rocks7fe
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 0
KillWait = 60 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Layouts =
Licenses = (null)
LicensesUsed = (null)
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 150000
MaxJobCount = 1000000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MemLimitEnforce = Yes
MessageTimeout = 10 sec
MinJobAge = 600 sec
MpiDefault = none
MpiParams = (null)
MsgAggregationParams = (null)
NEXT_JOB_ID = 16222580
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib64/slurm
PlugStackConfig = /etc/slurm/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PriorityParameters = (null)
PriorityType = priority/basic
PrivateData = none
ProctrackType = proctrack/linuxproc
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = /etc/slurm/resumehost.sh
ResumeRate = 4 nodes/min
ResumeTimeout = 450 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SallocDefaultCommand = (null)
SbcastParameters = (null)
SchedulerParameters = bf_max_job_test=1500,bf_interval=10,MessageTimeout=30,max_rpc_cnt=1000,sched_interval=20,default_queue_depth=1500
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cons_res
SelectTypeParameters = CR_CORE_MEMORY
SlurmUser = root(0)
SlurmctldAddr = (null)
SlurmctldDebug = info
SlurmctldHost[0] = rocks7fe(10.2.1.1)
SlurmctldLogFile = /var/log/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldSyslogDebug = unknown
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 300 sec
SlurmctldParameters = (null)
SlurmdDebug = info
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = unknown
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 18.08.0
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /var/spool/slurm.state
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = /etc/slurm/suspendhost.sh
SuspendRate = 4 nodes/min
SuspendTime = NONE
SuspendTimeout = 45 sec
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = task/none
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /state/partition1
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = Yes
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 110 percent
WaitTime = 60 sec
X11Parameters = (null)

Slurmctld(primary) at rocks7fe is UP
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$
```

---

Hi George,

(In reply to George Hwa from comment #0)
> My question is: is SLURM really allocating 2 CPUs for my job?

Yes. Since you have CR_CORE_MEMORY in your SelectTypeParameters, even if you only request 1 CPU, you will get the entire core allocated (2 threads/core, thus 2 CPUs). From https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Core_Memory:

"CR_Core_Memory: Cores and memory are consumable resources. On nodes with hyper-threads, each thread is counted as a CPU to satisfy a job's resource requirement, but multiple jobs are not allocated threads on the same core.
The count of CPUs allocated to a job may be rounded up to account for every CPU on an allocated core."

Here's an example of this, from https://slurm.schedmd.com/srun.html#OPT_cpus-per-task:

"For example `srun -c2 --threads-per-core=1 prog` may allocate two cores for the job, but if each of those cores contains two threads, the job allocation will include four CPUs."

Thanks,
-Michael

---

Michael,

Got it. We changed to CR_CPU_Memory and now it is getting 1 CPU.

Thanks,
George

---

OK, great. Just note that we usually recommend CR_CORE_MEMORY, because there can be performance and security concerns when separate jobs are allowed to run on the same core.

---

Going a bit deeper on this topic: if all my jobs are single-CPU tasks, Slurm would still only schedule up to the number of cores' worth of jobs on a node, not up to the number of CPUs, right?

---

(In reply to George Hwa from comment #6)
> so if all my jobs are single CPU tasks, SLURM would still only schedule up
> to the number of CORES jobs on a node, not up to the number of CPUs, right?

If CR_Core_Memory is specified, then yes, it will schedule up to the number of cores. If CR_CPU_Memory is specified, it will schedule up to the number of CPUs.
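As a rough illustration of the rounding rule quoted from the slurm.conf documentation, the reported CPU count under CR_Core_Memory can be sketched as follows (a hypothetical helper for illustration only, not Slurm's actual code or API):

```python
import math

def allocated_cpus(requested_cpus: int, threads_per_core: int) -> int:
    """Sketch of CR_Core_Memory rounding: a job is charged whole cores,
    and every hardware thread on an allocated core counts as a CPU."""
    cores_needed = math.ceil(requested_cpus / threads_per_core)
    return cores_needed * threads_per_core

# George's case: --cpus-per-task=1 on nodes with 2 threads/core
print(allocated_cpus(1, 2))  # -> 2, matching NumCPUs=2 in scontrol show job
# A 3-CPU request on the same nodes would round up to 2 full cores
print(allocated_cpus(3, 2))  # -> 4
```

Under CR_CPU_Memory, individual threads are consumable, so no such rounding occurs and the 1-CPU request stays at 1.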
I submitted a simple job with the following command:

```
sbatch --cpus-per-task=1 sleeper.sh
```

sacct shows:

```
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$ sbatch --cpus-per-task=1 sleeper.sh
Submitted batch job 16222579
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$ sacct -o ReqCPUS,ReqTRES,ReqGRES,ReqMem,ReqCPUFreq -j 16222579
JobID     Elapsed   NCPUS  NTasks  AllocGRES  State    JobName     User  Timelimit  NNodes  NodeList          Start                End      MaxVMSize  MaxRSS  ReqCPUS  ReqTRES     ReqGRES  ReqMem  ReqCPUFreq
--------  --------  -----  ------  ---------  -------  ----------  ----  ---------  ------  ----------------  -------------------  -------  ---------  ------  -------  ----------  -------  ------  ----------
16222579  00:00:12  2                         RUNNING  sleeper.sh  ghwa  01:00:00   1       compute-gpu-12-7  2020-11-05T10:46:25  Unknown                     1        billing=1+           512Mc   Unknown
```

and scontrol show job:

```
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$ scontrol show job 16222579
JobId=16222579 JobName=sleeper.sh
   UserId=ghwa(5001) GroupId=sonic(21063) MCS_label=N/A
   Priority=3594828827 Nice=0 Account=local QOS=normal WCKey=*default
   JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:39 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2020-11-05T10:46:24 EligibleTime=2020-11-05T10:46:24
   AccrueTime=2020-11-05T10:46:24
   StartTime=2020-11-05T10:46:25 EndTime=2020-11-05T11:46:25 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2020-11-05T10:46:25
   Partition=snq2 AllocNode:Sid=rocks7fe:8814
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute-gpu-12-7
   BatchHost=compute-gpu-12-7
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=1G,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=512M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/gsshare/users/ghwa/sonic3FCV/fcv_3.6-bkmA_newBC/sleeper.sh
   WorkDir=/gsshare/users/ghwa/sonic3FCV/fcv_3.6-bkmA_newBC
   StdErr=/gsshare/users/ghwa/sonic3FCV/fcv_3.6-bkmA_newBC/slurm-16222579.out
   StdIn=/dev/null
   StdOut=/gsshare/users/ghwa/sonic3FCV/fcv_3.6-bkmA_newBC/slurm-16222579.out
   Power=
```

The job script itself:

```
(sonic_tf23) [ghwa@rocks7fe fcv_3.6-bkmA_newBC]$ cat sleeper.sh
#!/bin/bash
sleep 600
```

My question is: is SLURM really allocating 2 CPUs for my job?
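For reference, the behavior discussed in this ticket is controlled by a single slurm.conf setting. A hypothetical fragment (illustrative only, not the site's actual file) showing the two options:

```
# slurm.conf (illustrative fragment)
SelectType=select/cons_res

# Whole cores are consumable: on nodes with 2 threads/core, a 1-CPU
# request is rounded up to the full core, so NumCPUs reports 2.
SelectTypeParameters=CR_Core_Memory

# Alternative discussed above: individual threads (CPUs) are consumable,
# so a 1-CPU request reports NumCPUs=1.
#SelectTypeParameters=CR_CPU_Memory
```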