Ticket 3168

Summary: Fail to allocate all cpus on the node exclusively allocated to a job.
Product: Slurm Reporter: Zhengji Zhao <zzhao>
Component: Scheduling Assignee: Alejandro Sanchez <alex>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 3 - Medium Impact    
Priority: --- CC: alex, dmjacobsen
Version: 16.05.5   
Hardware: Cray XC   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=3183
Site: NERSC

Description Zhengji Zhao 2016-10-12 16:13:02 MDT
Dear Moe,

I am opening a new ticket per your suggestion. Please feel free to close bug 2655.

I cannot get all the CPUs on nodes that are exclusively allocated to my job. The problem occurs on both our Haswell (32 cores/64 CPUs per node) and KNL (68 cores/272 CPUs per node) clusters, as demonstrated below:

zz217@cori08:~> scontrol show config |grep Select   
SelectType              = select/cray
SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES,NHC_NO,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK

zz217@cori08:~> scontrol show partition debug
PartitionName=debug
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=part_debug
   DefaultTime=00:10:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=nid[00009-00063,00072-00075,00077-00127,00137-00191,00193-00195,00200-00203,00208-00255,00261-00263,00268-00319,00321-00323,00328-00383,00393-00447,00456-00459,00461-00463,00465-00466,00468-00511,00521-00575,00577-00579,00584-00587,00592-00639,00645-00647,00652-00703,00705-00707,00712-00767,00776-00831,00845-00895,00897-00899,00901-00959,00961-00963,00968-00971,00976-01023,01029-01031,01036-01091,01096-01099,01105-01106,01108-01151,01161-01215,01224-01231,01233-01234,01236-01279,01281-01283,01285-01343,01345-01347,01352-01355,01360-01407,01413-01415,01420-01475,01489-01490,01492-01535,01545-01599,01608-01611,01613-01663,01665-01727,01729-01731,01736-01739,01744-01791,01797-01799,01804-01859,01864-01867,01873-01874,01876-01919,01929-01983,01992-02047,02049-02051,02053-02111,02113-02115,02120-02123,02128-02175,02181-02183,02188-02225]
   PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE PreemptMode=REQUEUE
   State=UP TotalCPUs=123904 TotalNodes=1936 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
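The partition totals can be cross-checked: TotalCPUs=123904 divided by TotalNodes=1936 gives 64 CPUs per node, matching the Haswell node shape. The sketch below expands a compact hostlist like the Nodes= value above. `expand_hostlist` is a hypothetical helper, not part of Slurm, and it assumes a single bracketed group with no nesting; on a live system `scontrol show hostnames` performs this expansion.

```python
import re

def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple hostlist such as 'nid[00009-00011,00072]'.

    Assumes one bracketed group per expression and keeps the
    zero-padding of each range start; nested brackets are not handled.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # plain host name, nothing to expand
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single number is a range of length one
        width = len(lo)
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts

nodes = expand_hostlist("nid[00009-00011,00072-00075]")
print(len(nodes), nodes[0], nodes[-1])  # 7 nid00009 nid00075
print(123904 // 1936)  # CPUs per node in the debug partition: 64
```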

zz217@cori08:~> salloc -N 1 -p debug 
salloc: Granted job allocation 3013739
salloc: Waiting for resource configuration
salloc: Nodes nid00010 are ready for job

zz217@nid00010:~> srun -n 64 hostname
srun: error: Unable to create job step: More processors requested than permitted
zz217@nid00010:~> 


I believe that in a partition like debug, where nodes are not shared between users or jobs, we should be able to use the full node even when CR_ONE_TASK_PER_CORE is set. I think you also agreed that this is the correct expectation.
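The arithmetic behind this expectation can be sketched as follows, assuming that CR_ONE_TASK_PER_CORE makes one task per physical core the default (as the slurm.conf documentation describes) and that, without it, one task may be placed on each hardware thread. `max_tasks_per_node` is a hypothetical helper that only illustrates the per-node cap; it does not model Slurm's actual selection logic.

```python
def max_tasks_per_node(cores: int, threads_per_core: int, one_task_per_core: bool) -> int:
    """Default per-node task cap under the stated assumption:
    one task per core with CR_ONE_TASK_PER_CORE, else one per thread."""
    return cores if one_task_per_core else cores * threads_per_core

# Node shapes quoted in this ticket: (cores per node, hardware threads per core)
shapes = {"haswell": (32, 2), "knl": (68, 4)}

for name, (cores, tpc) in shapes.items():
    capped = max_tasks_per_node(cores, tpc, one_task_per_core=True)
    full = max_tasks_per_node(cores, tpc, one_task_per_core=False)
    print(f"{name}: {capped} tasks with CR_ONE_TASK_PER_CORE, {full} CPUs on the node")

# srun -n 64 on one Haswell node exceeds the 32-task default cap,
# which is consistent with "More processors requested than permitted".
requested = 64
print(requested > max_tasks_per_node(32, 2, one_task_per_core=True))  # True
```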

Thanks for looking into the issue. 

Thanks,
Zhengji
Comment 1 Moe Jette 2016-10-13 11:44:22 MDT
For some reason, I can't reproduce this. After you create the job allocation, could you collect details about the job and the node? For example (replace the job ID and node name as appropriate):

scontrol -dd show job 3013739
scontrol show node nid00010

And also from within the salloc, run

env

Please attach all of that output; it should give me a better idea of where the problem is.
Comment 2 Zhengji Zhao 2016-10-13 12:11:11 MDT
Dear Moe,

Thanks for looking into the problem. The requested info is attached below.

Zhengji


zz217@cori08:~/tests/slurm> salloc -N 1 -p debug 
salloc: Granted job allocation 3014395
salloc: Waiting for resource configuration
salloc: Nodes nid00010 are ready for job

zz217@nid00010:~/tests/slurm> srun -n 64 hostname
srun: error: Unable to create job step: More processors requested than permitted

zz217@nid00010:~/tests/slurm> scontrol -dd show job 3014395
JobId=3014395 JobName=sh
   UserId=zz217(32858) GroupId=zz217(1032858) MCS_label=N/A
   Priority=64800 Nice=0 Account=mpccc QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:25 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2016-10-13T11:00:56 EligibleTime=2016-10-13T11:00:56
   StartTime=2016-10-13T11:00:56 EndTime=2016-10-13T11:10:59 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=mom5:11583
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=nid00010
   BatchHost=nid00010
   NumNodes=1 NumCPUs=64 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=64,mem=122G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
     Nodes=nid00010 CPU_IDs=0-63 Mem=124928
   MinCPUsNode=1 MinMemoryNode=122G MinTmpDiskNode=0
   Features=(null) Gres=craynetwork:1 Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/global/u1/z/zz217/tests/slurm
   Power=
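The job record shows that the allocation already covers the whole node (NumCPUs=64, CPU_IDs=0-63), which suggests the limit being hit is on tasks rather than on allocated CPUs. A minimal sketch for extracting those fields, assuming the whitespace-separated Key=Value layout shown above; `parse_scontrol` and `count_cpu_ids` are hypothetical helpers, not part of Slurm.

```python
def parse_scontrol(text: str) -> dict[str, str]:
    """Turn whitespace-separated Key=Value tokens into a dict.

    Each token is split on its first '=', so values such as
    'cpu=64,mem=122G,node=1' stay intact. Values containing spaces
    are not handled; a repeated key keeps its last value.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def count_cpu_ids(spec: str) -> int:
    """Count CPUs in an ID list such as '0-63' or '0-3,8,10-11'."""
    total = 0
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total

job = parse_scontrol("""
   NumNodes=1 NumCPUs=64 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=64,mem=122G,node=1
     Nodes=nid00010 CPU_IDs=0-63 Mem=124928
""")
print(job["NumCPUs"], count_cpu_ids(job["CPU_IDs"]), job["TRES"])  # 64 64 cpu=64,mem=122G,node=1
```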

zz217@nid00010:~/tests/slurm> scontrol show node nid00010
NodeName=nid00010 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=64 CPUErr=0 CPUTot=64 CPULoad=2.05
   AvailableFeatures=haswell
   ActiveFeatures=haswell
   Gres=craynetwork:4
   NodeAddr=nid00010 NodeHostName=nid00010 Version=16.05
   OS=Linux RealMemory=124928 AllocMem=124928 FreeMem=126431 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=100 Owner=N/A MCS_label=N/A
   BootTime=2016-10-10T19:37:11 SlurmdStartTime=2016-10-10T19:48:19
   CapWatts=n/a
   CurrentWatts=131 LowestJoules=41058255 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
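The node record gives the topology needed to tell cores from CPUs: Sockets=2 × CoresPerSocket=16 yields 32 physical cores, and ThreadsPerCore=2 doubles that to CPUTot=64. A minimal sketch of that check, reusing the same Key=Value parsing approach; `parse_scontrol` is a hypothetical helper, not part of Slurm.

```python
def parse_scontrol(text: str) -> dict[str, str]:
    """Split whitespace-separated Key=Value tokens on the first '='."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

node = parse_scontrol("""
NodeName=nid00010 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=64 CPUErr=0 CPUTot=64 CPULoad=2.05
   OS=Linux RealMemory=124928 AllocMem=124928 FreeMem=126431 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=100 Owner=N/A MCS_label=N/A
""")

cores = int(node["Sockets"]) * int(node["CoresPerSocket"])  # physical cores: 2 * 16
cpus = cores * int(node["ThreadsPerCore"])                  # logical CPUs: 32 * 2
assert cpus == int(node["CPUTot"])                          # consistent with CPUTot=64
print(f"{cores} cores, {cpus} CPUs, {node['CPUAlloc']} allocated")  # 32 cores, 64 CPUs, 64 allocated
```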
   

zz217@nid00010:~/tests/slurm> env
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
KSH_AUTOLOAD=1
MODULE_VERSION_STACK=3.2.10.4
PE_SMA_DEFAULT_PKGCONFIG_VARIABLES=PE_SMA_COMPFLAG_@prgenv@
MKLROOT=/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl
PE_LIBSCI_VOLATILE_PRGENV=CRAY GNU INTEL
LESSKEY=/etc/lesskey.bin
SLURM_NODELIST=nid00010
DVS_MAXNODES=1__
NNTPSERVER=news
PE_PETSC_DEFAULT_GENCOMPILERS_CRAY_sandybridge=8.3
PE_CXX_PKGCONFIG_LIBS=mpichcxx
MANPATH=/usr/common/software/man:/usr/common/mss/man:/usr/common/nsg/man:/opt/cray/pe/mpt/7.4.0/gni/man/mpich:/opt/cray/pe/mpt/7.4.0/gni/man/shmem:/opt/cray/pe/atp/2.0.2/man:/opt/cray/alps/6.1.3-17.12/man:/opt/cray/job/1.5.5-3.58/man:/opt/cray/pe/libsci/16.06.1/man:/opt/cray/pe/man/csmlversion:/opt/cray/pe/craype/2.5.5/man:/opt/intel/compilers_and_libraries_2016.3.210/linux/man/common:/opt/cray/pe/modules/3.2.10.4/share/man:/usr/syscom/nsg/man:/opt/modules/3.2.6.7/man:/usr/local/man:/usr/share/man:/opt/cray/share/man:/opt/cray/pe/man:/opt/cray/share/man
SLURM_JOB_NAME=sh
XDG_SESSION_ID=11877
PE_LIBSCI_DEFAULT_GENCOMPS_GNU_mic_knl=51
SLURMD_NODENAME=nid00010
SLURM_TOPOLOGY_ADDR=nid00010
CRAY_UDREG_INCLUDE_OPTS=-I/opt/cray/udreg/2.3.2-4.6/include
HOSTNAME=cori08
PE_FFTW_DEFAULT_TARGET_mic_knl=mic_knl
PE_TPSL_64_DEFAULT_GENCOMPS_INTEL_interlagos=150
PE_TRILINOS_DEFAULT_GENCOMPS_CRAY_x86_64=85
RCLOCAL_BASEOPTS=true
SLURM_PRIO_PROCESS=0
PE_TRILINOS_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/trilinos/12.6.3.0/@PRGENV@/@PE_TRILINOS_DEFAULT_GENCOMPS@/@PE_TRILINOS_DEFAULT_TARGET@/lib/pkgconfig
PE_NETCDF_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/netcdf/4.4.0/@PRGENV@/@PE_NETCDF_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_PARALLEL_NETCDF_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/parallel-netcdf/1.7.0/@PRGENV@/@PE_PARALLEL_NETCDF_DEFAULT_GENCOMPS@/lib/pkgconfig
XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB
LIBRARYMODULES=/opt/cray/pe/modules/3.2.10.4/init/.librarymodules:acml:alps:cray-dwarf:cray-fftw:cray-ga:cray-hdf5:cray-hdf5-parallel:cray-libsci:cray-libsci_acc:cray-mpich:cray-mpich-abi:cray-mpich2:cray-netcdf:cray-netcdf-hdf5parallel:cray-parallel-netcdf:cray-petsc:cray-petsc-complex:cray-shmem:cray-tpsl:cray-trilinos:cudatoolkit:fftw:ga:hdf5:hdf5-parallel:iobuf:libfast:netcdf:netcdf-hdf5parallel:ntk:onesided:papi:petsc:petsc-complex:pmi:tpsl:trilinos:xt-libsci:xt-mpich2:xt-mpt:xt-papi:/etc/opt/cray/pe/modules/site_librarymodules
PE_SMA_DEFAULT_COMPFLAG_GNU=-fcray-pointer
CRAY_SITE_LIST_DIR=/etc/opt/cray/pe/modules
SLURM_SRUN_COMM_PORT=62443
PE_HDF5_DEFAULT_GENCOMPILERS_INTEL=15.0 14.0
PE_HDF5_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
PE_TPSL_64_DEFAULT_GENCOMPILERS_CRAY_x86_64=8.3
PE_SMA_DEFAULT_COMPFLAG=
INTEL_LICENSE_FILE=28518@dmv1.nersc.gov:28518@dmv.nersc.gov
PE_SMA_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/mpt/7.4.0/gni/sma@PE_SMA_DIR_DEFAULT64@/lib64/pkgconfig
PE_ENV=INTEL
HOST=cori08
TERM=xterm-256color
PKGCONFIG_ENABLED=1
SHELL=/bin/bash
SLURM_PTY_WIN_ROW=76
HISTSIZE=1000
INTEL_MINOR_VERSION=3.210
PE_PETSC_DEFAULT_GENCOMPS_CRAY_sandybridge=83
PROFILEREAD=true
SLURM_JOB_QOS=normal
SLURM_TOPOLOGY_ADDR_PATTERN=node
TMPDIR=/global/cscratch1/sd/zz217
PE_PETSC_DEFAULT_GENCOMPS_INTEL_haswell=150
PE_NETCDF_DEFAULT_VOLATILE_PRGENV=GNU
CRAY_XPMEM_POST_LINK_OPTS=-L/opt/cray/xpmem/0.1-4.5/lib64
CRAY_UGNI_POST_LINK_OPTS=-L/opt/cray/ugni/6.0.12-2.1/lib64
PE_PARALLEL_NETCDF_DEFAULT_VOLATILE_PRGENV=GNU
SSH_CLIENT=128.3.135.64 55407 22
PE_TRILINOS_DEFAULT_VOLATILE_PRGENV=CRAY GNU INTEL
CRAYPE_DIR=/opt/cray/pe/craype/2.5.5
PE_TPSL_DEFAULT_GENCOMPS_GNU_sandybridge=51 49
PE_TPSL_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH:PE_LIBSCI
PE_PETSC_DEFAULT_GENCOMPS_GNU_haswell=51 49
PE_NETCDF_HDF5PARALLEL_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/netcdf-hdf5parallel/4.4.0/@PRGENV@/@PE_NETCDF_HDF5PARALLEL_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_FFTW_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/fftw/3.3.4.8/@PE_FFTW_DEFAULT_TARGET@/lib/pkgconfig
PE_HDF5_PARALLEL_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/hdf5-parallel/1.8.16/@PRGENV@/@PE_HDF5_PARALLEL_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_LIBSCI_GENCOMPS_CRAY_mic_knl=85
ALT_LINKER=/usr/common/software/altd/2.0/bin/ld
PE_PETSC_DEFAULT_GENCOMPS_CRAY_interlagos=83
ALTD_SELECT_OFF_USERS=
PE_LIBSCI_DEFAULT_GENCOMPILERS_CRAY_mic_knl=8.5
CRAY_MPICH2_DIR=/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0
PE_HDF5_DEFAULT_VOLATILE_PRGENV=GNU INTEL
LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64
PMI_CONTROL_PORT=63551
PE_LIBSCI_DEFAULT_GENCOMPS_GNU_x86_64=51 49
PE_GA_DEFAULT_VOLATILE_PRGENV=GNU
PE_TPSL_64_DEFAULT_GENCOMPILERS_CRAY_interlagos=8.3
ALTD_SELECT_ON=0
INTEL_PATH=/opt/intel/compilers_and_libraries_2016.3.210
PE_TPSL_DEFAULT_GENCOMPS_CRAY_mic_knl=85
PE_SMA_PKGCONFIG_VARIABLES=PE_SMA_COMPFLAG_@prgenv@
SLURM_CPU_BIND_VERBOSE=quiet
FPATH=:/opt/cray/pe/modules/3.2.10.4/init/sh_funcs/no_redirect:/opt/cray/pe/modules/3.2.10.4/init/sh_funcs/no_redirect
PE_MPICH_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
PE_PKGCONFIG_PRODUCTS=PE_MPICH:PE_SMA:PE_LIBSCI
PE_TPSL_DEFAULT_GENCOMPS_INTEL_x86_64=150
PE_SMA_PKGCONFIG_LIBS=sma
PE_LIBSCI_GENCOMPILERS_CRAY_mic_knl=8.5
MORE=-sl
PE_MPICH_GENCOMPS_GNU=51 49
PE_PAPI_DEFAULT_ACCEL_LIBS_nvidia35=,-lcupti,-lcudart,-lcuda
PE_PETSC_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH:PE_LIBSCI:PE_HDF5_PARALLEL:PE_TPSL
ALTD_VERBOSE=0
PE_TPSL_64_DEFAULT_GENCOMPS_CRAY_haswell=83
PE_TPSL_64_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/tpsl/16.06.1/@PRGENV@64/@PE_TPSL_64_DEFAULT_GENCOMPS@/@PE_TPSL_64_DEFAULT_TARGET@/lib/pkgconfig
PE_TRILINOS_DEFAULT_GENCOMPILERS_CRAY_x86_64=8.5
PE_CRAY_DEFAULT_FIXED_PKGCONFIG_PATH=/opt/cray/pe/parallel-netcdf/1.7.0/CRAY/8.3/lib/pkgconfig:/opt/cray/pe/netcdf-hdf5parallel/4.4.0/CRAY/8.3/lib/pkgconfig:/opt/cray/pe/netcdf/4.4.0/CRAY/8.3/lib/pkgconfig:/opt/cray/pe/hdf5-parallel/1.8.16/CRAY/8.3/lib/pkgconfig:/opt/cray/pe/hdf5/1.8.16/CRAY/8.3/lib/pkgconfig:/opt/cray/pe/ga/5.3.0.6/CRAY/8.3/lib/pkgconfig
PE_LIBSCI_DEFAULT_OMP_REQUIRES_openmp=_mp
vlist0=/scratch/scratchdirs/altdlogs/db_async
PE_FORTRAN_PKGCONFIG_LIBS=mpichf90
SSH_TTY=/dev/pts/18
PE_TPSL_64_DEFAULT_GENCOMPILERS_CRAY_sandybridge=8.3
PE_PETSC_DEFAULT_GENCOMPS_CRAY_x86_64=83
SLURM_SPANK_SHIFTER_GID=1032858
SLURM_CPU_BIND_LIST=0xFFFFFFFFFFFFFFFF
PE_SMA_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/mpt/7.4.0/gni/sma@PE_SMA_DEFAULT_DIR_DEFAULT64@/lib64/pkgconfig
ALPS_APP_ID=3014395
CRAY_MPICH_BASEDIR=/opt/cray/pe/mpt/7.4.0/gni
CRAY_SHMEM_ROOTDIR=/opt/cray/pe/mpt/7.4.0
PE_TRILINOS_DEFAULT_GENCOMPS_INTEL_x86_64=150
PE_SMA_DIR_CRAY_DEFAULT64=64
USER=zz217
PE_HDF5_PARALLEL_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
JRE_HOME=/usr/lib64/jvm/java/jre
PE_TPSL_64_DEFAULT_GENCOMPS_INTEL_haswell=150
PE_NETCDF_HDF5PARALLEL_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
SLURM_NNODES=1
PE_TPSL_DEFAULT_GENCOMPS_CRAY_x86_64=83
PE_LIBSCI_DEFAULT_VOLATILE_PRGENV=CRAY GNU INTEL
LS_COLORS=no=00:fi=00:di=01;34:ln=00;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=41;33;01:ex=00;32:*.cmd=00;32:*.exe=01;32:*.com=01;32:*.bat=01;32:*.btm=01;32:*.dll=01;32:*.tar=00;31:*.tbz=00;31:*.tgz=00;31:*.rpm=00;31:*.deb=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.lzma=00;31:*.zip=00;31:*.zoo=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.tb2=00;31:*.tz2=00;31:*.tbz2=00;31:*.avi=01;35:*.bmp=01;35:*.fli=01;35:*.gif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mng=01;35:*.mov=01;35:*.mpg=01;35:*.pcx=01;35:*.pbm=01;35:*.pgm=01;35:*.png=01;35:*.ppm=01;35:*.tga=01;35:*.tif=01;35:*.xbm=01;35:*.xpm=01;35:*.dl=01;35:*.gl=01;35:*.wmv=01;35:*.aiff=00;32:*.au=00;32:*.mid=00;32:*.mp3=00;32:*.ogg=00;32:*.voc=00;32:*.wav=00;32:
PE_FFTW_DEFAULT_TARGET_interlagos=interlagos
PE_TRILINOS_DEFAULT_GENCOMPILERS_GNU_x86_64=5.1 4.9
PE_TRILINOS_DEFAULT_GENCOMPILERS_INTEL_x86_64=15.0
LD_LIBRARY_PATH=/opt/cray/job/1.5.5-3.58/lib64:/opt/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64:/usr/syscom/nsg/lib
PE_PETSC_DEFAULT_GENCOMPILERS_INTEL_interlagos=15.0
PE_LIBSCI_GENCOMPILERS_GNU_mic_knl=5.1
CRAY_IAA_INFO_FILE=/tmp/cray_iaa_info.3014395
CSCRATCH=/global/cscratch1/sd/zz217
PE_LIBSCI_PKGCONFIG_VARIABLES=PE_LIBSCI_OMP_REQUIRES_@openmp@
PE_TPSL_64_DEFAULT_GENCOMPILERS_INTEL_haswell=15.0
PE_MPICH_FIXED_PRGENV=INTEL
PE_PKGCONFIG_LIBS=mpich:sma:AtpSigHandler:cray-rca:libsci_mpi:libsci
PE_TPSL_64_DEFAULT_GENCOMPILERS_GNU_sandybridge=5.1 4.9
PE_PETSC_DEFAULT_VOLATILE_PRGENV=CRAY CRAY64 GNU GNU64 INTEL INTEL64
CRAY_RCA_POST_LINK_OPTS=-L/opt/cray/rca/1.0.0-6.21/lib64 -lrca 
CSHRCREAD=true
PE_PETSC_DEFAULT_GENCOMPILERS_CRAY_x86_64=8.3
PE_PETSC_DEFAULT_GENCOMPS_INTEL_sandybridge=150
PE_PETSC_DEFAULT_GENCOMPILERS_CRAY_mic_knl=8.5
PE_PETSC_DEFAULT_GENCOMPS_GNU_sandybridge=51 49
PE_PETSC_DEFAULT_GENCOMPS_GNU_interlagos=51 49
vsw=/global/common/cori/software
PE_PETSC_DEFAULT_GENCOMPS_INTEL_interlagos=150
XNLSPATH=/usr/share/X11/nls
PE_TPSL_DEFAULT_GENCOMPS_GNU_haswell=51 49
SLURM_STEP_NUM_NODES=1
ALTD_ON=1
PE_PAPI_DEFAULT_PKGCONFIG_VARIABLES=PE_PAPI_ACCEL_LIBS_@accelerator@
PE_TPSL_64_DEFAULT_GENCOMPS_INTEL_sandybridge=150
PE_TPSL_64_DEFAULT_GENCOMPILERS_GNU_interlagos=5.1 4.9
MPICH_DIR=/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0
PE_PETSC_DEFAULT_GENCOMPILERS_CRAY_haswell=8.3
INTEL_VERSION=16.0.3.210
MPICH_ABORT_ON_ERROR=1
PE_LIBSCI_DEFAULT_GENCOMPS_CRAY_x86_64=83
SRUN_DEBUG=3
PE_FFTW_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH
CPATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
ATP_POST_LINK_OPTS=-Wl,-L/opt/cray/pe/atp/2.0.2/libApp/ 
PE_HDF5_PARALLEL_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH
PE_NETCDF_HDF5PARALLEL_DEFAULT_REQUIRED_PRODUCTS=PE_HDF5_PARALLEL:PE_MPICH
HOSTTYPE=x86_64
PE_PETSC_DEFAULT_GENCOMPILERS_INTEL_sandybridge=15.0
PE_TPSL_64_DEFAULT_GENCOMPILERS_CRAY_haswell=8.3
PE_FFTW_DEFAULT_TARGET_sandybridge=sandybridge
MPICH_MPIIO_DVS_MAXNODES=32
PE_MPICH_FORTRAN_PKGCONFIG_LIBS=mpichf90
SLURM_JOBID=3014395
FORT_BUFFERED=yes
RCLOCAL_PRGENV=true
PE_LIBSCI_GENCOMPS_INTEL_x86_64=150
PE_TPSL_DEFAULT_GENCOMPILERS_CRAY_x86_64=8.3
FROM_HEADER=
PE_PRODUCT_LIST=CRAYPE_HASWELL:CRAY_RCA:CRAY_ALPS:DVS:CRAY_XPMEM:CRAY_DMAPP:CRAY_PMI:CRAY_UGNI:CRAY_UDREG:CRAY_LIBSCI:CRAYPE:INTEL
PE_LIBSCI_DEFAULT_GENCOMPILERS_INTEL_x86_64=15.0
PE_TPSL_DEFAULT_GENCOMPS_GNU_interlagos=51 49
ALPS_LLI_STATUS_OFFSET=1
SLURM_LAUNCH_NODE_IPADDR=10.128.1.214
PAGER=less
CRAY_MPICH_ROOTDIR=/opt/cray/pe/mpt/7.4.0
PE_PETSC_DEFAULT_GENCOMPILERS_GNU_x86_64=5.1 4.9
ALPS_APP_PE=0
SLURM_STEP_ID=0
CRAY_SHMEM_VER=7.4.0
CSHEDIT=emacs
PE_TPSL_64_DEFAULT_GENCOMPILERS_INTEL_x86_64=15.0
PE_MPICH_MODULE_NAME=cray-mpich
PE_LIBSCI_GENCOMPILERS_GNU_x86_64=5.1 4.9
PE_MPICH_GENCOMPILERS_CRAY=8.3
INTEL_MAJOR_VERSION=16.0
PE_TPSL_64_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH:PE_LIBSCI
PE_LIBSCI_DEFAULT_GENCOMPILERS_CRAY_x86_64=8.3
PE_TPSL_DEFAULT_GENCOMPS_CRAY_haswell=83
XDG_CONFIG_DIRS=/etc/xdg
PE_LIBSCI_GENCOMPS_CRAY_x86_64=83
PE_MPICH_TARGET_VAR_nvidia20=-lcudart
PE_TPSL_DEFAULT_GENCOMPS_CRAY_sandybridge=83
PE_MPICH_DEFAULT_VOLATILE_PRGENV=CRAY GNU
CRAY_LIBSCI_BASE_DIR=/opt/cray/pe/libsci/16.06.1
CRAY_LIBSCI_DIR=/opt/cray/pe/libsci/16.06.1
USERMODULES=/opt/cray/pe/modules/3.2.10.4/init/.usermodules:PrgEnv-cray:PrgEnv-gnu:PrgEnv-intel:PrgEnv-pathscale:PrgEnv-pgi:acml:alps:apprentice:apprentice2:atp:blcr:cce:chapel:cray-ccdb:cray-fftw:cray-ga:cray-hdf5:cray-hdf5-parallel:cray-lgdb:cray-libsci:cray-libsci_acc:cray-mpich:cray-mpich-compat:cray-mpich2:cray-netcdf:cray-netcdf-hdf5parallel:cray-parallel-netcdf:cray-petsc:cray-petsc-complex:cray-shmem:cray-snplauncher:cray-tpsl:cray-trilinos:craypat:craype:craypkg-gen:cudatoolkit:ddt:fftw:ga:gcc:hdf5:hdf5-parallel:intel:iobuf:java:lgdb:libfast:libsci_acc:mpich1:netcdf:netcdf-hdf5parallel:netcdf-nofsync:netcdf-nofsync-hdf5parallel:ntk:onesided:papi:parallel-netcdf:pathscale:perftools:perftools-lite:petsc:petsc-complex:pgi:pmi:stat:totalview:tpsl:trilinos:xt-asyncpe:xt-craypat:xt-lgdb:xt-libsci:xt-mpich2:xt-mpt:xt-papi:xt-shmem:xt-totalview:/etc/opt/cray/pe/modules/site_usermodules
PE_LIBSCI_PKGCONFIG_LIBS=libsci_mpi:libsci
MINICOM=-c on
DVS_VERSION=0.9.0
PE_NETCDF_DEFAULT_GENCOMPS_GNU=51 49
LIBGL_DEBUG=quiet
PE_PARALLEL_NETCDF_DEFAULT_GENCOMPS_GNU=51 49
CRAY_DMAPP_INCLUDE_OPTS=-I/opt/cray/dmapp/7.1.0-12.37/include -I/opt/cray/gni-headers/5.0.7-3.1/include 
NLSPATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/locale/%l_%t/%N
PE_TPSL_64_DEFAULT_GENCOMPS_GNU_x86_64=51 49
SLURM_STEP_LAUNCHER_PORT=62443
PE_PKGCONFIG_DEFAULT_PRODUCTS=PE_TRILINOS:PE_TPSL_64:PE_TPSL:PE_PETSC:PE_PARALLEL_NETCDF:PE_NETCDF_HDF5PARALLEL:PE_NETCDF:PE_MPICH:PE_LIBSCI:PE_HDF5_PARALLEL:PE_HDF5:PE_GA:PE_FFTW
PE_HDF5_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/hdf5/1.8.16/@PRGENV@/@PE_HDF5_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_TPSL_DEFAULT_GENCOMPS_CRAY_interlagos=83
MODULE_VERSION=3.2.10.4
LIBGL_ALWAYS_INDIRECT=1
PE_MPICH_GENCOMPILERS_GNU=5.1 4.9
PATH=/global/cscratch1/sd/zz217/spack/bin:.:/global/cscratch1/sd/zz217/spack/bin:.:.:/usr/common/software/altd/2.0/bin:/usr/common/software/bin:/usr/common/mss/bin:/usr/common/nsg/bin:/global/cscratch1/sd/zz217/spack/bin:.:/opt/cray/pe/mpt/7.4.0/gni/bin:/opt/cray/rca/1.0.0-6.21/bin:/opt/cray/alps/6.1.3-17.12/sbin:/opt/cray/job/1.5.5-3.58/bin:/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/bin:/opt/cray/pe/craype/2.5.5/bin:/opt/intel/compilers_and_libraries_2016.3.210/linux/bin/intel64:/opt/cray/pe/modules/3.2.10.4/bin:/usr/syscom/nsg/sbin:/usr/syscom/nsg/bin:/opt/modules/3.2.6.7/bin:/global/homes/z/zz217/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:/global/homes/z/zz217/bin:/global/homes/z/zz217/bin:/global/homes/z/zz217/bin
PE_TPSL_DEFAULT_GENCOMPILERS_GNU_x86_64=5.1 4.9
MAIL=/var/mail/zz217
SLURM_TASKS_PER_NODE=32
PMI_CRAY_NO_SMP_ORDER=0
CPU=x86_64
ATP_IGNORE_SIGTERM=1
ESWRAP_LOGIN=cmom01
XTPE_NETWORK_TARGET=aries
PE_SMA_COMPFLAG_GNU=-fcray-pointer
PE_PETSC_DEFAULT_GENCOMPS_CRAY_mic_knl=85
PE_TPSL_DEFAULT_GENCOMPILERS_GNU_haswell=5.1 4.9
PE_PARALLEL_NETCDF_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
PE_NETCDF_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
PMI_NO_FORK=1
JAVA_BINDIR=/usr/lib64/jvm/java/bin
PE_HDF5_PARALLEL_DEFAULT_FIXED_PRGENV=CRAY
PE_HDF5_PARALLEL_DEFAULT_GENCOMPS_GNU=51 49
PE_NETCDF_HDF5PARALLEL_DEFAULT_FIXED_PRGENV=CRAY INTEL
PE_SMA_DEFAULT_DIR_CRAY_DEFAULT64=64
PE_NETCDF_HDF5PARALLEL_DEFAULT_GENCOMPS_GNU=51 49
SLURM_JOB_ID=3014395
PE_TPSL_64_DEFAULT_GENCOMPS_CRAY_sandybridge=83
PE_LIBSCI_GENCOMPS_INTEL_mic_knl=150
PE_TPSL_DEFAULT_GENCOMPILERS_CRAY_mic_knl=8.5
PE_TPSL_DEFAULT_GENCOMPS_INTEL_interlagos=150
PE_LIBSCI_DEFAULT_GENCOMPILERS_INTEL_mic_knl=15.0
PE_TPSL_64_DEFAULT_VOLATILE_PRGENV=CRAY CRAY64 GNU GNU64 INTEL INTEL64
CRAY_UDREG_POST_LINK_OPTS=-L/opt/cray/udreg/2.3.2-4.6/lib64
SLURM_JOB_USER=zz217
SLURM_STEPID=0
CRAY_ALPS_POST_LINK_OPTS=-L/opt/cray/alps/6.1.3-17.12/lib64
CRAYPE_VERSION=2.5.5
PWD=/global/homes/z/zz217/tests/slurm
PE_MPICH_VOLATILE_PRGENV=CRAY GNU
INPUTRC=/etc/inputrc
SLURM_SRUN_COMM_HOST=10.128.1.214
JAVA_HOME=/usr/lib64/jvm/java
PE_LIBSCI_DEFAULT_GENCOMPS_INTEL_mic_knl=150
PE_LIBSCI_DEFAULT_OMP_REQUIRES=
_LMFILES_=/opt/modulefiles/modules/3.2.6.7:/usr/syscom/nsg/modulefiles/nsg/1.2.0:/opt/cray/pe/modulefiles/modules/3.2.10.4:/opt/modulefiles/intel/16.0.3.210.nersc:/opt/cray/pe/craype/2.5.5/modulefiles/craype-network-aries:/opt/cray/pe/modulefiles/craype/2.5.5:/opt/cray/pe/modulefiles/cray-libsci/16.06.1:/opt/cray/ari/modulefiles/udreg/2.3.2-4.6:/opt/cray/ari/modulefiles/ugni/6.0.12-2.1:/opt/cray/pe/ari/modulefiles/pmi/5.0.10-1.0000.11050.0.0.ari:/opt/cray/ari/modulefiles/dmapp/7.1.0-12.37:/opt/cray/ari/modulefiles/gni-headers/5.0.7-3.1:/opt/cray/ari/modulefiles/xpmem/0.1-4.5:/opt/cray/ari/modulefiles/job/1.5.5-3.58:/opt/cray/ari/modulefiles/dvs/2.5_0.9.0-2.155:/opt/cray/ari/modulefiles/alps/6.1.3-17.12:/opt/cray/ari/modulefiles/rca/1.0.0-6.21:/opt/cray/pe/modulefiles/atp/2.0.2:/opt/cray/pe/modulefiles/PrgEnv-intel/6.0.3:/opt/cray/pe/craype/2.5.5/modulefiles/craype-haswell:/opt/cray/pe/modulefiles/cray-shmem/7.4.0:/opt/cray/pe/modulefiles/cray-mpich/7.4.0:/usr/common/software/modulefiles/altd/2.0:/opt/modulefiles/Base-opts/2.1.3-2.16
TARGETMODULES=/opt/cray/pe/modules/3.2.10.4/init/.targetmodules:craype-abudhabi:craype-abudhabi-cu:craype-accel-host:craype-accel-nvidia20:craype-accel-nvidia30:craype-accel-nvidia35:craype-barcelona:craype-broadwell:craype-haswell:craype-hugepages128K:craype-hugepages128M:craype-hugepages16M:craype-hugepages256M:craype-hugepages2M:craype-hugepages32M:craype-hugepages4M:craype-hugepages512K:craype-hugepages512M:craype-hugepages64M:craype-hugepages8M:craype-intel-knc:craype-interlagos:craype-interlagos-cu:craype-istanbul:craype-ivybridge:craype-mc12:craype-mc8:craype-mic-knl:craype-network-aries:craype-network-gemini:craype-network-infiniband:craype-network-none:craype-network-seastar:craype-sandybridge:craype-shanghai:craype-target-compute_node:craype-target-local_host:craype-target-native:craype-xeon:xtpe-barcelona:xtpe-interlagos:xtpe-interlagos-cu:xtpe-istanbul:xtpe-mc12:xtpe-mc8:xtpe-network-gemini:xtpe-network-seastar:xtpe-shanghai:xtpe-target-native:xtpe-xeon:/etc/opt/cray/pe/modules/site_targetmodules
PE_MPICH_DEFAULT_GENCOMPS_CRAY=83
PE_TPSL_DEFAULT_GENCOMPILERS_INTEL_haswell=15.0
PE_PETSC_DEFAULT_GENCOMPILERS_GNU_sandybridge=5.1 4.9
SLURM_CPU_BIND_TYPE=mask_cpu:
PE_LIBSCI_MODULE_NAME=cray-libsci/16.06.1
PE_TPSL_DEFAULT_GENCOMPILERS_CRAY_interlagos=8.3
PE_INTEL_FIXED_PKGCONFIG_PATH=/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0/lib/pkgconfig
SLURM_PTY_WIN_COL=160
SLURM_UMASK=0022
PE_LIBSCI_GENCOMPILERS_CRAY_x86_64=8.3
MODULEPATH=/opt/cray/pe/ari/modulefiles:/opt/cray/ari/modulefiles:/opt/cray/pe/craype/2.5.5/modulefiles:/opt/cray/pe/modulefiles:/opt/cray/modulefiles:/opt/modulefiles:/usr/common/software/modulefiles:/usr/syscom/nsg/modulefiles:/usr/syscom/nsg/opt/modulefiles:/usr/common/das/modulefiles:/usr/common/ftg/modulefiles:/opt/cray/craype/default/modulefiles
PE_MPICH_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/mpt/7.4.0/gni/mpich-@PRGENV@@PE_MPICH_DIR_DEFAULT64@/@PE_MPICH_GENCOMPS@/lib/pkgconfig
PYTHONSTARTUP=/etc/pythonstart
PE_MPICH_NV_LIBS_nvidia20=-lcudart
SLURM_JOB_UID=32858
LOADEDMODULES=modules/3.2.6.7:nsg/1.2.0:modules/3.2.10.4:intel/16.0.3.210.nersc:craype-network-aries:craype/2.5.5:cray-libsci/16.06.1:udreg/2.3.2-4.6:ugni/6.0.12-2.1:pmi/5.0.10-1.0000.11050.0.0.ari:dmapp/7.1.0-12.37:gni-headers/5.0.7-3.1:xpmem/0.1-4.5:job/1.5.5-3.58:dvs/2.5_0.9.0-2.155:alps/6.1.3-17.12:rca/1.0.0-6.21:atp/2.0.2:PrgEnv-intel/6.0.3:craype-haswell:cray-shmem/7.4.0:cray-mpich/7.4.0:altd/2.0:Base-opts/2.1.3-2.16
MAN_POSIXLY_CORRECT=1
SHMEM_ABORT_ON_ERROR=1
NSG_HOME=/usr/syscom/nsg
SDK_HOME=/usr/lib64/jvm/java
TZ=US/Pacific
SLURM_NODEID=0
PE_HDF5_DEFAULT_GENCOMPS_INTEL=150 140
CRAY_DMAPP_POST_LINK_OPTS=-L/opt/cray/dmapp/7.1.0-12.37/lib64
PE_TPSL_64_DEFAULT_GENCOMPS_GNU_interlagos=51 49
PE_PKG_CONFIG_PATH=/opt/cray/pe/cti/1.0.1/lib/pkgconfig
SLURM_STEP_RESV_PORTS=63551
CRAY_RCA_INCLUDE_OPTS=-I/opt/cray/rca/1.0.0-6.21/include -I/opt/cray/krca/1.0.0-3.55/include -I/opt/cray-hss-devel/8.0.0/include 
SCRATCH2=/scratch2/scratchdirs/zz217
PE_LIBSCI_OMP_REQUIRES_openmp=_mp
PE_SMA_DIR_PGI_DEFAULT64=64
SLURM_SUBMIT_DIR=/global/u1/z/zz217/tests/slurm
SLURM_TASK_PID=60841
PE_TPSL_DEFAULT_GENCOMPILERS_INTEL_x86_64=15.0
PE_TPSL_64_DEFAULT_GENCOMPS_CRAY_mic_knl=85
CRAY_MPICH_DIR=/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0
PE_MPICH_CXX_PKGCONFIG_LIBS=mpichcxx
SLURM_CPUS_ON_NODE=64
PE_LIBSCI_DEFAULT_GENCOMPS_INTEL_x86_64=150
PE_MPICH_PKGCONFIG_VARIABLES=PE_MPICH_NV_LIBS_@accelerator@:PE_MPICH_MULTITHREADED_LIBS_@multithreaded@
SLURM_PROCID=0
PE_TPSL_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/tpsl/16.06.1/@PRGENV@/@PE_TPSL_DEFAULT_GENCOMPS@/@PE_TPSL_DEFAULT_TARGET@/lib/pkgconfig
PE_HDF5_DEFAULT_FIXED_PRGENV=CRAY
PE_MPICH_PKGCONFIG_LIBS=mpich
CRAY_PMI_POST_LINK_OPTS=-L/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/lib64
CRAY_MPICH2_VER=7.4.0
PE_TPSL_64_DEFAULT_GENCOMPILERS_CRAY_mic_knl=8.5
PE_LIBSCI_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/libsci/16.06.1/@PRGENV@/@PE_LIBSCI_GENCOMPS@/@PE_LIBSCI_TARGET@/lib/pkgconfig
PE_GA_DEFAULT_GENCOMPILERS_GNU=5.1 4.9
PE_PARALLEL_NETCDF_DEFAULT_FIXED_PRGENV=CRAY INTEL
GPG_TTY=/dev/pts/0
PE_NETCDF_DEFAULT_FIXED_PRGENV=CRAY INTEL
SLURM_JOB_NODELIST=nid00010
SLURM_PTY_PORT=37437
CRAY_LIBSCI_VERSION=16.06.1
JDK_HOME=/usr/lib64/jvm/java
PE_PKGCONFIG_PRODUCTS_DEFAULT=PE_PAPI
HOME=/global/homes/z/zz217
PE_NETCDF_HDF5PARALLEL_DEFAULT_VOLATILE_PRGENV=GNU
SHLVL=3
PE_MPICH_TARGET_VAR_nvidia35=-lcudart
PE_TPSL_64_DEFAULT_GENCOMPS_GNU_haswell=51 49
PE_HDF5_PARALLEL_DEFAULT_VOLATILE_PRGENV=GNU INTEL
QT_SYSTEM_DIR=/usr/share/desktop-data
SLURM_LOCALID=0
LESS_ADVANCED_PREPROCESSOR=no
PE_TPSL_DEFAULT_GENCOMPILERS_INTEL_interlagos=15.0
OSTYPE=linux
PE_TPSL_DEFAULT_VOLATILE_PRGENV=CRAY CRAY64 GNU GNU64 INTEL INTEL64
ALTD_PATH=/usr/common/software/altd/2.0
PE_MPICH_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/mpt/7.4.0/gni/mpich-@PRGENV@@PE_MPICH_DEFAULT_DIR_DEFAULT64@/@PE_MPICH_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_PETSC_DEFAULT_GENCOMPILERS_CRAY_interlagos=8.3
LS_OPTIONS=-N --color=none -T 0
PE_TPSL_64_DEFAULT_GENCOMPS_CRAY_interlagos=83
CRAY_PMI_INCLUDE_OPTS=-I/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/include
PE_TPSL_DEFAULT_GENCOMPS_INTEL_sandybridge=150
SLURM_JOB_CPUS_PER_NODE=32
SLURM_CLUSTER_NAME=cori
SLURM_GTIDS=0
PKG_CONFIG_PATH_DEFAULT=/opt/cray/pe/papi/5.4.3.2/lib64/pkgconfig
CRAYPE_NETWORK_TARGET=aries
PE_TPSL_DEFAULT_GENCOMPILERS_CRAY_haswell=8.3
ATP_MRNET_COMM_PATH=/opt/cray/pe/atp/2.0.2/libexec/atp_mrnet_commnode_wrapper
PRGENVMODULES=/opt/cray/pe/modules/3.2.10.4/init/.prgenvmodules:PrgEnv-cray:PrgEnv-gnu:PrgEnv-intel:PrgEnv-pathscale:PrgEnv-pgi
PE_MPICH_DIR_CRAY_DEFAULT64=64
WINDOWMANAGER=
SLURM_SUBMIT_HOST=mom5
PE_PETSC_DEFAULT_GENCOMPILERS_GNU_haswell=5.1 4.9
PE_TPSL_DEFAULT_GENCOMPILERS_GNU_interlagos=5.1 4.9
BASH_ENV=/global/homes/z/zz217/.bashrc
vmod=/global/common/cori/usg/Modules/modulefiles
PE_LIBSCI_GENCOMPILERS_INTEL_mic_knl=15.0
PE_PAPI_DEFAULT_TARGET_VAR_nvidia35=,-lcupti,-lcudart,-lcuda
PE_TPSL_DEFAULT_GENCOMPILERS_INTEL_sandybridge=15.0
PE_MPICH_MULTITHREADED_LIBS_multithreaded=_mt
SLURM_JOB_PARTITION=debug
PE_HDF5_DEFAULT_GENCOMPS_GNU=51 49
PE_TRILINOS_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH:PE_HDF5_PARALLEL:PE_NETCDF_HDF5PARALLEL:PE_LIBSCI:PE_TPSL
LESS=-M -I -R
PE_TPSL_DEFAULT_GENCOMPS_GNU_x86_64=51 49
LOGNAME=zz217
ALTD_SELECT_USERS=
MACHTYPE=x86_64-suse-linux
PE_MPICH_NV_LIBS=
PE_TPSL_64_DEFAULT_GENCOMPILERS_GNU_haswell=5.1 4.9
CRAY_LIBSCI_PREFIX_DIR=/opt/cray/pe/libsci/16.06.1/INTEL/15.0/x86_64
CRAY_GNI_HEADERS_INCLUDE_OPTS=-I/opt/cray/gni-headers/5.0.7-3.1/include
PE_TPSL_64_DEFAULT_GENCOMPILERS_INTEL_sandybridge=15.0
PE_NETCDF_DEFAULT_REQUIRED_PRODUCTS=PE_HDF5
SLURM_STEP_NUM_TASKS=1
CVS_RSH=ssh
PE_LIBSCI_OMP_REQUIRES=
PE_TRILINOS_DEFAULT_GENCOMPS_GNU_x86_64=51 49
PE_MPICH_DEFAULT_GENCOMPILERS_CRAY=8.3
DMAPP_ABORT_ON_ERROR=1
PE_MPICH_GENCOMPS_CRAY=83
vusg=/global/common/cori/usg
PE_MPICH_DEFAULT_FIXED_PRGENV=INTEL
PE_MPICH_DEFAULT_GENCOMPS_GNU=51 49
OMPI_MCA_orte_allocation_required=0
DVS_INCLUDE_OPTS=-I/opt/cray/dvs/2.5_0.9.0-2.155/include
PE_TPSL_64_DEFAULT_GENCOMPILERS_INTEL_interlagos=15.0
XDG_DATA_DIRS=/usr/share
PE_TPSL_DEFAULT_GENCOMPILERS_CRAY_sandybridge=8.3
SSH_CONNECTION=128.3.135.64 55407 128.55.209.23 22
TOOLMODULES=/opt/cray/pe/modules/3.2.10.4/init/.toolmodules:apprentice:apprentice2:atp:chapel:cray-lgdb:cray-snplauncher:craypat:craypkg-gen:ddt:gdb:iobuf:papi:perftools:perftools-lite:stat:totalview:xt-craypat:xt-lgdb:xt-papi:xt-totalview:/etc/opt/cray/pe/modules/site_toolmodules
PE_LIBSCI_DEFAULT_REQUIRED_PRODUCTS=PE_MPICH
SLURM_JOB_ACCOUNT=mpccc
PE_GA_DEFAULT_FIXED_PRGENV=CRAY INTEL
PE_SMA_FORTRAN_PKGCONFIG_LIBS=smaf
PE_TPSL_DEFAULT_GENCOMPILERS_GNU_sandybridge=5.1 4.9
CRAY_PRGENVINTEL=loaded
MODULESHOME=/opt/cray/pe/modules/3.2.10.4
PE_LIBSCI_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/libsci/16.06.1/@PRGENV@/@PE_LIBSCI_DEFAULT_GENCOMPS@/@PE_LIBSCI_DEFAULT_TARGET@/lib/pkgconfig
SLURM_JOB_NUM_NODES=1
PE_PETSC_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/petsc/3.7.0.0/complex/@PRGENV@/@PE_PETSC_DEFAULT_GENCOMPS@/@PE_PETSC_DEFAULT_TARGET@/lib/pkgconfig
LESSOPEN=lessopen.sh %s
PELOCAL_PRGENV=true
PE_MPICH_NV_LIBS_nvidia35=-lcudart
PKG_CONFIG_PATH=/opt/cray/rca/1.0.0-6.21/lib64/pkgconfig:/opt/cray/alps/6.1.3-17.12/lib64/pkgconfig:/opt/cray/xpmem/0.1-4.5/lib64/pkgconfig:/opt/cray/gni-headers/5.0.7-3.1/lib64/pkgconfig:/opt/cray/dmapp/7.1.0-12.37/lib64/pkgconfig:/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/lib64/pkgconfig:/opt/cray/ugni/6.0.12-2.1/lib64/pkgconfig:/opt/cray/udreg/2.3.2-4.6/lib64/pkgconfig:/opt/cray/pe/craype/2.5.5/pkg-config:/opt/cray/pe/iobuf/2.0.6/lib/pkgconfig:/opt/cray/pe/atp/2.0.2/lib/pkgconfig
CRAY_NUM_COOKIES=2
SLURM_STEP_TASKS_PER_NODE=1
PE_TPSL_64_DEFAULT_GENCOMPS_INTEL_x86_64=150
LIBSCI_BASE_DIR=/opt/cray/pe/libsci/16.06.1
CRAY_SHMEM_DIR=/opt/cray/pe/mpt/7.4.0/gni/sma
CRAY_COOKIES=3999924224,3999989760
SLURM_STEP_NODELIST=nid00010
LIBSCI_VERSION=16.06.1
PE_HDF5_PARALLEL_DEFAULT_GENCOMPILERS_INTEL=15.0 14.0
PE_TPSL_64_DEFAULT_GENCOMPS_GNU_sandybridge=51 49
CRAY_SHMEM_BASEDIR=/opt/cray/pe/mpt/7.4.0/gni
PE_LIBSCI_DEFAULT_PKGCONFIG_VARIABLES=PE_LIBSCI_DEFAULT_OMP_REQUIRES_@openmp@
DISPLAY=mom5:10.0
PE_TPSL_64_DEFAULT_GENCOMPILERS_GNU_x86_64=5.1 4.9
NERSC_HOST=cori
PE_FFTW_DEFAULT_TARGET_broadwell=broadwell
CRAY_CPU_TARGET=haswell
CRAY_ALPS_INCLUDE_OPTS=-I/opt/cray/alps/6.1.3-17.12/include
CRAY_PRE_COMPILE_OPTS=-hnetwork=aries
PE_LIBSCI_GENCOMPILERS_INTEL_x86_64=15.0
XDG_RUNTIME_DIR=/run/user/32858
PE_SMA_MODULE_NAME=cray-shmem
PE_LIBSCI_DEFAULT_GENCOMPS_CRAY_mic_knl=85
craype_already_loaded=0
CRAY_XPMEM_INCLUDE_OPTS=-I/opt/cray/xpmem/0.1-4.5/include
PE_LIBSCI_REQUIRED_PRODUCTS=PE_MPICH
PE_SMA_COMPFLAG=
PE_TPSL_64_DEFAULT_GENCOMPS_CRAY_x86_64=83
PE_HDF5_PARALLEL_DEFAULT_GENCOMPS_INTEL=150 140
CRAY_UGNI_INCLUDE_OPTS=-I/opt/cray/ugni/6.0.12-2.1/include
SLURM_CPU_BIND=quiet,mask_cpu:0xFFFFFFFFFFFFFFFF
PE_LIBSCI_GENCOMPS_GNU_mic_knl=51
PE_LIBSCI_DEFAULT_GENCOMPILERS_GNU_mic_knl=5.1
PE_LIBSCI_GENCOMPS_GNU_x86_64=51 49
PE_LIBSCI_DEFAULT_GENCOMPILERS_GNU_x86_64=5.1 4.9
PE_TPSL_DEFAULT_GENCOMPS_INTEL_haswell=150
ATP_HOME=/opt/cray/pe/atp/2.0.2
NO_AT_BRIDGE=1
LESSCLOSE=lessclose.sh %s %s
PE_FFTW_DEFAULT_TARGET_x86_64=x86_64
PE_PETSC_DEFAULT_GENCOMPILERS_INTEL_x86_64=15.0
PE_SMA_DEFAULT_DIR_PGI_DEFAULT64=64
CRAY_LD_LIBRARY_PATH=/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0/lib:/opt/cray/pe/mpt/7.4.0/gni/sma/lib64:/opt/cray/pe/mpt/7.4.0/gni/sma64/lib64:/opt/cray/rca/1.0.0-6.21/lib64:/opt/cray/alps/6.1.3-17.12/lib64:/opt/cray/xpmem/0.1-4.5/lib64:/opt/cray/dmapp/7.1.0-12.37/lib64:/opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/lib64:/opt/cray/ugni/6.0.12-2.1/lib64:/opt/cray/udreg/2.3.2-4.6/lib64:/opt/cray/pe/libsci/16.06.1/INTEL/15.0/x86_64/lib
PE_INTEL_DEFAULT_FIXED_PKGCONFIG_PATH=/opt/cray/pe/parallel-netcdf/1.7.0/INTEL/15.0/lib/pkgconfig:/opt/cray/pe/netcdf-hdf5parallel/4.4.0/INTEL/15.0/lib/pkgconfig:/opt/cray/pe/netcdf/4.4.0/INTEL/15.0/lib/pkgconfig:/opt/cray/pe/mpt/7.4.0/gni/mpich-intel/15.0/lib/pkgconfig:/opt/cray/pe/ga/5.3.0.6/INTEL/15.0/lib/pkgconfig
PE_PAPI_DEFAULT_ACCEL_LIBS=
PE_PETSC_DEFAULT_GENCOMPILERS_GNU_interlagos=5.1 4.9
PE_GA_DEFAULT_GENCOMPS_GNU=51 49
G_BROKEN_FILENAMES=1
PE_GA_DEFAULT_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/ga/5.3.0.6/@PRGENV@/@PE_GA_DEFAULT_GENCOMPS@/lib/pkgconfig
PE_PETSC_DEFAULT_GENCOMPILERS_INTEL_haswell=15.0
PE_FFTW_DEFAULT_TARGET_haswell=haswell
ALTD_WORKDIR=/global/cscratch1/altd/logs
vomp=/project/projectdirs/omp
SCRATCH=/global/cscratch1/sd/zz217
SLURM_MEM_PER_NODE=124928
LS_OPTINS=-N --color=tty -T 0
JAVA_ROOT=/usr/lib64/jvm/java
PE_MPICH_DEFAULT_DIR_CRAY_DEFAULT64=64
PE_PETSC_DEFAULT_GENCOMPS_INTEL_x86_64=150
intel_already_loaded=0
PE_PETSC_DEFAULT_GENCOMPS_GNU_x86_64=51 49
PE_PETSC_DEFAULT_GENCOMPS_CRAY_haswell=83
COLORTERM=1
BASH_FUNC_mymodule()=() {  eval `/opt/modules/3.2.6.7/bin/modulecmd bash $*`
}
BASH_FUNC_module()=() {  eval `/opt/cray/pe/modules/3.2.10.4/bin/modulecmd bash $*`
}
_=/usr/bin/env
zz217@nid00010:~/tests/slurm>
Comment 8 Alejandro Sanchez 2016-10-17 06:44:24 MDT
Hi Zhengji Zhao - we've noticed that since you have CR_ONE_TASK_PER_CORE set, Slurm only allocates resources for one task per core, and thus the step cannot use all the hardware threads on the node. Could you please try specifying the --ntasks-per-core=2 option when creating the allocation?

$ salloc -N 1 --ntasks-per-core=2 -p debug

We expect that you should then be able to launch 64 tasks:

$ srun -n 64 hostname

Thanks.
Comment 14 Moe Jette 2016-10-17 09:32:46 MDT
(In reply to Alejandro Sanchez from comment #8)
> Hi Zhengji Zhao - we've noticed that since you've CR_ONE_TASK_PER_CORE set
> up, Slurm only allocates resources for one task per core, and thus the step
> can not consume the full threads on the nodes. Could you please try
> allocating specifying the --ntasks-per-core=2 option?
> 
> $ salloc -N 1 --ntasks-per-core=2 -p debug
> 
> Our guess is that you should be able to then consume -n 64 tasks:
> 
> $ srun -n 64 hostname
> 
> Thanks.

To expand upon what Alex says above.

The job allocation is made with an --ntasks-per-core value of 1 by default (from "SelectTypeParameters=CR_ONE_TASK_PER_CORE"). By creating the job allocation with "--ntasks-per-core=2" on the salloc command line, you create the job with the ability to launch 2 tasks per core.
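To make the arithmetic behind this limit concrete, here is a minimal bash sketch. The helper `max_tasks_per_node` is hypothetical (not a Slurm tool) and assumes a simplified model in which a node can host at most cores x tasks-per-core tasks, with tasks per core capped by the hardware threads per core; it is not how Slurm computes placement internally. The KNL lines assume the same rule applies with 4 hyperthreads per core.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Slurm: a simplified model of how many tasks
# one node can host when tasks are limited per core. CR_ONE_TASK_PER_CORE makes
# the job allocation default to --ntasks-per-core=1.
max_tasks_per_node() {
  local cores=$1 threads_per_core=$2 ntasks_per_core=$3
  # Assumption: a core never hosts more tasks than it has hardware threads.
  if (( ntasks_per_core > threads_per_core )); then
    ntasks_per_core=$threads_per_core
  fi
  echo $(( cores * ntasks_per_core ))
}

# Haswell node: 32 cores x 2 hyperthreads = 64 CPUs.
echo "Haswell, default allocation:  $(max_tasks_per_node 32 2 1) tasks"
echo "Haswell, --ntasks-per-core=2: $(max_tasks_per_node 32 2 2) tasks"
# KNL node: 68 cores x 4 hyperthreads = 272 CPUs (assumed same rule).
echo "KNL, default allocation:      $(max_tasks_per_node 68 4 1) tasks"
echo "KNL, --ntasks-per-core=4:     $(max_tasks_per_node 68 4 4) tasks"
```

Under this model, "srun -n 64" on a Haswell node exceeds the 32-task limit of a default allocation, which matches the behavior reported in this ticket.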

While the srun command accepts "--ntasks-per-core=2" on its command line, that information is only used for job allocations and not for step allocations, so the only way to make this work today is as Alex has described above.

I have created a new bug with a feature request to enable specifying the option in the job step:
https://bugs.schedmd.com/show_bug.cgi?id=3183

The goal of that would be to permit something like this:
$ salloc -N 1 -p debug
$ srun -n 64 --ntasks-per-core=2 hostname
Comment 15 Zhengji Zhao 2016-10-21 12:48:40 MDT
Dear Moe,

Your proposed feature is exactly what we need. We used to do this when we were running Torque/Moab: we requested a node at job allocation time, and then used the Cray job launcher's (aprun) command-line option to decide whether to run on all CPUs or only on the physical cores (-j 1, the default, does not use hyperthreads; -j 2 uses hyperthreads).

Thanks very much! I would like to be informed when the new feature is implemented in Slurm.


Dear Alex,

Thanks for looking into the problem. Yes, if we use --ntasks-per-core=2 at job allocation time, we can then use all the CPUs on the node. I notice that CR_ONE_TASK_PER_CORE restricts only the number of "tasks" per core to 1, not the threads. So although I cannot run 64 tasks on the node, I was able to run 32 tasks with 2 threads each, and that way I was able to use all the CPUs on the node. So my original description of the problem was not exactly right: Slurm does allow me to use the full node, but it restricts the number of tasks that can be launched on the node because of CR_ONE_TASK_PER_CORE. So I think it is not a bug in Slurm.
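A minimal bash sketch of the 32-task, 2-threads-per-task layout described above, assuming a Haswell node (32 cores, 2 hyperthreads each) and an OpenMP application. `./my_app` is a hypothetical executable, and `-c` is srun's short form of `--cpus-per-task`. The script only shows the arithmetic and builds the resulting srun line; it has not been run against this site's configuration.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: keep one task per core (the CR_ONE_TASK_PER_CORE
# limit) but give each task both hyperthreads of its core.
cores_per_node=32      # Haswell node
threads_per_core=2     # 2 hyperthreads per core -> 64 CPUs per node

ntasks=$cores_per_node                  # one task per core
cpus_per_task=$threads_per_core         # each task gets 2 CPUs
export OMP_NUM_THREADS=$cpus_per_task   # assumes the application uses OpenMP threads

echo "CPUs used: $(( ntasks * cpus_per_task )) of $(( cores_per_node * threads_per_core ))"

# Build the step command; ./my_app is a hypothetical executable.
# It is printed rather than executed; inside the salloc session run "${cmd[@]}".
cmd=(srun -n "$ntasks" -c "$cpus_per_task" ./my_app)
echo "${cmd[*]}"
```

Using a bash array for the command keeps each argument intact, so the same array can be printed for a dry run or executed directly.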

Moe's proposed solution will help us achieve what we want. Feel free to close this bug. I will follow up on the new feature bug that Moe has opened.

Thanks a lot!
Zhengji
Comment 16 Moe Jette 2016-10-21 14:52:10 MDT
(In reply to Zhengji Zhao from comment #15)
> With Moe's proposed solution will help us to achieve what we want. Feel free
> to close this bug. I will follow up with the new feature bug that Moe has
> opened. 

See bug 3183 for remaining work involving new development.
https://bugs.schedmd.com/show_bug.cgi?id=3183