Ticket 1096

Summary: Need SALLOC_HINT environment variable
Product: Slurm
Reporter: Mark Shry <kshry>
Component: Scheduling
Assignee: Danny Auble <da>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: brian.gilmer, da
Version: 14.03.3
Hardware: Linux
OS: Linux
Site: CRAY
Version Fixed: 14.03.8, 14.11.0pre5

Description Mark Shry 2014-09-10 02:31:31 MDT
Is it possible to get a SALLOC_HINT environment variable so that we can set up salloc to default to "nomultithread"? I know this issue was addressed for sbatch in bug 1052. We need something similar for salloc.

Current default behavior (without --hint):

> kmshry@clogin73:~> salloc -n256 -c3 srun hostname | sort -n | uniq -c
> salloc: Pending job allocation 6644
> salloc: job 6644 queued and waiting for resources
> salloc: job 6644 has been allocated resources
> salloc: Granted job allocation 6644
> salloc: Relinquishing job allocation 6644
> salloc: Job allocation 6644 has been revoked.
>      16 nid00072
>      16 nid00073
>      16 nid00074
>      16 nid00075
>      16 nid00076
>      16 nid00077
>      16 nid00078
>      16 nid00079
>      16 nid00080
>      16 nid00081
>      16 nid00082
>      16 nid00083
>      16 nid00084
>      16 nid00085
>      16 nid00086
>      16 nid00087

Current behavior with hint:

> kmshry@clogin73:~> salloc -n256 -c3 --hint=nomultithread srun hostname | sort -n | uniq -c
> salloc: Pending job allocation 6646
> salloc: job 6646 queued and waiting for resources
> salloc: job 6646 has been allocated resources
> salloc: Granted job allocation 6646
> salloc: Relinquishing job allocation 6646
> salloc: Job allocation 6646 has been revoked.
>       8 nid00072
>       8 nid00073
>       8 nid00074
>       8 nid00075
>       8 nid00076
>       8 nid00077
>       8 nid00078
>       8 nid00079
>       8 nid00080
>       8 nid00081
>       8 nid00082
>       8 nid00083
>       8 nid00084
>       8 nid00085
>       8 nid00086
>       8 nid00087
>       8 nid00088
>       8 nid00089
>       8 nid00090
>       8 nid00091
>       8 nid00092
>       8 nid00093
>       8 nid00094
>       8 nid00095
>       8 nid00096
>       8 nid00097
>       8 nid00098
>       8 nid00099
>       8 nid00100
>       8 nid00101
>       8 nid00102
>       8 nid00103
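
For reference, the two task layouts above can be reproduced with simple arithmetic. This is only a sketch: the ticket does not state the node hardware, so `cores_per_node=24` and `threads_per_core=2` are assumptions chosen because they match the observed counts.

```shell
# Values taken from the salloc command line in this ticket.
tasks=256
cpus_per_task=3

# Assumptions (not stated in the ticket): 24 cores per node, 2 threads per core.
cores_per_node=24
threads_per_core=2

# Default: every hardware thread counts as an allocatable CPU.
default_tasks_per_node=$(( cores_per_node * threads_per_core / cpus_per_task ))
default_nodes=$(( tasks / default_tasks_per_node ))

# --hint=nomultithread: only one thread per core is used.
hint_tasks_per_node=$(( cores_per_node / cpus_per_task ))
hint_nodes=$(( tasks / hint_tasks_per_node ))

echo "default:       $default_tasks_per_node tasks/node on $default_nodes nodes"
echo "nomultithread: $hint_tasks_per_node tasks/node on $hint_nodes nodes"
```

Under those assumptions this gives 16 tasks per node on 16 nodes by default and 8 tasks per node on 32 nodes with the hint, matching the two listings.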
Comment 1 Danny Auble 2014-09-10 04:57:06 MDT
I'll work on it now.  I thought it was done at the same time, but it appears to be missing from the current code.
Comment 2 Danny Auble 2014-09-10 05:26:40 MDT
This is in commit b1ad21dabd9a692f223030d8e8046619d64467e7

SLURM_HINT will work in all cases, as will SBATCH_HINT and SALLOC_HINT for their respective commands.
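
A minimal sketch of how a site might use the new variable, assuming a Slurm build that contains this commit (14.03.8 or later, per the Version Fixed field). The `command -v` guard is only there so the snippet runs harmlessly on a machine without Slurm. The salloc line is the reporter's original test with `--hint` dropped.

```shell
# Assumption: Slurm 14.03.8 or later, which includes the fix from this ticket.
# Make "nomultithread" the default hint for salloc only.
# Exporting SLURM_HINT instead would apply the hint to all commands.
export SALLOC_HINT=nomultithread

# Re-run the reporter's test without --hint, but only where salloc exists.
if command -v salloc >/dev/null 2>&1; then
    salloc -n256 -c3 srun hostname | sort -n | uniq -c
else
    echo "salloc not found; SALLOC_HINT=${SALLOC_HINT}"
fi
```

If the variable is honored, the output should match the "with hint" listing in the description (8 tasks per node) rather than the default one.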
Comment 3 Mark Shry 2014-09-12 04:10:51 MDT
This patch was installed on 9/11/2014. This issue is resolved.

Thanks