Ticket 1379

Summary: use submit plugin to set different default partitions depending on login node
Product: Slurm Reporter: Steve McMahon <steve.mcmahon>
Component: Other    Assignee: Moe Jette <jette>
Status: RESOLVED FIXED QA Contact:
Severity: 3 - Medium Impact    
Priority: High CC: brian, da, steve.mcmahon
Version: 14.03.0   
Hardware: Linux   
OS: Linux   
Site: CSIRO
Version Fixed: 14.11.4 Target Release: ---

Description Steve McMahon 2015-01-18 10:41:27 MST
Hi,

We have a design which has 3 "logical clusters" with one instance of the slurm server and Bright Cluster Manager to manage the nodes.  The logical clusters have different architectures - CPU, GPU or PHI - different login nodes and different partitions.

We want to use a submit plugin to set different default partitions depending on what login node the job is submitted from.

What's the best way to do this?

Also, we are still learning about running production clusters using slurm.  What's the best way to test this functionality without affecting production use?
Comment 1 Moe Jette 2015-01-18 12:55:37 MST
(In reply to Steve McMahon from comment #0)
> Hi,
> 
> We have a design which has 3 "logical clusters" with one instance of the
> slurm server and Bright Cluster Manager to manage the nodes.  The logical
> clusters have different architectures - CPU, GPU or PHI - different login
> nodes and different partitions.
> 
> We want to use a submit plugin to set different default partitions depending
> on what login node the job is submitted from.
> 
> What's the best way to do this?

The job submit plugin has access to all of the job parameters, so doing what you want should be pretty simple. There are several samples available that you can use as a model. If you want to do this using a Lua script, take a look at contribs/lua/job_submit.lua packaged with Slurm or online here:
https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua

If you prefer to use C, then see src/plugins/job_submit/partition/job_submit_partition.c
https://github.com/SchedMD/slurm/blob/master/src/plugins/job_submit/partition/job_submit_partition.c

or src/plugins/job_submit/all_partitions/job_submit_all_partitions.c
https://github.com/SchedMD/slurm/blob/master/src/plugins/job_submit/all_partitions/job_submit_all_partitions.c
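For reference, a Lua job_submit plugin is just a script that defines slurm_job_submit() and slurm_job_modify(); a minimal skeleton (the partition name "work" here is only a placeholder, not from this ticket) might look like:

```lua
-- Minimal job_submit.lua skeleton. This runs inside slurmctld's embedded
-- Lua interpreter (place it as job_submit.lua in the slurm.conf directory
-- with JobSubmitPlugins=lua); it is not meant to be run standalone.
-- "work" is a placeholder partition name.

function slurm_job_submit(job_desc, part_list, submit_uid)
	-- Only supply a default; never override an explicit --partition request.
	if job_desc.partition == nil then
		job_desc.partition = "work"
	end
	return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
	return slurm.SUCCESS
end

return slurm.SUCCESS
```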

> Also, we are still learning about running production clusters using slurm. 
> What's the best way to test this functionality without affecting production
> use?

I would recommend building a configuration on your desktop that you can use for emulating your systems; just don't try to run a bunch of big parallel jobs ;), submit jobs that just sleep instead. I would recommend the "front end" configuration described here:
http://slurm.schedmd.com/faq.html#multi_slurmd
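As a rough sketch of that emulation setup (the host name, node names, and partition layout below are hypothetical, not from this ticket): build Slurm with --enable-multiple-slurmd, then define several "nodes" that all resolve to the desktop host, each slurmd on its own port, and start each daemon with slurmd -N <nodename>.

```
# Hypothetical slurm.conf fragment for emulating several nodes on one
# desktop, per the multi_slurmd FAQ entry linked above.
NodeName=node1 NodeHostname=localhost Port=17001 CPUs=2
NodeName=node2 NodeHostname=localhost Port=17002 CPUs=2
NodeName=node3 NodeHostname=localhost Port=17003 CPUs=2

PartitionName=cpu Nodes=node1,node2 Default=YES
PartitionName=gpu Nodes=node3
```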
Comment 2 Steve McMahon 2015-01-18 15:11:16 MST
Thanks Moe,

We have developed a LUA script.  We don’t know the name of the parameter which has the host name of the node the job was submitted from.  Will it be something like job_desc.AllocNode ?
Comment 3 Moe Jette 2015-01-18 15:25:15 MST
(In reply to Steve McMahon from comment #2)
> Thanks Moe,
> 
> We have developed a LUA script.  We don’t know the name of the parameter
> which has the host name of the node the job was submitted from.  Will it be
> something like job_desc.AllocNode ?

It should be job_desc.alloc_node

You will find brief descriptions of all of the names in slurm/slurm.h.in; look starting around line 1127 here:
https://github.com/SchedMD/slurm/blob/master/slurm/slurm.h.in

Unfortunately, that variable doesn't seem to be getting exported to Lua today. I'll need to send you a patch for that. If you want to have a crack at making the patch yourself, it should be trivial, see the
_get_job_req_field() function in plugins/job_submit/lua/job_submit_lua.c around line 476 of the file:
https://github.com/SchedMD/slurm/blob/master/src/plugins/job_submit/lua/job_submit_lua.c

It should just take two lines being added:
} else if (!strcmp(name, "alloc_node")) {
	lua_pushstring (L, job_desc->alloc_node);
Comment 4 Moe Jette 2015-01-19 04:44:26 MST
(In reply to Moe Jette from comment #3)
> It should just take two lines being added:
> } else if (!strcmp(name, "alloc_node")) {
> lua_pushstring (L, job_desc->alloc_node);

This will be in v14.11.4 when released. The commit is here:
https://github.com/SchedMD/slurm/commit/85b3cc2db4a2cffda9b35a6db86b2b7b9f5f5203
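With alloc_node exported to Lua, the login-node check in the script could look something like this (the login-node and partition names here are placeholders for CSIRO's actual ones, and the mapping table itself is an assumption about how the site would organize it):

```lua
-- Hypothetical mapping from submitting login node to default partition.
-- Login-node and partition names are placeholders, not from this ticket.
local default_partition = {
	["cpu-login1"] = "cpu",
	["gpu-login1"] = "gpu",
	["phi-login1"] = "phi",
}

function slurm_job_submit(job_desc, part_list, submit_uid)
	-- Respect an explicit --partition request from the user; only fill
	-- in a default when none was given and the submit host is known.
	if job_desc.partition == nil and job_desc.alloc_node ~= nil then
		local part = default_partition[job_desc.alloc_node]
		if part ~= nil then
			job_desc.partition = part
		end
	end
	return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
	return slurm.SUCCESS
end
```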
Comment 5 Moe Jette 2015-01-20 02:24:48 MST
Can we close this ticket?
Comment 6 Steve McMahon 2015-01-20 08:10:06 MST
(In reply to Moe Jette from comment #5)
> Can we close this ticket?

Yes, thanks.  We have enough to go on now.
Comment 7 Moe Jette 2015-01-20 08:40:48 MST
Closed based upon information provided to customer and Slurm patch