Ticket 12663

Summary: Default constraint
Product: Slurm
Reporter: Gordon Dexter <gmdexter>
Component: User Commands
Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue    
Priority: ---    
Version: 21.08.2   
Hardware: Linux   
OS: Linux   
Site: Johns Hopkins Univ. HLTCOE

Description Gordon Dexter 2021-10-13 13:33:19 MDT
I have a cluster with features 'EL7' and 'EL8', for different OS versions.

I'd like to make all jobs request the EL7 feature unless the user specifies an EL7 or EL8 constraint. That way I can gradually transition nodes from EL7 to EL8 without constantly reconfiguring partitions.

Bug 1364 seems to answer the same question, but 1) it uses a job_submit plugin, which seems like it would be better done as a cli_filter plugin nowadays, and 2) it looks like it would fail if a user requests an unrelated feature (e.g. -C PublicIP).

Is cli_filter.lua the best way to do this nowadays? And is there a safer way for the plugin to check for particular features? It seems like options['constraint'] is only a string that the plugin would have to parse, rather than some kind of iterable.
Comment 1 Ben Roberts 2021-10-13 16:48:09 MDT
Hi Gordon,

You're right that this is similar to bug 1364. Whether you use a job_submit plugin or a cli_filter plugin is a matter of preference. The job_submit plugin runs on the controller, while the cli_filter plugin runs on the submit host. My personal preference is the job_submit plugin.

If you're not careful, the submit plugin can fail when the user requests another feature, but you can handle that case when writing the plugin. As far as the plugin is concerned, the features you compare against are just strings.
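One way to handle a request that also includes unrelated features is to split the constraint string into individual feature names and compare whole names instead of the raw string. The sketch below is plain Lua with no Slurm calls, so it should behave the same in job_submit.lua or cli_filter.lua. constraint_features and has_feature are hypothetical helper names, not part of the Slurm Lua API, and the sketch assumes feature names are separated only by the operators & | , [ ] ( ) and may carry a *N count suffix.

```lua
-- Hypothetical helpers (not part of the Slurm Lua API).
-- Assumption: feature names are separated only by the operators
-- & | , [ ] ( ) and may carry a "*N" count suffix such as "rack1*2".
local function constraint_features(expr)
  local names = {}
  if not expr or expr == "" then
    return names
  end
  -- Each run of non-operator characters is one feature term.
  for term in string.gmatch(expr, "[^&|,%[%]()]+") do
    local name = term:gsub("%*%d+$", "")  -- drop a count suffix
    name = name:match("^%s*(.-)%s*$")     -- trim surrounding whitespace
    if name ~= "" then
      names[#names + 1] = name
    end
  end
  return names
end

-- True only if `wanted` appears as a whole feature name in `expr`.
local function has_feature(expr, wanted)
  for _, name in ipairs(constraint_features(expr)) do
    if name == wanted then
      return true
    end
  end
  return false
end
```

For example, has_feature("PublicIP,EL8", "EL8") is true while has_feature("PublicIP", "EL8") is false, so an unrelated -C PublicIP request is not mistaken for an OS request.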

We can't write submit plugins for you, but here is a simple example that should get you started. This is the relevant section of the slurm_job_submit function from an example plugin:
-------------------------
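    if not job_desc.features or job_desc.features == "" then
        -- No constraint requested: default to rh7
        job_desc.features = "rh7"
    else
        local curr_feature = job_desc.features
        if curr_feature == "rh8" then
            slurm.log_user("rh8 feature already on job")
        else
            -- Append rh7 to the user's constraint (Lua uses .. to concatenate)
            slurm.log_user("Setting rh7 feature on job")
            job_desc.features = curr_feature .. ",rh7"
        end
    end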
-------------------------
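The equality test on "rh8" above only recognizes a job that requests exactly rh8. A request such as -C rh8,rack2 would become rh8,rack2,rh7, which no node can satisfy if every node carries only one of the two OS features. A minimal sketch that compares whole feature names instead, assuming the constraint uses only the & | , [ ] ( ) * operators (with_default_feature is a hypothetical helper, not part of Slurm):

```lua
-- Hypothetical helper (not part of Slurm): returns the feature string a
-- job should end up with, given what the user asked for.
local function with_default_feature(features, default, alternatives)
  -- Nothing requested: the default becomes the whole constraint.
  if not features or features == "" then
    return default
  end
  -- Feature names that count as "already chose an OS".
  local accepted = { [default] = true }
  for _, alt in ipairs(alternatives) do
    accepted[alt] = true
  end
  -- Split on constraint operators (& | , [ ] ( ) *) and compare whole names,
  -- so "rh8,rack2" is recognized as an rh8 request but "rack2" is not.
  for token in string.gmatch(features, "[^&|,%[%]()*]+") do
    if accepted[token] then
      return features
    end
  end
  -- No OS feature found: AND the default onto the user's request.
  return features .. "," .. default
end
```

Inside slurm_job_submit, the whole if/else block could then reduce to job_desc.features = with_default_feature(job_desc.features, "rh7", { "rh8" }), at the cost of dropping the log_user messages.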

Here's an example where I submitted test jobs that requested different features or nothing:

$ sbatch -n1 -C rack2 --wrap='srun sleep 30'
sbatch: Setting rh7 feature on job
Submitted batch job 1669 on cluster knight

$ sbatch -n1 -C rh8 --wrap='srun sleep 30'
sbatch: rh8 feature already on job
Submitted batch job 1670 on cluster knight

$ sbatch -n1 --wrap='srun sleep 30'
Submitted batch job 1671 on cluster knight

$ scontrol show job | egrep 'JobId|Feature'
JobId=1669 JobName=wrap
   Features=rack2&rh7 DelayBoot=00:00:00
JobId=1670 JobName=wrap
   Features=rh8 DelayBoot=00:00:00
JobId=1671 JobName=wrap
   Features=(null) DelayBoot=00:00:00

I hope this helps.  Let me know if you have any questions.

Thanks,
Ben
Comment 2 Gordon Dexter 2021-10-15 09:25:34 MDT
For posterity, this is the script I came up with.

It takes advantage of the fact that none of our other features contain the string "EL", so any constraint string that contains it already has an OS feature request.

"""
function slurm_cli_pre_submit(options, pack_offset)

	-- Feature added when the job does not already request an OS
	local default_os = "EL7"
	-- Substring shared by all OS features (EL7, EL8) and no other feature
	local os_search_string = "EL"

	local oc = options["constraint"]
	if not oc or oc == "" then
		options["constraint"] = default_os
		slurm.log_verbose("Setting constraint to " .. default_os .. ".")
	else
		slurm.log_verbose("Old constraint, pre-plugin: %s", oc)
		-- Plain substring search (4th argument true disables Lua patterns)
		local el_found = string.find(oc, os_search_string, 1, true)
		if el_found then
			slurm.log_verbose("Found " .. os_search_string .. " at %d", el_found)
		else
			slurm.log_verbose(os_search_string .. " not found, appending default (" .. default_os .. ") to constraints")
			options["constraint"] = oc .. "," .. default_os
		end
		slurm.log_verbose("New constraint, post-plugin: %s", options["constraint"])
	end

	return slurm.SUCCESS

end
"""
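The script above relies on no other feature containing the substring "EL", so a future feature name such as "CELL" would silently suppress the default. A stricter check, sketched here under the assumption that every OS feature is "EL" followed by digits, uses Lua's %f frontier pattern so that only a whole-word match counts. has_os_feature and OS_FEATURE_PATTERN are hypothetical names.

```lua
-- Hypothetical stricter replacement for the substring test.
-- %f[%w] / %f[%W] are frontier patterns: they match the empty string at a
-- transition into / out of alphanumerics (documented from Lua 5.2 onward;
-- also supported, undocumented, in 5.1). So "EL7" in "PublicIP,EL7" matches,
-- but the "EL" inside "CELL" or "MODEL3" does not.
local OS_FEATURE_PATTERN = "%f[%w]EL%d+%f[%W]"

local function has_os_feature(constraint)
  if not constraint or constraint == "" then
    return false
  end
  return string.find(constraint, OS_FEATURE_PATTERN) ~= nil
end
```

In the script, has_os_feature(oc) would take the place of the string.find call. Because it returns a boolean rather than a position, the "Found ... at %d" log message would need adjusting.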