Ticket 11470

Summary: Limit srun allocation to one node
Product: Slurm Reporter: Torkil Svensgaard <torkil>
Component: Configuration Assignee: Marcin Stolarek <cinek>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact
Priority: --- CC: cinek, rkv
Version: 20.11.5   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=11411
Site: DRCMR

Description Torkil Svensgaard 2021-04-28 01:30:23 MDT
Hi

If I do this:

torkil@bill:/home/torkil$ srun --pty -n 96 bash

I get an allocation spanning 2 nodes:

torkil@gojira:/home/torkil$ scontrol show job 133891
JobId=133891 JobName=bash
   UserId=torkil(1018) GroupId=torkil(1018) MCS_label=N/A
   Priority=4294895817 Nice=0 Account=drcmr QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:42 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2021-04-28T09:22:05 EligibleTime=2021-04-28T09:22:05
   AccrueTime=Unknown
   StartTime=2021-04-28T09:22:05 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-04-28T09:22:05
   Partition=HPC AllocNode:Sid=bill:2436219
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=gojira,smaug
   BatchHost=gojira
   NumNodes=2 NumCPUs=96 NumTasks=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,node=2,billing=96
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/home/torkil
   Power=
   NtasksPerTRES:0

Is there a way to limit srun allocation to never span more than a single node? 

Mvh.

Torkil
Comment 1 Marcin Stolarek 2021-04-28 04:20:06 MDT
Torkil,

The natural way of doing that is to add -N1 to the srun options.
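
For example, the request from the description would then be pinned to a single node (assuming a single node actually has 96 CPUs available):

torkil@bill:/home/torkil$ srun -N1 --pty -n 96 bash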

If you want to make it a site default, you can achieve this easily with the cli_filter plugin[1].

cheers,
Marcin
[1]https://slurm.schedmd.com/cli_filter_plugins.html
Comment 2 Torkil Svensgaard 2021-04-28 04:26:42 MDT
(In reply to Marcin Stolarek from comment #1)

> The natural way of doing that is by the addition of -N1 to srun options.
> 
> If you want to make it a site default, you can achieve this easily by
> cli_filter plugin[1].

Thanks. I want to enforce it rather than just set it as a default, so I guess I have to do it in a job_submit plugin?

Would you happen to have code that does something like that at hand?

Mvh.

Torkil
Comment 3 Marcin Stolarek 2021-04-30 06:41:08 MDT
>Thanks. I want to enforce it rather than just set it as a default, so I guess I have to do it in a job_submit plugin?

Do you mean you want to forbid multinode jobs?

cheers,
Marcin
Comment 4 Torkil Svensgaard 2021-05-03 00:18:47 MDT
(In reply to Marcin Stolarek from comment #3)
> >Thanks. I want to enforce it rather than just set it as a default, so I guess I have to do it in a job_submit plugin?
> 
> Do you mean you want to forbid multinode jobs?

With srun, yes. It's only used for interactive matlab where more than one node makes no sense.

Mvh.

Torkil
Comment 5 Marcin Stolarek 2021-05-03 01:48:38 MDT
Torkil,

>With srun, yes. It's only used for interactive matlab where more than one node makes no sense.
Please be aware that there are certain tools that use srun behind the scenes (for instance OpenMPI): srun is not only used for interactive "allocate and run" scenarios, but perhaps even more often to create steps and launch tasks within an existing allocation.

Another important point is that, from a job_submit plugin's perspective, you can't really distinguish which tool the end user used to prepare the job description, since the plugin runs on the slurmctld side and the job may even have been submitted directly through the Slurm API (or slurmrestd), without sbatch/srun/salloc.

cheers,
Marcin
Comment 6 Torkil Svensgaard 2021-05-03 03:09:40 MDT
Hi Marcin

Ok, we'll stick with just setting a default. I did this in job_submit.lua; is that what you had in mind?

if job_desc.max_nodes == 4294967294 then
  job_desc.max_nodes = 1
  slurm.log_info("Setting max_nodes to 1")
end
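
For reference, a minimal sketch of how this fragment could sit inside a complete job_submit.lua (assuming the standard job_submit/lua callbacks; 4294967294 is NO_VAL, i.e. no explicit -N/--nodes was given):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- 4294967294 (NO_VAL) means the user did not request a node count
    if job_desc.max_nodes == 4294967294 then
        job_desc.max_nodes = 1
        slurm.log_info("Setting max_nodes to 1")
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end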

Mvh.

Torkil
Comment 7 Marcin Stolarek 2021-05-03 03:30:15 MDT
>Ok, we'll stick with just setting a default. I did this in job_submit.lua; is that what you had in mind?

No, that's not what I'd recommend. I think the more suitable place is the cli_filter plugin[1], with code like:
>function slurm_cli_setup_defaults(options, early_pass)
>        --[[
>        -- Set -N1 for srun as a default
>        ]]--
>        if options['type'] == 'srun' then
>                options['nodes'] = 1 
>        end
>        return slurm.SUCCESS
>end


cheers,
Marcin
[1]https://slurm.schedmd.com/cli_filter_plugins.html
Comment 8 Torkil Svensgaard 2021-05-03 04:14:16 MDT
(In reply to Marcin Stolarek from comment #7)
> >Ok, we'll stick with just setting a default. I did this in job_submit.lua; is that what you had in mind?
> 
> No, that's not what I'd recommend. I think the more suitable place is the
> cli_filter plugin[1], with code like:
> [...]
> [1]https://slurm.schedmd.com/cli_filter_plugins.html

Ah, cool. I read [1] but didn't understand it, so I went with job_submit. I can see I missed the reference to where to put cli_filter.lua.

It looks like "scontrol reconfigure" doesn't copy the file to the nodes, and even if it did, the submit hosts do not run slurmd.

Do I have to copy it manually to all hosts? And what is the default script dir on hosts with no /etc/slurm?

Mvh.

Torkil
Comment 9 Marcin Stolarek 2021-05-04 01:32:55 MDT
>It looks like "scontrol reconfigure" doesn't copy the file to the nodes, and even if it did, the submit hosts do not run slurmd.
That's correct - as of today, configless copies neither job_submit.lua nor cli_filter.lua.

>Do I have to copy it manually to all hosts? And what is the default script dir on hosts with no /etc/slurm?
From the code perspective it's
>static const char lua_script_path[] = DEFAULT_SCRIPT_DIR "/cli_filter.lua";
while DEFAULT_SCRIPT_DIR is set during the build (based on options passed to configure), so if /etc/slurm is the default location for your config (on slurmctld), then that is where you'll have to put the job_submit/cli_filter lua scripts.
This is something we're looking into in Bug 11411. It may change/improve in the next major release of Slurm.

Let me know if you have further questions.

cheers,
Marcin
Comment 10 Torkil Svensgaard 2021-05-04 01:50:00 MDT
(In reply to Marcin Stolarek from comment #9)
 
> >Do I have to copy it manually to all hosts? And what is the default script dir on hosts with no /etc/slurm?
> From the code perspective it's
> >static const char lua_script_path[] = DEFAULT_SCRIPT_DIR "/cli_filter.lua";
> while DEFAULT_SCRIPT_DIR is set during the build (based on options passed to
> configure), so if /etc/slurm is the default location for your config (on
> slurmctld), then that is where you'll have to put the job_submit/cli_filter
> lua scripts.
> This is something we're looking into in Bug 11411. It may change/improve in
> the next major release of Slurm.

I created /etc/slurm and dumped cli_filter.lua there but it doesn't work or isn't found. How do I debug? Running srun with -v yielded no clues.

Thanks,

Torkil
Comment 11 Torkil Svensgaard 2021-05-04 02:07:12 MDT
(In reply to Torkil Svensgaard from comment #10)
> I created /etc/slurm and dumped cli_filter.lua there but it doesn't work or
> isn't found. How do I debug? Running srun with -v yielded no clues.

I also copied slurm.conf, to which I added CliFilterPlugins=cli_filter.lua. Neither of these two locations for cli_filter.lua seems to work:

/etc/slurm/cli_filter.lua
/etc/slurm/cli_filter/cli_filter.lua

"
torkil@bill:/home/torkil$ srun -vvv --pty -n 96 bash
srun: error: Couldn't find the specified plugin name for cli_filter/cli_filter.lua looking at all files
srun: error: cannot find cli_filter plugin for cli_filter/cli_filter.lua
srun: error: cannot create cli_filter context for cli_filter/cli_filter.lua
srun: error: cli_filter plugin terminated with error
"

What am I missing? The installed Slurm packages are compiled with rpmbuild with no modifications.

Mvh.

Torkil
Comment 12 Torkil Svensgaard 2021-05-04 02:49:22 MDT
(In reply to Torkil Svensgaard from comment #11)
> I also copied slurm.conf, to which I added CliFilterPlugins=cli_filter.lua.
> Neither of these two locations for cli_filter.lua seems to work:
> 
> /etc/slurm/cli_filter.lua
> /etc/slurm/cli_filter/cli_filter.lua
> 
> "
> torkil@bill:/home/torkil$ srun -vvv --pty -n 96 bash
> srun: error: Couldn't find the specified plugin name for
> cli_filter/cli_filter.lua looking at all files
> srun: error: cannot find cli_filter plugin for cli_filter/cli_filter.lua
> srun: error: cannot create cli_filter context for cli_filter/cli_filter.lua
> srun: error: cli_filter plugin terminated with error
> "
> 
> What am I missing? The installed Slurm packages are compiled with
> rpmbuild with no modifications.

I think I got it, posting it here for posterity.

On the submit node, create /etc/slurm and copy slurm.conf there. Add this line to slurm.conf:

CliFilterPlugins=lua

Complete cli_filter.lua (put in /etc/slurm):

"
function slurm_cli_pre_submit(cli, opts)
    return slurm.SUCCESS
end

function slurm_cli_post_submit(cli, opts)
    return slurm.SUCCESS
end

function slurm_cli_setup_defaults(options, early_pass)
    --[[
    -- Set -N1 for srun as a default
    ]]--
    if options['type'] == 'srun' then
        options['nodes'] = 1
    end
    return slurm.SUCCESS
end
"

Do I need the full slurm.conf on these submit hosts? I tried starting from scratch with an almost empty one, but got one error after another.

Mvh.

Torkil
Comment 13 Marcin Stolarek 2021-05-04 04:08:33 MDT
>Do I need the full slurm.conf on these submit hosts? I tried starting from scratch with an almost empty one, but got one error after another.
In general I'd recommend keeping all the configs in sync on all hosts.

cheers,
Marcin
Comment 14 Torkil Svensgaard 2021-05-04 04:13:33 MDT
(In reply to Marcin Stolarek from comment #13)
> >Do I need the full slurm.conf on these submit hosts? I tried starting from scratch with an almost empty one, but got one error after another.
> In general I'd recommend keeping all the configs in sync on all hosts.

We were very happy with the configless option since sync was taken care of automatically, but for this we are back to needing a slurm.conf in Puppet. Keeping it in sync is not the end of the world, but it would have been nice if slurm.conf for login/submit nodes could have consisted of just:

CliFilterPlugins=lua

Then no sync issues at all, since that parameter isn't used on the master. Btw, where do these nodes get their configuration from when they run with no slurm.conf, which they did up til now? 

Mvh.

Torkil
Comment 15 Marcin Stolarek 2021-05-04 05:01:56 MDT
>We were very happy with the configless option since sync was taken care of automatically[..]
It's not something I can commit to now, but we're considering a configless modification in Slurm 21.08 that will allow .lua scripts to be sent together with the configuration files supported today.

>[...]t would have been nice if slurm.conf for login/submit nodes could have consisted of just:
>CliFilterPlugins=lua
That will most likely not be possible, since we don't merge different configuration sources. When a slurm.conf source is found we just use it, so having only CliFilterPlugins=.. won't be enough.

> Btw, where do these nodes get their configuration from when they run with no slurm.conf, which they did up til now? 
I guess you have _slurmctld._tcp SRV records in DNS [1]? 
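
For reference, such a record would look roughly like this (hypothetical hostname, assuming slurmctld listens on its default port 6817):

_slurmctld._tcp 3600 IN SRV 10 0 6817 slurmctl.example.com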

cheers,
Marcin
[1]https://slurm.schedmd.com/configless_slurm.html
Comment 16 Torkil Svensgaard 2021-05-04 05:08:50 MDT
(In reply to Marcin Stolarek from comment #15)
> >We were very happy with the option of configless since sync was taken care of automatically[..]
> It's not something I can commit to now, but we're considering a configless
> modification in Slurm 21.08 that will allow .lua scripts to be sent together
> with the configuration files supported today.

That would be nice =)

> > Btw, where do these nodes get their configuration from when they run with no slurm.conf, which they did up til now? 
> I guess you have _slurmctld._tcp SRV records in DNS [1]? 

Of course I have; I totally forgot about that.

Thanks, feel free to close the ticket.

Mvh.

Torkil