| Summary: | Question Partition/Queue Setup | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | John Johnston <jbjohnston> |
| Component: | Configuration | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 21.08.8 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Oakland U | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave Sites: | --- | Cray Sites: | --- |
| DS9 Clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC Sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf file | | |
Hi John, you should be able to issue a "$ scontrol show config" on that other cluster. What is most likely happening is some type of cli_filter or job_submit plugin which changes aspects of the job submission, such as the partition, based on time limits. Frankly, without knowing what the other cluster is doing, it is hard to say for sure exactly how they are doing it.

Example job_submit (run server side by the slurmctld):

> JobSubmitPlugins=lua

https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua

Or cli_filter (run client side):

> CliFilterPlugins=lua

https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example

What would be helpful is to understand your requirements for how jobs should be routed. Based on your description, it sounds like you want the user to submit without knowing the partitions and have the correct partition selected based on the time limit of the job. The only exception here is account-based jobs, which should be routed to your buy-in nodes. Please let me know if this assumption is incorrect.

Hi Jason,
Yes, you are correct - we want the user to be able to submit without
specifying a partition, and based on time limit (or in the case of
buy-in users, account) have the job routed to the appropriate queue.
I took your suggestion and ran a "scontrol show config" on the other
cluster to have another look. They are using "JobSubmitPlugins=lua" and
I suspect this is how they're doing it (nothing for CliFilterPlugins).
I've also looked at the lua script you linked. Just for grins, I
dumped it into the same directory as slurm.conf, and set
"JobSubmitPlugins=lua", then restarted Slurm.
It did not work, but I can tell from the log it is "hitting" that script
when the job is submitted. I'm just trying to better understand how
this script needs to be modified for our needs. Do you have a good
reference or perhaps an example on how this script should/could be
customized?
The first problem I encountered was that the script was not identifying
the default account for the user. I added the following to the canned
script:
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.account == nil then
        local getacct = io.popen("sacctmgr -n list USER $USER -o format=DefaultAccount")
        local defacct = getacct:read("*a")
        local account = defacct
        slurm.log_info("slurm_job_submit: job from uid %u, setting default account value: %s",
                       submit_uid, account)
        job_desc.account = account
    end
This seems to use the current $USER to determine the default account if
none is specified. However, the portion where it is supposed to obtain
a partition list (part_list) seems to be giving me issues now. I tried
hard-coding in a list of partitions just to see if it would work, but
I'm uncertain as to what the format of this should be. I've tried:
part_list="general-short,general-long,science"
part_list="PartitionName=general-short,PartitionName=general-long,PartitionName=science"
part_list="1:general-short,2:general-long,3:science"
None of these seem to work. It seems to want some sort of table. The
issue is I'm not sure how those things might be passed directly from
SLURM into that script (without me using shell commands, hardcoded
strings, or other kludges). Any insight you can provide would be
appreciated.
Thanks,
On 8/10/22 12:07, bugs@schedmd.com wrote:
> Comment #1 <https://bugs.schedmd.com/show_bug.cgi?id=14719#c1> on bug 14719 <https://bugs.schedmd.com/show_bug.cgi?id=14719> from Jason Booth <jbooth@schedmd.com>
(In reply to John Johnston from comment #2)

> It did not work, but I can tell from the log it is "hitting" that script
> when the job is submitted. I'm just trying to better understand how
> this script needs to be modified for our needs. Do you have a good
> reference or perhaps an example on how this script should/could be
> customized?

This page explains a lot about the job_submit plugin API, including the functions and passed parameters:
https://slurm.schedmd.com/job_submit_plugins.html

Here you can see an example of how the job_submit plugin can be used for various tasks:
https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua

> The first problem I encountered was that the script was not identifying
> the default account for the user. I added the following to the canned
> script:
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     if job_desc.account == nil then
>         local getacct = io.popen("sacctmgr -n list USER $USER -o format=DefaultAccount")
>         local defacct = getacct:read("*a")
>         local account = defacct
>         slurm.log_info("slurm_job_submit: job from uid %u, setting default account value: %s",
>                        submit_uid, account)
>         job_desc.account = account

You can actually access the default account directly with the following:

> job_desc.default_account

I also wouldn't recommend making calls to user commands such as sacctmgr, as this has the potential to severely slow down slurmctld, since it will have to do this for every job submitted. If it is absolutely necessary to make such a call, I would suggest calling it less frequently (every day, week, etc.), caching the results, and then accessing them that way.

> This seems to use the current $USER to determine the default account if
> none is specified. However, the portion where it is supposed to obtain
> a partition list (part_list) seems to be giving me issues now. I tried
> hard-coding in a list of partitions just to see if it would work, but
> I'm uncertain as to what the format of this should be. I've tried:
>
> part_list="general-short,general-long,science"
> part_list="PartitionName=general-short,PartitionName=general-long,PartitionName=science"
> part_list="1:general-short,2:general-long,3:science"
>
> None of these seem to work. It seems to want some sort of table. The
> issue is I'm not sure how those things might be passed directly from
> SLURM into that script (without me using shell commands, hardcoded
> strings, or other kludges). Any insight you can provide would be
> appreciated.

It seems that you're trying to modify the part_list that is passed into the slurm_job_submit function. This parameter is actually an input, as seen in the job_submit plugin documentation (https://slurm.schedmd.com/job_submit_plugins.html#lua):

> part_list (input) List of pointer to partitions which this user is
> authorized to use.

If you want to modify the partitions that a job can land on, you should modify `job_desc.partition`. Format this in the same way that you would format a partition list for a job submitted from the command line. Example:

> $ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> A*           up   infinite     10   idle n-[1-10]
> B            up   infinite     10   idle n-[11-20]

Submitting from the command line:

> $ srun --partition="A,B" hostname

job_submit.lua:

> function slurm_job_submit(job_desc, part_list, submit_uid)
>     ...
>     job_desc.partition = "A,B"
>     ...

Let me know if you have any other questions. If you don't have any further questions, I'll close this bug out. Feel free to reopen it and reply if you have any questions related to the original topic of this bug.
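Putting the pieces of this thread together, a routing script along these lines might address the original requirement (route by time limit, with buy-in accounts sent to their own queue). This is only a sketch under assumptions from this ticket: the 4-hour cutoff and the partition names come from the site description, while the `buyin_partitions` table and the stubbed `slurm` table are hypothetical scaffolding, not part of the real plugin API.

```lua
-- Sketch of a job_submit.lua for the routing described in this ticket.
-- The stub below stands in for the `slurm` table that slurmctld provides,
-- so the sketch can be exercised standalone; inside slurmctld the real
-- table is used instead.
slurm = slurm or {
    SUCCESS = 0,
    log_info = function(...) end,
}

-- Hypothetical mapping of buy-in accounts to their dedicated partitions.
local buyin_partitions = {
    science = "science",
}

local FOUR_HOURS = 4 * 60  -- job_desc.time_limit is expressed in minutes

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Respect an explicit partition request from the user.
    if job_desc.partition ~= nil then
        return slurm.SUCCESS
    end

    -- Use the default account rather than shelling out to sacctmgr.
    if job_desc.account == nil then
        job_desc.account = job_desc.default_account
    end

    -- Buy-in accounts go to their dedicated queue; everyone else is routed
    -- by time limit. (In a real plugin an unset time limit may arrive as
    -- slurm.NO_VAL rather than nil; either way it falls through to the
    -- long queue here.)
    local buyin = buyin_partitions[job_desc.account]
    if buyin ~= nil then
        job_desc.partition = buyin
    elseif job_desc.time_limit ~= nil and job_desc.time_limit <= FOUR_HOURS then
        job_desc.partition = "general-short"
    else
        job_desc.partition = "general-long"
    end

    slurm.log_info("slurm_job_submit: uid %u routed to partition %s",
                   submit_uid, job_desc.partition)
    return slurm.SUCCESS
end

-- A job_submit.lua script must also define slurm_job_modify.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With this approach nothing is read from or written to part_list; all routing happens through `job_desc` fields, which avoids the hard-coded partition strings attempted above.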
Created attachment 26251 [details]
slurm.conf file

We are trying to reconfigure our cluster partitions in a way that will permit auto-segregation of user-submitted jobs to the "most appropriate" queue. We are doing this chiefly to account for buy-in nodes (nodes purchased by research groups). Essentially, buy-in nodes can allow jobs to run for non-account holders, but only for a max time of 4 hours. So we have a "general-short" queue (all nodes, max runtime 4 hours), a "general-long" queue (all nodes EXCEPT buy-in nodes, max runtime 7 days), and account-based queues (all nodes including the respective buy-in nodes, max runtime 7 days). That's the essential background. Please note that we currently use just a single default partition.

Another cluster I have access to (but no admin rights on) does not appear to use a "Default" partition/queue. One doesn't need to specify the partition when submitting a job. The job appears to be tested against each partition in a particular order, and it is scheduled in the queue where it meets requirements and where resources are available (e.g., a job with time=3 hours should run in "general-short", not "general-long"). I've tried to replicate this setup without success.

I'm attaching a copy of our current slurm.conf file for your review. I tried to use the "PartitionName=DEFAULT" queue as the "Default", but jobs submitted without a partition specified just fail since there are no nodes assigned to DEFAULT. I CAN submit a job and specify a specific partition, and it works fine. I've also tried submitting a job by specifying all the queues (e.g. sbatch -p general-short,general-long,science jobscript.sh). This sometimes works, but strangely, the job will try to run in the queue/partition that is listed LAST in slurm.conf.

Please let me know how I might make this work.

Thanks,
John
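For reference, the partition layout described above might be sketched in slurm.conf roughly as follows. The node ranges and the "science" account name are placeholders, not taken from the attached file; the actual configuration is in attachment 26251.

```
# Sketch only: node lists and the buy-in account name are placeholders.
PartitionName=general-short Nodes=n-[1-20],buyin-[1-4] Default=YES MaxTime=04:00:00 State=UP
PartitionName=general-long  Nodes=n-[1-20] MaxTime=7-00:00:00 State=UP
PartitionName=science       Nodes=n-[1-20],buyin-[1-4] AllowAccounts=science MaxTime=7-00:00:00 State=UP
```

With a job_submit plugin doing the routing, the Default=YES marker matters less, since jobs submitted without a partition would have one assigned before scheduling; AllowAccounts keeps the buy-in queue restricted to its owning account.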