| Summary: | "scontrol show job" / SlurmctldProlog envvars equivalence | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | User Commands | Assignee: | Oriol Vilarrubi <jvilarru> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | bart, marshall |
| Version: | 20.11.8 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Stanford | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Kilian Cavalotti
2021-08-13 17:00:43 MDT
Hi Kilian,

For that task I would suggest using a Job Completion plugin; there are several of them:

elasticsearch: If you are already using Elasticsearch for something else, I would recommend this one, as you would already have all the data in a structured form and it is easy to build dashboards from it. The list of stored values is the following:

    account, alloc_node, array_job_id, array_task_id, cluster, container,
    cpu_hours, cpus_per_task, derived_ec, elapsed, @eligible, @end,
    excluded_nodes, exit_code, group_id, groupname, het_job_id,
    het_job_offset, jobid, job_name, nodes, ntasks, ntasks_per_node,
    ntasks_per_tres, orig_dependency, pack_job_id, pack_job_offset,
    parent_accounts, partition, qos, @queue_wait, reservation_name, script,
    @start, state, std_err, std_in, std_out, @submit, time_limit,
    total_cpus, total_nodes, tres_alloc, tres_req, user_id, username,
    wc_key, work_dir

filetxt: You will most probably find it insufficient for your needs, as not much data is stored with this one. This is an example from a test job I launched earlier:

    JobId=7052 UserId=jvilarru(1000) GroupId=jvilarru(1000) Name=hostname
    JobState=COMPLETED Partition=debug TimeLimit=UNLIMITED
    StartTime=2021-08-23T16:13:10 EndTime=2021-08-23T16:13:10
    NodeList=centos NodeCnt=1 ProcCnt=1 WorkDir=/home/jvilarru
    ReservationName= Tres=cpu=1,mem=200M,node=1,billing=1 Account=users
    QOS=normal WcKey= Cluster=cluster SubmitTime=2021-08-23T16:13:10
    EligibleTime=2021-08-23T16:13:10 DerivedExitCode=0:0 ExitCode=0:0

script: This plugin executes the script you set in JobCompLoc, with the environment populated with the job's variables. You can find the full list of those here: https://github.com/SchedMD/slurm/blob/master/src/plugins/jobcomp/script/README. But again, that list does not contain all the data you want to extract from Slurm.
mysql: This plugin stores the data in MySQL. The columns are the following:

    jobid | uid | user_name | gid | group_name | name | state | partition |
    timelimit | starttime | endtime | nodelist | nodecnt | proc_cnt |
    connect_type | reboot | rotate | maxprocs | geometry | start | blockid

Again, this is missing some of the fields you need for your request.

And finally, you have the lua plugin. This one lets you access the internal job_record structure (filled in here: https://github.com/SchedMD/slurm/blob/master/src/lua/slurm_lua.c#L340), so you can reach all the data from inside Slurm. I'm preparing and testing an example for you. To configure it, you need to set JobCompType=jobcomp/lua and create jobcomp.lua in the same directory as slurm.conf. In this file you need to define the function slurm_jobcomp_log_record, taking one parameter (the job record). So the minimal script would be the following:

jobcomp.lua:

    function slurm_jobcomp_log_record(job_rec)
        return slurm.SUCCESS
    end

    return slurm.SUCCESS

Also take into account that in 21.08 the submission line is stored in the database, so you can access it with sacct, for example:

    [root@centos ~]# sacct -P -j 7055 -Xo JobID,SubmitLine
    JobID|SubmitLine
    7055|srun --mem=200 hostname

And you can also store the job script itself in the database with AccountingStoreFlags=job_script in slurm.conf.

I'll come back to you as soon as I've finished the lua script example.

Hi Oriol,

Thanks for the thorough answer, much appreciated! Of all the options you presented, the most promising and interesting to us is the jobcomp/lua script.
> Also take into account that in 21.08 the submission line is stored in the
> database, so you can access it with sacct, example:
>
> [root@centos ~]# sacct -P -j 7055 -Xo JobID,SubmitLine
> JobID|SubmitLine
> 7055|srun --mem=200 hostname
>
> And you can also store the job script itself in the database with
> AccountingStoreFlags=job_script in slurm.conf

Ah good, that will be very helpful as well.

> I'll come back to you as soon as I've finished the lua script example.

Thanks! Looking forward to it.

Cheers,
--
Kilian

Hi Kilian,

Some colleagues of mine told me that you have a lot of jobs in your environment. Knowing that, the lua job completion plugin is not the best option, as with many jobs it might degrade Slurm's performance. I'm now finishing testing another idea, which is the following: I do not know if you are aware that Slurm now offers the possibility of querying it through a REST API. My idea is that in the SlurmctldProlog you write only the jobid into a file, and a script constantly checks that file and queries the REST API to get the information about the jobs, saving it in a DB, JSON files, etc. I'm testing this now to see whether it would provide all the necessary data for you. I'll keep you updated.

Hi Oriol,

(In reply to Oriol Vilarrubi from comment #5)
> Some colleagues of mine told me that you have a lot of jobs in your
> environment,

That's true: we're averaging around 20,000 jobs in queue at any given moment, with job submission rates in the 100,000s/day.

> after knowing that the lua jobcompletion component is not the
> best option as with many jobs it might degrade the performance of slurm

We're using a decently sized job_submit.lua script that gets executed at every job submission, and this seems to be working fine. Do you think a job completion lua script would be more impactful?
If anything, it should run less often, since the job_submit script even runs for jobs that are eventually rejected and will never reach the job completion phase. So I'm curious about the impact of a lua job completion script vs. a job_submit lua script.

> I do not know if you are aware that slurm now offers the possibility of
> querying it using a rest API, so my idea is that in the slurmCtldProlog you
> write only the jobid into a file, and that a script constantly checks that
> file and queries the REST api in order to get the information about the jobs
> and saving that in a DB, json files, etc...

Oh I see, that's an interesting idea. Although, if we go with an external process querying jobs, we can probably use `scontrol show job` directly instead of the extra slurmrestd layer, right? I assume slurmrestd will generate the same RPCs and potentially take the same locks as `scontrol show job`, so in terms of load on the controller that's probably equivalent, correct?

Thanks!
--
Kilian

Hello Kilian,

I've been doing some tests on the lua jobcomp plugin, and unfortunately not all the fields that you need are there. You can get a list of all the currently implemented fields by looking at the function slurm_lua_job_record_field in the source code: https://github.com/SchedMD/slurm/blob/master/src/lua/slurm_lua.c#L340

But it's not all bad news: I've also been "playing around" with the filetxt completion plugin, and even though it does not have all the fields that you need, it is pretty easy to add them. The same goes for the script one.

So, how do you want to proceed? With the script or the filetxt? I'm inferring that the strategy is to move the process from the slurmctld prolog into the job completion plugin. I'm also taking for granted that you do not want to use the job completion plugin for anything that can already be obtained from the accounting DB.

Greetings.
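For what it's worth, the prolog-plus-watcher idea discussed above could be sketched in shell roughly as follows. Everything here is an assumption for illustration: the spool paths, the function names, and the one-pair-per-line flattening are mine, not anything Slurm ships.

```shell
#!/bin/sh
# Sketch: the SlurmctldProlog itself would only do something cheap like
#   echo "$SLURM_JOB_ID" >> /var/spool/slurm/new_jobids
# and an external watcher (below) does the expensive part out of band.

# Flatten the "Key=Value Key=Value ..." output of `scontrol show job`
# into one Key=Value pair per line, which is easier to grep or load later.
# (Naive: values containing spaces would need smarter parsing.)
flatten_job_record() {
    tr ' ' '\n' | grep '='
}

# Watcher loop: consume jobids appended by the prolog and dump each
# job's record into its own file (one file per job).
watch_jobids() {
    spool=/var/spool/slurm/new_jobids      # assumed path
    outdir=/var/spool/slurm/job_records    # assumed path
    tail -F "$spool" | while read -r jobid; do
        scontrol show job "$jobid" | flatten_job_record > "$outdir/$jobid"
    done
}
```

Keeping the prolog down to a single append is the point of the suggestion: the controller-side hook stays fast, and the RPC-generating `scontrol show job` (or slurmrestd) call happens in a separate process at its own pace.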
Hi Oriol,

(In reply to Oriol Vilarrubi from comment #8)
> I've been doing some tests on the jobcomp plugin of lua and unfortunately
> not all the fields that you need are there. You can get a list of all the
> currently implemented fields by looking at the function
> slurm_lua_job_record_field in the source code:
> https://github.com/SchedMD/slurm/blob/master/src/lua/slurm_lua.c#L340
>
> But not everything are bad news, I've also been "playing around" with the
> filetxt completion plugin and even though it does not have all the fields
> that you need, it is pretty easy to add them.

That sounds great!

> So, how do you want to proceed? with the script or the filetxt?

I think the lua job completion approach may be the better of the two, as it would give users more flexibility to define the recording format they want. For instance, my understanding is that the filetxt completion plugin stores all job information in a single file, while we would need each job's information stored in a separate file.

> I'm
> inferring that the strategy is to move the process from the slurmctld prolog
> into the jobcompletion plugin? I'm also taking for granted that all that can
> be obtained using the accounting DB you do not want to use the jobcompletion
> plugin for it.

Yes. And actually, thinking more about this approach, a few questions come to mind:

1. Recording that information through a jobcomp plugin actually seems a bit redundant with the accounting database. I know that the DB has recently been extended to store more information about jobs (like the submission script, workdir, etc.), but parts are still missing. So instead of extending the information recorded by the jobcomp mechanism, wouldn't it make more sense to continue adding the missing bits to the accounting database, and thus have a single point of reference for all job information?
Fragmenting the information across the accounting DB and a job completion plugin doesn't seem optimal in that respect.

2. Our current scontrol-based system runs during the SlurmctldProlog, when the job *starts*. With a jobcomp plugin, it would run when the job *ends*, meaning that for the whole duration of the job that information would not be available. And we routinely rely on that information while jobs are running, so moving to a jobcomp plugin wouldn't actually work for this, since the information wouldn't be available until the job has ended. The accounting database, on the other hand, makes job information available as soon as the job starts.

Both of those points make me think that expanding the job accounting database to store the missing information would be better than using a separate jobcomp plugin. What do you think?

Thanks!
--
Kilian

Hi Kilian,

I will reply inline with your text.

> I think that the lua job completion approach may be best of the two, as it
> would provide more flexibility for users to define the recording format they
> want. For instance, my understanding is that the filetxt completion plugin
> will store all job information in a single file, while we would need to have
> each job's information stored in a separate file.

I agree that the lua one provides more flexibility. Also, as you said, the filetxt one stores everything in the same file, so that would be problematic if you want to read it while it is being written.

> 1. recording that information through a jobcomp plugin actually seems a bit
> redundant with the accounting database. I know that the DB has been recently
> extended to store more information about jobs (like the submission script,
> workdir, etc) but parts are still missing.
> So instead of extending the
> information recorded by the jobcomp mechanism, wouldn't it make more sense
> to continue adding the missing bits in the accounting database, and thus
> have a single point of reference for all job information?
> Fragmenting the information across the accounting DB and a job completion
> plugin doesn't seem optimal in that respect.

That is a difficult topic. Some sites have really, really big databases, to the point where adding a single field would make their database grow a lot, and might have some impact on performance. Also, if we store all this data separately (dependency, exclude/include nodelist, etc.) we might be duplicating data. Let me explain: as you know, in 21.08 the submission line is stored in the DB, and there is an option to also store the job script itself. For the vast majority of jobs, that already contains the totality of the submission data, either inside the job script or in parameters on the submit line. That said, I understand your point that you want to have this data directly accessible without the need to parse the job script and the submit line, but as said before, we need to be very careful about how we modify the DB fields.

> 2. our current scontrol-based system occurs during the SlurmctldProlog, when
> the job *starts*. With a jobcomp plugin, it would occur when the job
> *ends*. Meaning that during the whole duration of the job, that information
> would not be available. And we routinely rely on that information while
> jobs are running, so moving to a jobcomp plugin wouldn't actually work for
> this, since the information wouldn't be available until a job has ended.
> On the other hand, job information becomes available in the accounting
> database as soon as the job starts.

That is a very valid point: you would have no data while the job is running (if you move entirely to the jobcomp plugin).
> Both those points make me think that expanding the job accounting database
> to store the missing information would be better than using a separate
> jobcomp plugin.

As said before, I am reluctant to modify the fields of what is stored in the database. Can I try to convince you to let me include the missing env vars in the SlurmctldProlog instead? I cannot guarantee that this change will ship officially with the next Slurm version, but I can provide you with a patch for your local Slurm installation. Does that sound good to you?

Hi Oriol,

(In reply to Oriol Vilarrubi from comment #10)
> As said before I am reluctant to modify the fields of what is stored in the
> database.

And you made valid points about it, that's true. This is probably a larger issue than just this bug, and something that has likely been discussed many times, but the discrepancy (and sometimes redundancy) between all the different ways to look at a job (through squeue, scontrol show job, sstat, or sacct) has always been, and remains, a great source of confusion for users and sysadmins alike.

> Can I try to convince you so that I include the missing ENV vars in the
> slurmctldprolog?

Well, yes, that would be great! Having the missing bits available as environment variables in the SlurmctldProlog would totally work for us.

> I cannot guarantee that this change would ship officially
> with the next slurm version, but I can provide you a patch for your local
> slurm installation to include it.
> Does that sound good to you?

I'm completely fine with testing and carrying a local patch. It will very likely benefit other sites as well, so I'm pretty sure it would be useful to integrate it eventually.

Thank you!
--
Kilian

Kilian - I have been reviewing this issue with Oriol. We would be willing to expand the database; however, adding all the environment variables is something we are not interested in at this time, at least not without some type of sponsored development.
Recently, we added a flag that helps sites record more information about their jobs, which I am sure you are aware of.

New: job_script
Current/previously supported: job_env, job_comment

https://slurm.schedmd.com/slurm.conf.html#OPT_AccountingStoreFlags

Although this does not check every box or cover every feature you are after, it does offer some added details about your jobs that were not previously stored.
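For reference, a minimal sketch of how the flags above fit together in slurm.conf, plus the sacct options that read the stored data back. The job id is a placeholder, and these options assume Slurm 21.08 or later with accounting through slurmdbd:

```shell
# slurm.conf fragment (accounting via slurmdbd required):
#   AccountingStoreFlags=job_script,job_env,job_comment

# Reading the stored data back with sacct (21.08+):
sacct -j "$JOBID" --batch-script   # print the stored batch script
sacct -j "$JOBID" --env-vars       # print the stored job environment
```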