Hello,

We have a cluster running in AWS with a few different partitions that currently use specific EC2 instances (cpu/mem/gpu optimized):
--
$ scontrol show partitions | grep -i PartitionName
PartitionName=C-16Cpu-30GB
PartitionName=C-36Cpu-69GB
PartitionName=C-72Cpu-139GB
PartitionName=M-16Cpu-123GB
PartitionName=M-48Cpu-371GB
PartitionName=M-96Cpu-742GB
PartitionName=G-1GPU-8Cpu-58GB
PartitionName=G-4GPU-32Cpu-235GB
PartitionName=G-8GPU-64Cpu-471GB
--
I have to onboard new users that are not part of our business unit, and I need to track and charge back their usage for each of these partitions/instance_types. Each of these partitions maps to a specific instance type and has an AWS cost associated with it. Is there a way in Slurm to get a report of a given user's usage of each of these partitions/instances so that I can then calculate the cost for chargeback purposes?

Currently I can only pull high-level metrics for a given user:
--
$ sreport cluster AccountUtilizationByUser start=01/01/21 end=01/31/21 | egrep -i 'Cluster|petr'
Cluster/Account/User Utilization 2021-01-01T00:00:00 - 2021-01-30T23:59:59 (2592000 secs)
  Cluster   Account     Login  Proper Name     Used   Energy
slurm-ma+  titanium       xxx          xxx  4011033        0

$ sreport cluster AccountUtilizationByUser --tres="gres/gpu" start=01/01/21 end=01/31/21 | egrep -i 'Cluster|petr'
Cluster/Account/User Utilization 2021-01-01T00:00:00 - 2021-01-30T23:59:59 (2592000 secs)
  Cluster   Account     Login  Proper Name  TRES Name    Used
slurm-ma+  titanium       xxx          xxx   gres/gpu  501379
--
However, this does not tell me how many hours this user used in a specific partition. Is there a way to get this data, or a better approach to doing chargeback for users in this type of environment? I would really appreciate your input.

Thanks,
-Simran
Hi Simran,

To get the information you want for jobs that have already run, I would recommend using 'sacct' to pull data on jobs that meet the criteria you're trying to report on. The sacct command retrieves information about jobs that have run and been recorded in the database. You can filter on things like the user and partition, along with a date range, to get just the information you want. Here's an example of what a query might look like on my test system:

sacct --starttime=2021-02-10 --endtime=2021-02-11 --partition debug --user user1

You can also control the fields that are displayed. You can see the fields that are available by looking at 'sacct -e' or 'man sacct'.

Going forward, you may want to consider using a Workload Characterization Key (WCKey) to attach a unique characteristic to jobs that you can then report on with sreport. You can create a WCKey that corresponds to each partition you have, and then jobs with that WCKey/partition combination will be reported correctly. You can rely on users to request the correct WCKey when they submit, but the better option would probably be to create a submit filter that adds the correct WCKey based on the partition requested by a job.

In order to use WCKeys you would need to add a couple of lines to your slurm.conf:

AccountingStorageEnforce=wckey (this may already have other entries)
TrackWCKey=yes

In your slurmdbd.conf file you would also need to add:

TrackWCKey=yes

You would also need to add WCKeys to your users with sacctmgr, like this:

sacctmgr add user user1 wckey=partition1_key

There is more information about using WCKeys in the documentation here:
https://slurm.schedmd.com/wckey.html

Once you have jobs running with a WCKey associated with them, you should be able to run reports that show the data you're looking for.
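[Editor's note: as an illustration of the sacct route above, here is a sketch of how raw per-job rows could be totaled into per-user, per-partition core-hours. The sacct flags shown in the comment are standard (`-X` for allocations only, `--parsable2` for pipe-delimited output); the sample data is made up to keep the sketch self-contained.]

```shell
# Sketch: aggregate per-user, per-partition core-hours from sacct output.
# In production you would pipe real data, e.g.:
#   sacct -a -X -S 2021-01-01 -E 2021-02-01 --parsable2 -n \
#         -o user,partition,alloccpus,elapsedraw
# The sample below stands in for that output (user|partition|cpus|seconds).
sample='petr|C-16Cpu-30GB|16|3600
petr|C-16Cpu-30GB|16|1800
petr|G-1GPU-8Cpu-58GB|8|7200'

echo "$sample" | awk -F'|' '
    { secs[$1 "|" $2] += $3 * $4 }          # core-seconds per user+partition
    END {
        for (k in secs)
            printf "%s core-hours=%.2f\n", k, secs[k] / 3600
    }' | sort
```

With the sample rows this prints 24.00 core-hours for the C partition and 16.00 for the G partition; multiplying each total by a per-partition rate would give a chargeback figure.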
The two reports in particular that should work are:

sreport cluster UserUtilizationByWCKey
sreport cluster WCKeyUtilizationByUser

There are also job reports that don't sound as relevant for what you are asking, but I'll point them out:

sreport job SizesByAccountAndWckey
sreport job SizesByWckey

Let me know if this sounds like it will work for you, or if you have any questions about implementing WCKeys.

Thanks,
Ben
Hi Ben,

Thanks for your response. I like the sreport approach since it aggregates the hours in its output, unlike the sacct output, which makes it a bit easier for me to report in a monthly chargeback model. The WCKey approach seems very interesting and something I would like to pursue. If I understand this correctly, I would have a unique WCKey defined for each partition, enforce this via our submit script, and add the appropriate WCKey when the job is submitted, depending on which partition is being used. Then I should be able to get the aggregated hours spent by a given user with a specific WCKey (EC2 instance)? We can test this in our sandbox and see if it would work for us.

Once we have this implemented and it gives us the info we need, is this something that can be queried via slurmrestd, or do we need to run the sreport commands manually? We need to automate the chargeback, so I'm thinking about the best approach.

Thanks,
-Simran
Also, I am assuming we can use our job_submit.lua script to inject this WCKey based on which partition is being used (e.g., C-16Cpu-30GB injects --wckey=c5.4xlarge). We don't need the user to provide this field; we just enforce it from our Lua script, and we don't need to define any default WCKey when the user is added to Slurm either. Let me know if I am missing something.

Thanks,
-Simran
Hi Simran,

Your understanding is correct: with unique WCKeys assigned to jobs in different partitions, you can use sreport to get reports on hours spent by a specific user with a given WCKey.

The WCKey is an attribute that shows up on jobs, so you should be able to see it when you query jobs via slurmrestd. This page shows the attributes (including WCKey) returned when querying a job:
https://slurm.schedmd.com/rest_api.html#slurmctldGetJob

That's right, you can use your job_submit.lua script to add the WCKey, so you don't need to define a default WCKey for each user.

Thanks,
Ben
Hi Ben,

Would something like this in our job submit Lua script work for what we have discussed:

if (job_desc.partition == 'C-16Cpu-30GB') then
    job_desc.wckey = 'c5.4xlarge'
elseif (job_desc.partition == 'C-36Cpu-69GB') then
    job_desc.wckey = 'c5.9xlarge'
elseif (job_desc.partition == 'C-72Cpu-139GB') then
    job_desc.wckey = 'c5.18xlarge'
elseif (job_desc.partition == 'M-16Cpu-123GB') then
    job_desc.wckey = 'r5.4xlarge'
elseif (job_desc.partition == 'M-48Cpu-371GB') then
    job_desc.wckey = 'r5.12xlarge'
elseif (job_desc.partition == 'M-96Cpu-742GB') then
    job_desc.wckey = 'r5.24xlarge'
elseif (job_desc.partition == 'G-1GPU-8Cpu-58GB') then
    job_desc.wckey = 'p3.2xlarge'
elseif (job_desc.partition == 'G-4GPU-32Cpu-235GB') then
    job_desc.wckey = 'p3.8xlarge'
else (job_desc.partition == 'G-8GPU-64Cpu-471GB')
    job_desc.wckey = 'p3.16xlarge'
end

Not sure if there is an easier way to achieve this.

Regards,
-Simran
We enabled this capability and updated our job submit Lua script, but all jobs seem to be getting * for the WCKey. We might be missing something in our Lua script and would appreciate any feedback you can provide to set this up correctly:
--
simran@spcdp-usw2-1104:~$ sbatch batch-test.sh
Submitted batch job 1452
simran@spcdp-usw2-1104:~$ squeue -l -u simran
Tue Feb 16 00:40:20 2021
  JOBID  PARTITION     NAME    USER     STATE  TIME  TIME_LIMI  NODES  NODELIST(REASON)
   1452  C-16Cpu-3  simtest  simran  CONFIGUR  0:01  UNLIMITED      1  spcdp-usw2-0004
simran@spcdp-usw2-1104:~$ scontrol show job 1452 | grep -i wckey
   Priority=4294901757 Nice=0 Account=palladium QOS=special WCKey=*
--
Here is our full job submit Lua script:
--
$ cat job_submit.lua
--[[
 Custom job_submit script for Apollo Deep Learning Environment
 - only allow users to submit GPU jobs with --gres=gpu:<> flag
--]]

function slurm_job_submit(job_desc, part_list, submit_uid)
    if (job_desc.partition == 'G-1GPU-8Cpu-58GB' or
        job_desc.partition == 'G-4GPU-32Cpu-235GB' or
        job_desc.partition == 'G-8GPU-64Cpu-471GB') and
       (not job_desc.gres or
        (job_desc.gres and not (string.match(job_desc.gres, "gpu:%d+") or
                                string.match(job_desc.gres, "gpu:.-:%d+")))) then
        slurm.log_info("slurm_job_submit: GPU job submitted by user_id:%d rejected - no GPU resources specified", job_desc.user_id)
        slurm.user_msg("Invalid submission of GPU job. Jobs to the GPU partition must specify GPU resources, (for example --gres=gpu:1 or --gres=gpu:4)")
        return 2072
    end

    if (job_desc.partition == 'G-1GPU-8Cpu-58GB' or
        job_desc.partition == 'G-4GPU-32Cpu-235GB' or
        job_desc.partition == 'G-8GPU-64Cpu-471GB') and job_desc.qos then
        job_desc.qos = ''
        slurm.log_info("slurm_job_submit: Set user default QOS")
        slurm.user_msg("QOS is set automatically")
    end

    if (job_desc.partition == 'C-16Cpu-30GB') then
        job_desc.wckey = 'c5.4xlarge'
    elseif (job_desc.partition == 'C-36Cpu-69GB') then
        job_desc.wckey = 'c5.9xlarge'
    elseif (job_desc.partition == 'C-72Cpu-139GB') then
        job_desc.wckey = 'c5.18xlarge'
    elseif (job_desc.partition == 'M-16Cpu-123GB') then
        job_desc.wckey = 'r5.4xlarge'
    elseif (job_desc.partition == 'M-48Cpu-371GB') then
        job_desc.wckey = 'r5.12xlarge'
    elseif (job_desc.partition == 'M-96Cpu-742GB') then
        job_desc.wckey = 'r5.24xlarge'
    elseif (job_desc.partition == 'G-1GPU-8Cpu-58GB') then
        job_desc.wckey = 'p3.2xlarge'
    elseif (job_desc.partition == 'G-4GPU-32Cpu-235GB') then
        job_desc.wckey = 'p3.8xlarge'
    else (job_desc.partition == 'G-8GPU-64Cpu-471GB')
        job_desc.wckey = 'p3.16xlarge'
    end

    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

slurm.log_info("initialized")
return slurm.SUCCESS
--
Thanks,
-Simran
Ben,

Looks like this was related to a typo in the code where we were missing a 'then' statement and using 'else' instead of 'elseif'. We have fixed this, and now I see the correct WCKey being set when the job runs. Please let us know if there is a better way to assign the WCKey than the if/elseif chain.

However, now that each job has the correct key assigned, I still don't see it being reported in sreport. Is there a delay on when sreport will have this information, or am I missing something further?
--
Successfully ran and completed a dummy sleep 60 job:

$ scontrol show job 1466
JobId=1466 JobName=test-script.sh
   UserId=simran(85174) GroupId=dialout(20) MCS_label=N/A
   Priority=4294901743 Nice=0 Account=palladium QOS=special WCKey=p3.2xlarge
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:01 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2021-02-16T19:25:06 EligibleTime=2021-02-16T19:25:06
   AccrueTime=2021-02-16T19:25:06
   StartTime=2021-02-16T19:31:44 EndTime=2021-02-16T19:32:45 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-02-16T19:25:06
   Partition=G-1GPU-8Cpu-58GB AllocNode:Sid=spcdp-usw2-1104:18165
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=spcdp-usw2-0676
   BatchHost=spcdp-usw2-0676
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=14736M,node=1,billing=2,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=7368M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/simran/test-script.sh
   WorkDir=/home/simran
   StdErr=/home/simran/slurm-1466.out
   StdIn=/dev/null
   StdOut=/home/simran/slurm-1466.out
   Power=
   TresPerNode=gpu:1
   MailUser=(null) MailType=NONE

Confirmed that the correct WCKey is attached:

$ scontrol show job 1466 | grep -i wckey
   Priority=4294901743 Nice=0 Account=palladium QOS=special WCKey=p3.2xlarge

However, none of the sreport outputs are showing the details of this WCKey:

# sreport cluster WCKeyUtilizationByUser start=2/1/21 end=2/30/21
--------------------------------------------------------------------------------
Cluster/WCKey/User Utilization 2021-02-01T00:00:00 - 2021-02-16T19:59:59 (1368000 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster           WCKey     Login     Proper Name     Used
--------- --------------- --------- --------------- --------
slurm-ma+               *                                205
slurm-ma+               *      jhui             Hui      200
slurm-ma+               *    ravih1            Ravi        4
slurm-ma+               *    simran         Hansrai        1

# sreport cluster UserUtilizationByWCKey start=2/1/21 end=2/30/21
--------------------------------------------------------------------------------
Cluster/User/WCKey Utilization 2021-02-01T00:00:00 - 2021-02-16T19:59:59 (1368000 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name           WCKey     Used
--------- --------- --------------- --------------- --------
slurm-ma+      jhui             Hui               *      200
slurm-ma+    ravih1            Ravi               *        4
slurm-ma+    simran         Hansrai               *        1
--
What am I missing here?

Thanks,
-Simran
Hi Simran -

Please be aware that we treat bug severity levels very seriously, since attention to them impacts development on our side.
https://www.schedmd.com/support.php

Severity 2 - High Impact
A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users, with adverse impact to end user interaction with the system.

Ben is currently out of the office due to weather-related issues in the US. If this is production-impacting, I can have someone else look at this today. If not, then please wait for his reply.
Jason,

I have updated the severity based on your feedback. Even though this is not impacting production services right now, it is limiting us from onboarding any new users that are not part of our approved business unit. It is OK if I get a response in the next day or two, but this will eventually become critical for us. For now, I have lowered the priority and will wait for a response from Ben, now that I know he is not in the office. Thanks for the clarification and your support.

Regards,
-Simran
Hi Simran,

I'll try to help while Ben is out. Could you verify that TrackWCKey=yes is set in both slurm.conf and slurmdbd.conf? If they both have it, could you paste the output of:

$ scontrol show config | grep -i wckey
$ sacctmgr show config | grep -i wckey

Although it seems you have waited long enough, please also note that, unlike sacct information, the sreport data is updated/aggregated on an hourly basis. Are you still getting only * WCKeys, or are the right WCKeys shown now?

Regards,
Albert
Hi Simran,

My apologies for the delayed response. As Jason mentioned, I've been dealing with weather-related power outages. It looks like Albert has jumped in to help (thank you). I was going to respond with the same things, but I have one addition to make. Can I have you look at the information reported for a job with sacct?

sacct -j 1466 -o jobid,jobname,partition,account,wckey

Thanks,
Ben
Hi Ben,

No worries, hope you are safe and doing well. Looks like I was missing the hourly-update context for sreport. I am able to see the updated keys now:
--
simran@spcdp-usw2-1104:~$ sacct -j 1466 -o jobid,jobname,partition,account,wckey
       JobID    JobName  Partition    Account      WCKey
------------ ---------- ---------- ---------- ----------
1466         test-scri+ G-1GPU-8C+  palladium p3.2xlarge
1466.batch        batch             palladium

simran@spcdp-usw2-1104:~$ sreport cluster WCKeyUtilizationByUser start=2/1/21 end=2/29/21
--------------------------------------------------------------------------------
Cluster/WCKey/User Utilization 2021-02-01T00:00:00 - 2021-02-17T17:59:59 (1447200 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster           WCKey     Login     Proper Name     Used
--------- --------------- --------- --------------- --------
slurm-ma+               *                                208
slurm-ma+               *      jhui             Hui      200
slurm-ma+               *    ravih1            Ravi        4
slurm-ma+               *    simran         Hansrai        4
slurm-ma+      c5.4xlarge                                 28
slurm-ma+      c5.4xlarge    ravih1            Ravi        9
slurm-ma+      c5.4xlarge    simran         Hansrai       20
slurm-ma+      p3.2xlarge                                242
slurm-ma+      p3.2xlarge      jhui             Hui      202
slurm-ma+      p3.2xlarge    ravih1            Ravi       27
slurm-ma+      p3.2xlarge    simran         Hansrai       13
--
Another question: if a user job gets submitted in Feb but ends in March, where will this usage be reflected if I am running this query on a monthly basis for chargeback? I am hoping that since it ends in March, my monthly March dump will capture that usage.

Thanks,
-Simran
I'm glad to see that things look like they're working as expected after the rollup.

For jobs whose usage spans time periods, the usage should be reported in the time period in which it occurred. In other words, the usage that occurred in Feb will be reported as happening in Feb, and the usage that happened in Mar will be reported for Mar. The hourly rollup looks at the usage for that period of time and accumulates it in a table. The same thing also happens on a daily and monthly basis to keep the usage for a given time period accurate.

Let me know if you have any questions about this.

Thanks,
Ben
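[Editor's note: to close the loop on the chargeback goal, here is a sketch of turning the monthly per-WCKey CPU-minutes from sreport (as in the WCKeyUtilizationByUser output above) into dollar amounts. The rates, vCPU counts, and pro-rating-by-CPU-share model below are placeholders chosen for illustration, not real billing figures; substitute your actual AWS prices and cost model.]

```shell
# Sketch: convert per-user, per-WCKey CPU-minutes (parsed from sreport
# output) into dollar amounts. Input format: user|wckey|cpu_minutes.
usage='simran|c5.4xlarge|20
simran|p3.2xlarge|13'

echo "$usage" | awk -F'|' '
    BEGIN {
        # Hypothetical $/instance-hour; substitute real AWS prices.
        rate["c5.4xlarge"] = 0.68
        rate["p3.2xlarge"] = 3.06
        # vCPUs per instance, to convert CPU-minutes to instance-hours.
        cpus["c5.4xlarge"] = 16
        cpus["p3.2xlarge"] = 8
    }
    {
        # One possible model: pro-rate the instance price by CPU share.
        inst_hours = $3 / (cpus[$2] * 60)
        printf "%s %s $%.4f\n", $1, $2, inst_hours * rate[$2]
    }'
```

Whether to pro-rate by CPU share or bill whole instance-hours depends on how the partitions allocate nodes; if each job gets a dedicated instance, elapsed instance-hours would be the fairer unit.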
Thanks Ben. Would be great to get your input on the following code, in case there is a better way of doing it, or does this look OK to you:
--
if (job_desc.partition == 'C-16Cpu-30GB') then
    job_desc.wckey = 'c5.4xlarge'
elseif (job_desc.partition == 'C-36Cpu-69GB') then
    job_desc.wckey = 'c5.9xlarge'
elseif (job_desc.partition == 'C-72Cpu-139GB') then
    job_desc.wckey = 'c5.18xlarge'
elseif (job_desc.partition == 'M-16Cpu-123GB') then
    job_desc.wckey = 'r5.4xlarge'
elseif (job_desc.partition == 'M-48Cpu-371GB') then
    job_desc.wckey = 'r5.12xlarge'
elseif (job_desc.partition == 'M-96Cpu-742GB') then
    job_desc.wckey = 'r5.24xlarge'
elseif (job_desc.partition == 'G-1GPU-8Cpu-58GB') then
    job_desc.wckey = 'p3.2xlarge'
elseif (job_desc.partition == 'G-4GPU-32Cpu-235GB') then
    job_desc.wckey = 'p3.8xlarge'
elseif (job_desc.partition == 'G-8GPU-64Cpu-471GB') then
    job_desc.wckey = 'p3.16xlarge'
end
--
Thanks,
-Simran
I forgot to address that part of your question. That if statement looks good; I don't know that there is going to be a more efficient way of mapping WCKeys to partitions.

Thanks,
Ben
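[Editor's note: the if/elseif chain works as written; a functionally equivalent alternative, sketched here and not from the thread, is a Lua table lookup, which avoids the else-vs-elseif typo class of bug and keeps the partition-to-instance mapping in one place. Partition names and instance types are taken from the thread; the helper name is hypothetical.]

```lua
-- Sketch: table-driven partition-to-WCKey mapping for job_submit.lua.
local wckey_for_partition = {
    ['C-16Cpu-30GB']       = 'c5.4xlarge',
    ['C-36Cpu-69GB']       = 'c5.9xlarge',
    ['C-72Cpu-139GB']      = 'c5.18xlarge',
    ['M-16Cpu-123GB']      = 'r5.4xlarge',
    ['M-48Cpu-371GB']      = 'r5.12xlarge',
    ['M-96Cpu-742GB']      = 'r5.24xlarge',
    ['G-1GPU-8Cpu-58GB']   = 'p3.2xlarge',
    ['G-4GPU-32Cpu-235GB'] = 'p3.8xlarge',
    ['G-8GPU-64Cpu-471GB'] = 'p3.16xlarge',
}

-- Hypothetical helper; returns nil for unmapped partitions.
function lookup_wckey(partition)
    return wckey_for_partition[partition]
end

-- Inside slurm_job_submit, the chain would then reduce to:
--     local key = lookup_wckey(job_desc.partition)
--     if key then job_desc.wckey = key end
```

Adding a new partition then only requires one new table entry rather than another elseif branch.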
Thanks for all the help Ben. Much appreciated. Feel free to close this request. I will open a new one if we have further questions/issues. Regards, -Simran
I'm glad to hear you have a solution that works for you. Closing now.