Ticket 1208

Summary: Perl API access to list of assigned CPUs on nodes
Product: Slurm    Reporter: Josko Plazonic <plazonic>
Component: Other    Assignee: Brian Christiansen <brian>
Status: RESOLVED FIXED
Severity: 5 - Enhancement    
Priority: --- CC: brian, da, dmcr
Version: 14.03.7   
Hardware: Linux   
OS: Linux   
Site: Princeton (PICSciE)
Version Fixed: 15.08.0pre2 15.08.0pre7

Description Josko Plazonic 2014-10-24 02:05:45 MDT
At this time there is no way to retrieve, via the Perl API, the list of CPUs allocated to a particular job on each node.

To give you some background, one of our developers, Dennis McRitchie, has been adapting pbstop to work with Slurm - we are calling it slurmtop.  One of its features shows which CPUs on which nodes are used by which jobs, i.e. if job 1212 is using CPUs 8-15 we would like to be able to get that info.

Currently the only place one can get that information is the output of scontrol show job, and we would really need to get it directly via the Perl API rather than trying to parse that kind of output.
Comment 1 Moe Jette 2014-10-24 09:10:26 MDT
The information required is in the C job data structure, in this field:
job_resources_t *job_resrcs; /* opaque data type, job resources */

It is currently available using the scontrol -d show job command, but not in the Perl API:
JobId=21470 Name=scontrol
   UserId=jette(1001) GroupId=jette(1001)
....
     Nodes=tux1 CPU_IDs=0-3 Mem=128
     Nodes=tux[2-3] CPU_IDs=0-1 Mem=128
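Until a Perl API field lands, the per-node CPU lines in that output can be scraped with a small regex. This is only a stopgap sketch (the line layout is assumed from the sample above, not a stable interface, and the helper name is ours):

```perl
#!/usr/bin/perl -w
# Stopgap: extract the Nodes/CPU_IDs/Mem detail lines from
# `scontrol -d show job` output. Assumes the line format shown above.
use strict;

sub parse_detail_lines {
    my ($output) = @_;
    my @alloc;
    for my $line (split /\n/, $output) {
        if ($line =~ /^\s*Nodes=(\S+)\s+CPU_IDs=(\S+)\s+Mem=(\S+)/) {
            push @alloc, { nodes => $1, cpu_ids => $2, mem => $3 };
        }
    }
    return @alloc;
}

# Example using the output quoted above:
my $sample = <<'EOF';
JobId=21470 Name=scontrol
   UserId=jette(1001) GroupId=jette(1001)
     Nodes=tux1 CPU_IDs=0-3 Mem=128
     Nodes=tux[2-3] CPU_IDs=0-1 Mem=128
EOF

for my $a (parse_detail_lines($sample)) {
    printf("%s -> CPUs %s\n", $a->{nodes}, $a->{cpu_ids});
}
```

Parsing text output is exactly the fragility the reporter wants to avoid, which is why the API route below is preferable.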
Comment 2 Josko Plazonic 2014-12-11 01:45:01 MST
Hi,

I wanted to touch base on this one - we knew about the scontrol way of getting at it, but that is very inefficient for us (slurmtop is run by multiple users and would do it multiple times per minute for the 1000+ jobs we have in the queue).

Any plans on exposing this info via the Perl API directly?

Thanks!
Comment 3 Moe Jette 2014-12-11 01:48:45 MST
(In reply to Josko Plazonic from comment #2)
> Hi,
> 
> I wanted to touch base on this one - we knew about the scontrol way of
> getting at it, but that is very inefficient for us (slurmtop is run by
> multiple users and would do it multiple times per minute for the 1000+
> jobs we have in the queue).
> 
> Any plans on exposing this info via the Perl API directly?
> 
> Thanks!

We'll try to get this into the v15.08 release.
Comment 4 Danny Auble 2014-12-12 10:34:25 MST
This has been added to 15.08, and commit 4e15ddc1b04 should also work with previous versions of Slurm.

A simple script like

#! /usr/bin/perl -w

use strict;
use Slurm ':all';

my $job_flags = SHOW_ALL | SHOW_DETAIL;
my $resp = Slurm->load_jobs(0, $job_flags);
foreach my $job (@{$resp->{job_array}}) {
	if ($job->{'node_rescrs'}) {
		foreach my $rescrs (@{$job->{'node_rescrs'}}) {
			printf("\tNodes = %s CPU_IDS = %s Mem = %s\n",
			       $rescrs->{'nodes'}, $rescrs->{'cpu_ids'},
			       $rescrs->{'mem'});
		}
	}
}

should get you what you want.  SHOW_DETAIL is the key flag here.
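The `cpu_ids` value comes back as a range string (e.g. "0-3" or "0,2-5,8"), the same form scontrol prints. For a slurmtop-style per-CPU display you would expand it into individual indices; a minimal sketch (the helper name `expand_cpu_ids` is ours, not part of the API):

```perl
#!/usr/bin/perl -w
use strict;

# Expand a CPU_IDs range string such as "0,2-5,8" into a list of
# individual CPU indices: (0, 2, 3, 4, 5, 8).
sub expand_cpu_ids {
    my ($spec) = @_;
    my @cpus;
    for my $part (split /,/, $spec) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @cpus, $1 .. $2;   # a range like "2-5"
        } elsif ($part =~ /^\d+$/) {
            push @cpus, $part;      # a single CPU like "8"
        }
    }
    return @cpus;
}

print join(",", expand_cpu_ids("0,2-5,8")), "\n";   # 0,2,3,4,5,8
```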
Comment 5 Josko Plazonic 2015-07-21 08:27:53 MDT
Hi,

one minor issue (pointed out by our Dennis, who is starting to use this feature and wants to make sure the correct name sticks) - shouldn't it be
node_resrcs
rather than node_rescrs?

It's a trivial change/fix:

--- ./contribs/perlapi/libslurm/perl/job.c.original     2015-07-21 15:51:01.882759414 -0400
+++ ./contribs/perlapi/libslurm/perl/job.c      2015-07-21 16:26:12.234828155 -0400
@@ -168,7 +168,7 @@
        }
        slurm_hostlist_destroy(hl);
        slurm_hostlist_destroy(hl_last);
-       hv_store_sv(hv, "node_rescrs", newRV_noinc((SV*)av));
+       hv_store_sv(hv, "node_resrcs", newRV_noinc((SV*)av));
 
        return 0;
 }

Thanks!
Comment 6 Brian Christiansen 2015-07-21 10:02:48 MDT
Corrected in:
https://github.com/SchedMD/slurm/commit/dad397f926ade6d7322bfeff7eb119055b86a171

Thanks!