Created attachment 11517 [details]
git patch

Please review our patch that adds support for AMD GPUs.
Comment on attachment 11517 [details]
git patch

This is now in a branch 'bug7714'. Please base any future patches on this branch.
This is Mike Li and I'm working on AMD plugin phase 2. I have some questions regarding the requirements for phase 2:

RE: power consumption, RAPL (GPU, CPU):
Is this power consumption per node or per GPU?
AcctGatherEnergy Cray/Aries and rapl are both per node. Both of the above include only CPU + memory. We can add all GPUs to the sum; that will be per node.

RE: GPU utilization, GPU memory utilization, temperature:
Can you point me to the interfaces where they can be added?

Thanks
Mike
(In reply to Mike Li from comment #4)
> This is Mike Li and I'm working on AMD plugin phase 2.
> I have some questions regarding the requirement for phase 2:
>
> RE: power consumption, RAPL (GPU, CPU):
> Is this power consumption per node or per GPU

The smaller the resolution the better, but I am not the one coming up with the requirements. If it were me I would want it by core ;). Obviously that is most likely not going to happen, so as low as you are able to go would be preferred.

> AcctGatherEnergy Cray/Aries, rapl are both per node.
> Both above include only CPU + Memory. We can add all GPUs to sum. That will
> be per node.

Yep, we deal with what we deal with. As stated above, the lower you can go the better. It would be nice to not have to rely on a node basis; we can easily add the GPUs up if given per-GPU values.

> RE: GPU utilization, GPU memory utilization, temperature
> Can you point me to the interfaces where they can be added.

I think the best way to accomplish this is by stacking different acct_gather_energy plugins. At the moment we don't "stack" acct_gather_energy plugins, but it wouldn't be hard to add that. An example of a stackable plugin can be found in src/common/slurm_acct_gather_interconnect.c. Commit da1e837a0e6570e10 will most likely be of great help. It is fairly straightforward.

After that you can add a new acct_gather_energy plugin and go from there.

Good luck :).
thanks
Mike, I just pushed a simple patch to allow stacking of the energy plugins to get you started.
Thanks Danny

Mike
Let's see if I understood correctly. Based on the stacking patch, I can do:

AcctGatherEnergyType=acct_gather_energy/gpu0,acct_gather_energy/gpu1,...,acct_gather_energy/gpu8

Then I would have to add 8 plugins, for gpu0 … gpu8.

Mike
No. I would expect you to have 1 new plugin, acct_gather_energy/amd_gpu (whatever you call it). Then I would expect that to handle all the different GPUs. You can see how we currently do that in ipmi (which handles multiple sensors).

So what I would expect is a configuration like:

AcctGatherEnergyType=rapl,amd_gpu

This would use rapl and the new amd_gpu plugin to give you the power consumption of the node.

What I would expect the plugin to do is figure out which GPUs are being used by the job (hopefully using cgroups) and then only query the ones that matter. If I were looking for stats on the node outside of a job, then I would query all GPUs on the node. Profiling should be done on a per-GPU basis.

Does that make sense?
Yes, that's clearer. Thanks

Mike
RE: I would expect you to have 1 new plugin acct_gather_energy/amd_gpu (whatever you call it)
acct_gather_energy/amd_gpu

RE: Then I would expect that to handle all the different gpus. You can see how we currently do that in ipmi (which handles multiple sensors).
The amd_gpu plugin will be designed similarly to acct_gather_energy/ipmi, with the following details:
1) The RSMI interface will provide the number of GPUs on the node and the current watts for each GPU.
2) The amd_gpu plugin will initiate a node-level thread which will periodically read the current watts for all GPUs through RSMI.
3) One acct_gather_energy_t structure will be created for each GPU and will be updated and maintained by the thread to provide the power consumed by each GPU.
4) The plugin will figure out which GPUs are being used by the job using cgroups and then only query the ones that matter.
5) For stats on the node outside of a job, the power consumed by all GPUs on the node will be provided.
6) Profiling will be done on a per-GPU basis.
7) No configuration like IPMI's is currently required.

RE: So what I would expect is a configuration like AcctGatherEnergyType=rapl,amd_gpu to stack rapl and the amd_gpu plugin. This would use rapl and the new amd_gpu to give you the power consumption of the node.
Supported

RE: What I would expect the plugin to do is figure out what gpus are being used by the job (hopefully using cgroups) and then only query the ones that matter.
Supported

RE: If I was looking for stats on the node outside of a job then I would query all on the node.
Supported

RE: Profiling should be done on a per gpu basis.
Supported
4) The plugin will figure out which GPUs are being used by the job using cgroups and then only query the ones that matter.

I have an issue finding out which GPUs are being used by the job. I'm trying to use ROCR_VISIBLE_DEVICES, which should be "0,1", but I got ROCR_VISIBLE_DEVICES=NULL.

static bool _check_device_visibility(uint16_t dv_ind)
{
	char *local_list;

	local_list = getenv("ROCR_VISIBLE_DEVICES");
	info("ROCR_VISIBLE_DEVICES=%s", local_list);
	if (local_list) {
		char *tmp, *tok;
		uint16_t i;

		tmp = xstrdup(local_list);
		tok = strtok(tmp, ",");
		while (tok) {
			i = strtol(tok, NULL, 10);
			info("tok=%s, i=%d, dv_ind=%d", tok, i, dv_ind);
			if (dv_ind == i) {
				xfree(tmp); /* don't leak on the match path */
				return true;
			}
			tok = strtok(NULL, ",");
		}
		xfree(tmp);
	}
	return false;
}

static void _get_node_energy_up(acct_gather_energy_t *energy)
{
	uint16_t i;
	acct_gather_energy_t *e;

	/* sum the energy of all gpus for this job */
	memset(energy, 0, sizeof(acct_gather_energy_t));
	for (i = 0; i < gpus_len; i++)
		if (_check_device_visibility(i) &&
		    (gpus[i].energy.current_watts != NO_VAL)) {
			e = &gpus[i].energy;
			energy->base_consumed_energy += e->base_consumed_energy;
			energy->ave_watts += e->ave_watts;
			energy->consumed_energy += e->consumed_energy;
			energy->current_watts += e->current_watts;
			energy->previous_consumed_energy += e->previous_consumed_energy;
			/* node poll_time is computed as the oldest poll_time
			 * of the gpus */
			if (energy->poll_time == 0 ||
			    energy->poll_time > e->poll_time)
				energy->poll_time = e->poll_time;
			if (debug_flags & DEBUG_FLAG_ENERGY)
				info("_get_node_energy_up: gpu: %d, current_watts: %u, consumed %"PRIu64" Joules %"PRIu64" new, ave watts %u",
				     i, energy->current_watts,
				     energy->consumed_energy,
				     energy->base_consumed_energy,
				     energy->ave_watts);
		}
}

Is there another way to do it?

Thanks
Mike
I have talked to Harish. He thinks that AMD will provide instantaneous values for all GPUs, and Slurm will aggregate this information for the user and also correlate it with jobs.

acct_gather_energy_g_get_sum() needs to be changed.
(In reply to Mike Li from comment #17) > 4)The plugin will figure out which gpus are being used by the job using > cgroups and then only query the ones that matter. > > I have a issue to find out which gpus are being used by the job. > I'm trying to use ROCR_VISIBLE_DEVICES which should be "0,1". But I got > ROCR_VISIBLE_DEVICES=NULL > > static bool _check_device_visibility(uint16_t dv_ind) > { > char *local_list; > > local_list = getenv("ROCR_VISIBLE_DEVICES"); > info("ROCR_VISIBLE_DEVICES=%s", local_list); > if(local_list) > { > char *tmp, *tok; > uint16_t i; > > tmp = xstrdup(local_list); > info("ROCR_VISIBLE_DEVICES=%s", tmp); > tok = strtok(tmp, ","); > while (tok) { > i = strtol(tok, NULL, 10); > info("tok=%s, i=%d, dv_ind=%d", tok, i, dv_ind); > if (dv_ind == i) > return true; > tok = strtok(NULL, ","); > } > xfree(tmp); > } > return false; > } > > static void _get_node_energy_up(acct_gather_energy_t *energy) > { > uint16_t i; > acct_gather_energy_t *e; > > // sum the energy of all gpus for this job > memset(energy, 0, sizeof(acct_gather_energy_t)); > for (i = 0; i < gpus_len; i++) > if (_check_device_visibility(i) && > (gpus[i].energy.current_watts != NO_VAL)) { > e = &gpus[i].energy; > energy->base_consumed_energy += e->base_consumed_energy; > energy->ave_watts += e->ave_watts; > energy->consumed_energy += e->consumed_energy; > energy->current_watts += e->current_watts; > energy->previous_consumed_energy += e->previous_consumed_energy; > /* node poll_time is computed as the oldest poll_time of > the gpus */ > if (energy->poll_time == 0 || energy->poll_time > e->poll_time) > energy->poll_time = e->poll_time; > if (debug_flags & DEBUG_FLAG_ENERGY) > info("_get_node_energy_up: gpu: %d, current_watts: %u, consumed > %"PRIu64" Joules %"PRIu64" new, ave watts %u", > i, > energy->current_watts, > energy->consumed_energy, > energy->base_consumed_energy, > energy->ave_watts); > } > } > > Is there another way to do it? 
> Thanks
> Mike

Does it work to use the constrained devices in the cgroup? That might be the easier way to do it.
(In reply to Mike Li from comment #16)
> 2)The amd_gpu plugin will initiates a node-level thread which will read the
> current watts periodically for all GPUs through RSMI

This is fine to do (copying ipmi, I presume), but the only reason it was done this way for ipmi is that the ipmi call could take up to 5 seconds, which was something we didn't want in the step. It is fine to do it this way; I'm just giving you the history to avoid making things more complicated than they need to be. If the AMD call isn't that heavy, then you might consider running it locally in the stepd.
(In reply to Mike Li from comment #18) > I have talked to Harish. He think that > AMD will provide instantaneous value for all gpus and > Slurm will aggregate these information for the user and also correlate it > with jobs. > > acct_gather_energy_g_get_sum() need to be changed. It depends on what you want to do I suppose. At the moment we only profile or keep track of energy used on the node. If we want to put this into a different bucket then we will probably need to do something like you are talking about. It is unclear what your end game is though.
I understand slurmd is running on each node. Where is stepd running?
The slurmstepd (stepd) is what runs the step, i.e. the actual application. It is started from the slurmd, one per step on a node.
I have implemented the amd_gpu energy plugin based on IPMI and got most of it working. I have an issue finding out which GPUs are allocated to the job. There are 2 options that I can see, listed in order of my preference:

1) When the stepd calls acct_gather_energy_p_get_data() with type ENERGY_DATA_NODE_ENERGY_UP, pass in a pointer to an array of all GPUs. The AMD plugin will fill the array with energy for all GPUs, and the stepd will aggregate and correlate them with jobs.
2) Pass the GPUs allocated to the job (i.e. usable_gpus, like the freq set) to the amd_gpu plugin, which will correlate and add only the GPUs allocated to the job.

Thanks
Mike
I think you are experiencing your issues because you are running things from the slurmd instead of the stepd. The stepd should have this info. If the amd calls aren't heavy I would suggest you call things in the stepd instead. If they are heavy then you will have to change the request.
For getting the energy value of an AMD GPU, the library call isn't heavy and can be done within 100 ms.

I think I'm calling from the stepd, but maybe not. How do I know? Where is that information? Here is my code:

	case ENERGY_DATA_NODE_ENERGY_UP:
		slurm_mutex_lock(&amd_gpu_mutex);
		if (_is_thread_launcher()) {
			if (_thread_init() == SLURM_SUCCESS)
				_thread_update_node_energy();
		} else {
			_get_joules_task(10);
		}
		_get_node_energy_up(energy);
		slurm_mutex_unlock(&amd_gpu_mutex);
		break;

The following are the logs:

[2019-11-05T10:56:47.534] debug2: Finish processing RPC: REQUEST_ACCT_GATHER_ENERGY
[2019-11-05T10:56:47.534] [169.0] _get_joules_task: consumed 270 Joules (received 270(9 watts) from slurmd)
[2019-11-05T10:56:47.534] [169.0] _get_joules_task: consumed 145 Joules (received 145(6 watts) from slurmd)
[2019-11-05T10:56:47.534] [169.0] ROCR_VISIBLE_DEVICES=(null)
[2019-11-05T10:56:47.534] [169.0] ROCR_VISIBLE_DEVICES=(null)
[2019-11-05T10:56:47.534] [169.0] debug2: jag_common_poll_data: energy = 0 watts = 0 ave_watts = 0
You should just know ;). Read your logs. It should be very obvious if you have any debug around it.
It seems that _get_joules_task is called by the stepd. I dumped the environment, and ROCR_VISIBLE_DEVICES isn't set for the stepd, so I can't use this env var.

[2019-11-07T10:58:00.110] [177.0] _get_joules_task: consumed 0 Joules (received 0(9 watts) from slurmd)
[2019-11-07T10:58:00.110] [177.0] _get_joules_task: consumed 0 Joules (received 0(3 watts) from slurmd)
[2019-11-07T10:58:00.110] [177.0] This is stepd
[2019-11-07T10:58:00.110] [177.0] environ=LANG=en_CA.UTF-8
[2019-11-07T10:58:00.110] [177.0] LANG=en_CA.UTF-8
[2019-11-07T10:58:00.110] [177.0] LANGUAGE=en_CA:en
[2019-11-07T10:58:00.110] [177.0] PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
[2019-11-07T10:58:00.110] [177.0] INVOCATION_ID=f2570bdac01c41adb709484b05cf90d1
[2019-11-07T10:58:00.110] [177.0] JOURNAL_STREAM=9:181020
[2019-11-07T10:58:00.110] [177.0] SLURM_CONF=/etc/slurm/slurm.conf
[2019-11-07T10:58:00.110] [177.0] SLURMSTEPD_OOM_ADJ=-1000
[2019-11-07T10:58:00.110] [177.0] SLURM_MPI_TYPE=none
[2019-11-07T10:58:00.110] [177.0] ROCR_VISIBLE_DEVICES=(null)

How do I access gres_bit_alloc from the stepd in the energy accounting plugin? Can you pass it in like the gpu plugin does when setting the GPU freq?
Mike, the stepd will not have this environment set; you are correct. If you want to trust this envvar you should use **env of the stepd_step_rec_t. At the moment that is not exported to that plugin, though. But note it will always start with 0 (if it behaves like NVIDIA does when added to cgroups).

The GPUs would also be in the step_gres_list of the same structure, which might be easier/better to use, but you still have the same problem of that structure not being exposed to the plugin.

Perhaps it makes sense to call rocm-smi in the init of the plugin to grab the GPUs it lists, as it should report the ones you have access to. Would that work?

If we need to adjust this to send in the stepd_step_rec_t, that would work as well, but I would rather explore these other options first.

Let me know how it goes.
The current rocm-smi is based on sysfs and only provides the total GPUs available on the node. The only way I can see is to pass step_gres_list to the energy accounting plugin.
1) We still need to send in the stepd_step_rec_t.

2) I got:
CurrentWatts=12 AveWatts=6
when I call scontrol show node. I expected:
CurrentWatts=# LowestJoules=# ConsumedJoules=#

3) Another question: how do I show the energy consumption of a job?
(In reply to Mike Li from comment #30)
> The current rocm-smi is based on sysfs and only provides the total GPUs
> available on that node. The only way I can see is to pass step_gres_list to
> energy accounting plugin.

This is surprising. Will this change in the future? I would expect it to only show those you have access to.

Passing it shouldn't be a hard thing. I would recommend sending in the entire stepd_step_rec_t instead of just the step_gres_list, in case we need something else from it. I would probably modify acct_gather_energy_[g|p]_set_data() to handle this new type and set a static pointer to the main struct. I am guessing you will be able to figure that out.
(In reply to Mike Li from comment #31)
> 1) We still need to send in the stepd_step_rect_t.
> 2) I got:
> CurrentWatts=12 AveWatts=6
> when I call scontrol show node. I was expected:
> CorrentWatts=# LowestJoules=# ConsumedJoules=#

This was changed in 19.05; I am guessing you were looking at an older version of Slurm?

> 3) Another question is how to show the energy consumption of a job?

I would expect you could use sstat while the job is running, or you can run with hdf5 and get a profile while the job is running. After the job is done you can use sacct to look at this.
Just FYI, there will be a code freeze for 20.02 on January 15th, so any new code not written by us should be in no later than Jan 1st if there is to be any hope of getting it in.

NOTE: Getting the code to us by Jan 1st will not guarantee it will make it into 20.02, but getting it to us after that date will guarantee it will not.
Created attachment 12659 [details] Add brand type for AMD GPU
Created attachment 12660 [details] Add PCI Info for AMD GPU
Created attachment 12661 [details] Add energy accounting plugin for AMD GPU
I have added 3 patches. Sorry about the delay over the holidays; we had to get AMD internal approval. Let me know if there are any questions.
We have an issue with sstat, it happens without the patches. $ sstat -j 252 -a JobID MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot ------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- --------------- ------------------- ------------------- --------------- ------------------- ------------------- --------------- sstat: symbol lookup error: /usr/lib/slurm/auth_munge.so: undefined symbol: slurm_get_log_level
I would expect you aren't compiling it correctly, as this should work just fine.
I was using:

./configure --prefix=/tmp/slurm-build --sysconfdir=/etc/slurm --enable-pam --with-pam_dir=/lib/x86_64-linux-gnu/security/ --without-shared-libslurm --with-rsmi

Are there any other options that are required?
Not sure why you are using --without-shared-libslurm (we have never recommended that), but that shouldn't matter. Hopefully you figure it out. I have never heard of this issue before.
What is the status of the 20.02 version? Have you reviewed and included the changes in the new release?
As pointed out, since the deadline was missed, the odds of these making it in are low. I will do my best to review them, but I would not count on these being in 20.02.
Do you have a date for the 20.02 release? What was merged into 20.02? Is the phase 1 code with auto-detection of AMD GPUs in 20.02?

Thanks
Mike
In the coming weeks we will release a few release candidates and then near the end of the month we will do an actual tag for 20.02. At this moment nothing has been merged.
Mike, would it be wrong to call this new plugin acct_gather_energy/rsmi to match instead of amd_gpu?
Also, I noticed that in _get_node_energy() we don't check for cgroups, but in the similar code in _get_node_energy_up() we do. Is that divergence expected?
This next question is a follow-up to comment 48. It looks like you are overloading acct_gather_energy_g_set_data(ENERGY_DATA_RECONFIG, job); to get the gres bits you are using when not using cgroups. I am wondering if there is a different way you can do this. There currently isn't anything that sends in the stepd_step_rec_t in question, so this is a good way to do it without changing function signatures; it just seems like we could do it in a cleaner way. For the time being I'll allow it, but I would rather add a new enum or something so the code doesn't get messy from this kind of overloading.
> would it be wrong to call this new plugin acct_gather_energy/rsmi to match instead of amd_gpu?

OK for rsmi, since we already call another plugin rsmi.
> Also, I notice in _get_node_energy() we don't check for cgroups, but in similar code in _get_node_energy_up() we do. Is that expected to have divergence like that?

_get_node_energy() is for the energy consumed by the node for all GPUs. _get_node_energy_up() is for a job or job step. I understand that the cgroup setup is per job submission.
> It looks like you are overloading acct_gather_energy_g_set_data(ENERGY_DATA_RECONFIG, job);

Yes, it is. I couldn't find a better way to do it.

> I would rather add a new enum or something.

OK. Do you want me to do it, or will you?

Thanks
Mike
Thanks for your prompt response Mike. All this code has now been merged into Master 40d323d899ea (20.02). If anything new needs adding please open a new bug on the matter. I ended up adding a new enum ENERGY_DATA_STEP_PTR in reference to comment 52. I also called the plugin rsmi in reference to comment 50.
Thanks Danny

I saw that 20.02.0 is released. What features are still missing?

Thanks
Mike
You tell me :). Perhaps open a new bug as this one is closed if there are some.
*** Ticket 7560 has been marked as a duplicate of this ticket. ***