8473 – Normalised InfluxDB format

Ticket 8473 - Normalised InfluxDB format

Summary: Normalised InfluxDB format

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Profiling (show other tickets)
Version:	25.11.x
Hardware:	Linux Linux

Severity:	C - Contributions
Assignee:	Alejandro Sanchez
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2020-02-08 03:28 MST by Peter Maxwell
Modified:	2026-03-12 22:49 MDT (History)
CC List:	7 users (show)

See Also:	23254
Site:	-Other-
Slinky Site:	---
Alineos Sites:	---
Atos/Eviden Sites:	---
Confidential Site:	---
Coreweave sites:	---
Cray Sites:	---
DS9 clusters:	---
Google sites:	---
HPCnow Sites:	---
HPE Sites:	---
IBM Sites:	---
NOAA SIte:	---
NoveTech Sites:	---
Nvidia HWinf-CS Sites:	---
OCF Sites:	---
Recursion Pharma Sites:	---
SFW Sites:	---
SNIC sites:	---
Tzag Elita Sites:	---
Linux Distro:	---
Machine Name:
CLE Version:
Version Fixed:
Target Release:	---
DevPrio:	---
Emory-Cloud Sites:	---

Attachments
Patch which improves acct_gather_profile/influxdb data format (4.65 KB, patch) 2020-02-08 03:28 MST, Peter Maxwell	Details \| Diff
Improved version of the patch, includes cluster tag. (4.87 KB, patch) 2020-02-24 00:03 MST, Peter Maxwell	Details \| Diff
Attempted rewrite of patch for 20.11.0 - does not work (6.46 KB, patch) 2020-12-03 18:25 MST, Kevin Buckley	Details \| Diff
Patch updated to 23.11.7 (6.12 KB, patch) 2024-06-09 23:15 MDT, Peter Maxwell	Details \| Diff
Patch further updated to 24.05.0 (6.01 KB, patch) 2024-06-09 23:16 MDT, Peter Maxwell	Details \| Diff
Patch updated to 24.11.6 (5.95 KB, patch) 2025-08-29 02:57 MDT, Peter Maxwell	Details \| Diff
Patch now with choice of format and choice of extra tags. (7.65 KB, patch) 2025-08-29 03:08 MDT, Peter Maxwell	Details \| Diff
Updated to 25.11.4 (7.87 KB, patch) 2026-03-12 22:27 MDT, Peter Maxwell	Details \| Diff
Make ProfileInfluxDBRTPolicy optional. (580 bytes, patch) 2026-03-12 22:31 MDT, Peter Maxwell	Details \| Diff
ProfileInfluxDBDefault default of None rather than All, so matching docs. (580 bytes, patch) 2026-03-12 22:34 MDT, Peter Maxwell	Details \| Diff
Make ProfileInfluxDBRTPolicy optional. (1.15 KB, patch) 2026-03-12 22:35 MDT, Peter Maxwell	Details \| Diff
No empty message to InfluxDB at end of an un-profiled job. (493 bytes, patch) 2026-03-12 22:36 MDT, Peter Maxwell	Details \| Diff
Optional new normalised format for data sent to InfluxDB. (4.89 KB, patch) 2026-03-12 22:36 MDT, Peter Maxwell	Details \| Diff
Use names for special steps such as batch and extern. (1.31 KB, patch) 2026-03-12 22:37 MDT, Peter Maxwell	Details \| Diff
Represent InfluxDB uints and floats as such. (885 bytes, patch) 2026-03-12 22:39 MDT, Peter Maxwell	Details \| Diff
Allow cluster, user, or uid as extra InfluxDB tags. (2.10 KB, patch) 2026-03-12 22:39 MDT, Peter Maxwell	Details \| Diff
Show Obsolete (8) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this ticket.

Description Peter Maxwell 2020-02-08 03:28:22 MST

Created attachment 12982 [details]
Patch which improves acct_gather_profile/influxdb data format

While trying out acct_gather_profile/influxdb I was disappointed to see that its output puts each field on a line of its own, all with the same field name of "value".  That isn't the usual InfluxDB way, which would be to put all the fields of one sample (8 fields in the case of the "task" profile) on the same line, since they all share a timestamp and come from the same source. 

This current format is inefficient and causes problems downstream. Lech Nieroda described the impact on InfluxDB on slurm-users (https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg04487.html) and independently suggested the same change as I am describing here. 

I plan to use  https://docs.timescale.com/latest/tutorials/telegraf-output-plugin instead of InfluxDB, , but the current format would cause similar problems there, creating a separate table for each field rather than a single multi-field table for each profile type. 

So, the attached patch produces normalised (in the database schema sense) output. And also changes a couple of things which flow on from that: 
 - Parts of the output which stay constant over time are moved out of acct_gather_profile_p_add_sample_data and are instead stored in table->name, renamed to table->prefix.
 - Translate the negative step numbers into "batch" and "extern" like sacct does.
 - Since the best choice of InfluxDB "measurement" name was not obvious to me I made that customisable, with "%P" expanding to the profile name ("Task", "Energy" etc.) and "%p" to the lowercase equivalent. Defaulting to "%p_profile".  That bit might be overkill.

Comment 1 Andrew Elwell 2020-02-23 21:53:42 MST

Thanks for submitting this patch - I'm about to try and run this up on some systems at Pawsey (I'd seen Lech's comments on the mailing list and was about to try and turn it into a patch. 

Just one wishlist item - Do you know if $SLURM_CLUSTER_NAME is available to the plugin? reason being it'd be really nice to have an additional tag cluster=$SLURM_CLUSTER_NAME to be able to use as a variable 

Many thanks

Andrew

Comment 2 Peter Maxwell 2020-02-24 00:03:52 MST

Created attachment 13139 [details]
Improved version of the patch, includes cluster tag.

This second version of my patch leaves out some redundant code accidentally included in the first version, adds a cluster tag as suggested by Andrew Elwell, and stops the current behaviour where acct_gather_profile_p_task_end needlessly contacts influxdb to send it no data when profile=None.

Comment 3 Andrew Elwell 2020-02-24 05:05:42 MST

(I suspect this should be spun off as a separate patch/bug rather than confusing this unit of work)

One minor niggle with this plugin is that defining ProfileInfluxDBRTPolicy is compulsory

> fatal("No ProfileInfluxDBRTPolicy in your acct_gather.conf file. This is required to use the %s plugin",

I feel it should be optional, as if it's not definined in the API, it falls back to default

see https://docs.influxdata.com/influxdb/v1.7/guides/writing_data/
> When writing points, you must specify an existing database in the db query parameter. 
> Points will be written to db’s default retention policy if you do not supply a retention policy via the rp query parameter. 
> See the InfluxDB API Reference documentation for a complete list of the available query parameters.

Comment 4 Andrew Elwell 2020-07-15 07:01:45 MDT

Gentle poke to Tim to see if this can be included in the 20.11 release (given that it's a breaking change to include in a point-relese mid 20.02 cycle)

It'd be nice not to have to merge changes in locally at build time :-)

Comment 5 Kevin Buckley 2020-12-03 18:15:28 MST

Another gentle poke, given that my attempts to re-write
the patch for 20.11 "didn't quite work" (tm).

(See https://bugs.schedmd.com/show_bug.cgi?id=10344#c2)

In my defence, I merely assumed that it would be the changes
SchedMD made to the g_job struct that would need need to be
rolled forwards: clearly not!

Comment 6 Kevin Buckley 2020-12-03 18:25:09 MST

Created attachment 16967 [details]
Attempted rewrite of patch for 20.11.0 - does not work

Attaching here after Jason Booth suggested that we do so.

Initially applying the orignal patch saw the compilation
fail, because of changes to the g_job struct.

This patch merely alters the g_job struct members so that
compilation does complete.

When deployed within 20.11.0, we saw failures of jobs to
launch,

Removing the patch saw jobs launch.

Suggestion is that more changes to the patch rae required
for 20.11.0.

HTH

Comment 7 Peter Maxwell 2024-06-09 23:15:52 MDT

Created attachment 37049 [details]
Patch updated to 23.11.7

Patch updated to 23.11.7.  
Also makes ProfileInfluxDBRTPolicy optional. 
Also makes the default for ProfileInfluxDBDefault be None rather than All, so matching docs.

Comment 8 Peter Maxwell 2024-06-09 23:16:51 MDT

Created attachment 37050 [details]
Patch further updated to 24.05.0

Comment 10 Alejandro Sanchez 2025-08-12 08:16:26 MDT

Hi,

Thanks for the contribution proposal and sorry for the late reply on this.

I agree it'd be better that fields of the same type of profiling be written all together to the same time series measurement point, instead of the current design approach, which looking back wasn't the best thought.

If this change is done though, sites with written data older to the change willing to query data cross pre/post the change would have their queries and backward-compatibility disrupted. It is feasible to adapt queries to do cross-data matches with something like this:

> select * from RSS,task_profile where (value=192 OR RSS=192);

Because before the patch fields were written as measurements and the only field key was "value", and after the patch a single measurement of the same type would have as many fields as original measurements.

Even if this change would be announced in our CHANGELOG, it is quiet disruptive and I'd need to discuss internally if we're willing to do that, specially due to other sites already using the plugin before this change.

I like the idea of naming the measurements like:

"task", "network", "energy", etc.

rather than "task_profile", "network_profile", "energy_profile". If we do this, perhaps there's no need to rename the table_t struct "name" field to "prefix" either.

Also can you elaborate on the value of the new ProfileInfluxDBMeasurementName option to acct_gather.conf? On my testing with your contribution, if set, then all fields from all types of profiling would end up inside that measurement, instead of being split into the different "task", "network", "energy", etc. measurements.

I have the feeling that we either do the split always or put all fields of all kinds into a single measurement, but I think the split would be better, and would avoid potential field key naming conflicts. What would be the value of having the possibility of both? If configuration changes we would end up with data again landing in different layouts.

I like the lowercase for the naming convention, as seems aligned with InfluxDB best practices.

The change to make ProfileInfluxDBRTPolicy optional instead of required with a fatal() if not present looks reasonable, due to the write operation falling back to the databse default. I'm good with this bit.

The change to make ProfileInfluxDBDefault=NONE be the default also looks reasonable, I guess it was motivated to be aligned with the acct_gather.conf documentation, where it says it is the default, but the code would default to ALL.

There are also various changes to tag key sets, and tag values:

1. You made it so that "Task" kind of profile measurements will include a "task" tag key. But the other kind of profiles "network", "energy", etc. this "task" key is removed (all callers of acct_gather_profile_g_create_dataset() with NO_PARENT argument, which pass the name of the kind of profiling). The "Task" kind of profiling passes the taskid as name argument.

This looks reasonable, but again affects potential sites relying on existing queries and workflows.

2. You changed the "step" tag value to use "batch" for batch steps, "extern" for extern steps, etc.

Also looks reasonable, but again affects potential sites relying on existing queries and workflows.

Also this is missing "interactive" steps and the case of SLURM_PENDING_STEP. We have functions like log_build_step_id_str() that automatically handle the cases formatting. In this case probably with the STEP_ID_FLAG_NO_PREFIX flag.

3. The cluster tag has been added. In ticket 23254 they also requested to add the step username as a tag for group by purposes...

While again this looks reasonable, this changes the tag key set for existing data and affects existing queries and workflows. Also keep in mind that tags are indexed, and adding new tags might affect performance and memory consumption. Not sure if it'd be better to have a separate database for each cluster instead of adding a new cluster tag... username seems more reasonable...

In summary, there are good ideas worth considering, but I feel they should be discussed further.

Also, for ease of reviewing, and as documented in CONTRIBUTING, please break patches up into logically separate chunks, while ensuring that each patch can still be compiled.

While not required, we encourage use of git format-patch to generate the patch. This ensures the relevant author line and commit message stay attached.

Thansk.

Comment 11 Kevin Buckley 2025-08-12 20:49:16 MDT

>If this change is done though, sites with written data older to the
>change willing to query data cross pre/post the change would have
>their queries and backward-compatibility disrupted. 

Possibly (given that's been years since I could even recall that
I had added to this conversation) overthinking this, but, might
it be possible to support both the "old" and "new" formats, with
the choice being made by using a config variable that would need
to be added into Slurm, at least until the old format finally got
deprecated?

That mght afford sites, using the old format, the opportunity to
contunue to do so, whilst they look to swap over to the new format,
without any immediate "re-tooling" being required.

Comment 12 Peter Maxwell 2025-08-17 16:13:46 MDT

If ProfileInfluxDBMeasurementName is set to something which does not contain "%P" or "%p" then yes, fields from all types of profiling would end up inside that one measurement. That might be enough reason to scrap ProfileInfluxDBMeasurementName. I'm OK with just using the lowercase dataset name, equivalent to forcing ProfileInfluxDBMeasurementName="%p".

I agree that it would be possible and desirable to support both formats, at least for a while. So in my very latest version of the patch I have code to produce either format (necessitating the split of my table_t struct "prefix" field into "name" and "tags" fields) and a boolean config option "ProfileInfluxDBNormalized".  I'll attach that later as it is totally untested at present.

The same goes for the other format changes which have accumulated here - they can be made optional so as to allow backward compatibility.

An additional boolean config option could select between raw step numbers or passing them through log_build_step_id_str() to get step names. ProfileInfluxDBStepNames?

Those two boolean option could initially default to false, then true in a later release, and maybe be phased out after a few years.

Inclusion of the cluster name, user name, and any other additional tags anyone requests later (uid?, account?) could be selected via a more permanent option done in the style of a Slurm "Parameters" or "Flags" option: a comma delimited string which can contain "cluster" and/or "user". "ProfileInfluxDBExtraTags"?

Comment 13 Alejandro Sanchez 2025-08-19 09:01:38 MDT

(In reply to Peter Maxwell from comment #12)
> If ProfileInfluxDBMeasurementName is set to something which does not contain
> "%P" or "%p" then yes, fields from all types of profiling would end up
> inside that one measurement. That might be enough reason to scrap
> ProfileInfluxDBMeasurementName. I'm OK with just using the lowercase dataset
> name, equivalent to forcing ProfileInfluxDBMeasurementName="%p".

All right, thanks.
 
> I agree that it would be possible and desirable to support both formats, at
> least for a while. So in my very latest version of the patch I have code to
> produce either format (necessitating the split of my table_t struct "prefix"
> field into "name" and "tags" fields) and a boolean config option
> "ProfileInfluxDBNormalized".  I'll attach that later as it is totally
> untested at present.

Let me discuss how much we want to support multiple formats internally and we'll come back to you.

> The same goes for the other format changes which have accumulated here -
> they can be made optional so as to allow backward compatibility.
> 
> An additional boolean config option could select between raw step numbers or
> passing them through log_build_step_id_str() to get step names.
> ProfileInfluxDBStepNames?

And discuss this as well.

> Those two boolean option could initially default to false, then true in a
> later release, and maybe be phased out after a few years.

Perhaps it'd be better to create a ProfileInfluxDBFlags or Params and add booleans governing behavior/formats there, if at all.
 
> Inclusion of the cluster name, user name, and any other additional tags
> anyone requests later (uid?, account?) could be selected via a more
> permanent option done in the style of a Slurm "Parameters" or "Flags"
> option: a comma delimited string which can contain "cluster" and/or "user".
> "ProfileInfluxDBExtraTags"?

That's something I need to discuss too.

Comment 14 Peter Maxwell 2025-08-19 18:15:17 MDT

Not a big deal, but here is one more minor thing to consider changing: the number formats. According to https://docs.influxdata.com/influxdb/v1/write_protocols/line_protocol_reference/ integers should be suffixed with "i" to be recognised as such, and floats don't need to contain a decimal point.

So that probably explains why I only see (double precision) floats in my database. Which has been fine actually. 

For the floats, "%.15g" might be more appropriate than the current "%.2f", as that would avoid a loss of precision on any very small numbers and an excessive length of any very large numbers.

Comment 15 Peter Maxwell 2025-08-29 02:57:31 MDT

Created attachment 42963 [details]
Patch updated to 24.11.6

Comment 16 Peter Maxwell 2025-08-29 03:08:05 MDT

Created attachment 42964 [details]
Patch now with choice of format and choice of extra tags.

This version follows from recent discussion on the ticket. It removes the option InfluxDBProfileName but adds options InfluxDBNewFormat (boolean) and InfluxDBExtraTags (string). The "new" format is normalised, uses the special step names (as before, but now via Slurm library code), and now also uses different numerical formatting of the values. The "old" format is intended to be the same as without the patch. InfluxDBExtraTags applies to either and can include any combination of "cluster", "uid", and "user".

Comment 17 Peter Maxwell 2026-03-12 22:27:33 MDT

Created attachment 44693 [details]
Updated to 25.11.4

Equivalent to previous versions of this patch, plus using "u" instead of "i" so as to let InfluxDB know that the ints are unsigned ones, and a couple of minor fixes in acct_gather_profile_p_conf_values.

Comment 18 Peter Maxwell 2026-03-12 22:31:12 MDT

Created attachment 44694 [details]
Make ProfileInfluxDBRTPolicy optional.

Comment 19 Peter Maxwell 2026-03-12 22:34:37 MDT

Created attachment 44695 [details]
ProfileInfluxDBDefault default of None rather than All, so matching docs.

Comment 20 Peter Maxwell 2026-03-12 22:35:10 MDT

Created attachment 44696 [details]
Make ProfileInfluxDBRTPolicy optional.

Comment 21 Peter Maxwell 2026-03-12 22:36:14 MDT

Created attachment 44697 [details]
No empty message to InfluxDB at end of an un-profiled job.

Comment 22 Peter Maxwell 2026-03-12 22:36:59 MDT

Created attachment 44698 [details]
Optional new normalised format for data sent to InfluxDB.

Comment 23 Peter Maxwell 2026-03-12 22:37:48 MDT

Created attachment 44699 [details]
Use names for special steps such as batch and extern.

Comment 24 Peter Maxwell 2026-03-12 22:39:19 MDT

Created attachment 44700 [details]
Represent InfluxDB uints and floats as such.

Comment 25 Peter Maxwell 2026-03-12 22:39:49 MDT

Created attachment 44701 [details]
Allow cluster, user, or uid as extra InfluxDB tags.

Comment 26 Peter Maxwell 2026-03-12 22:49:21 MDT

In addition to updating my patch for 25.11 I have also updated it for the current master and uploaded that version here as a sequence of 7 smaller patches.

Of those 7, the first 3 are small and independent fixes which were discovered along the way. The 4th introduces the new format which is the main point of this ticket, and then the 5th and 6th make further improvements to that new format. The final one (extra tags) applies to both formats.