Description
Peter Maxwell
2020-02-08 03:28:22 MST
Thanks for submitting this patch - I'm about to try and run this up on some systems at Pawsey (I'd seen Lech's comments on the mailing list and was about to try and turn it into a patch. Just one wishlist item - Do you know if $SLURM_CLUSTER_NAME is available to the plugin? reason being it'd be really nice to have an additional tag cluster=$SLURM_CLUSTER_NAME to be able to use as a variable Many thanks Andrew Created attachment 13139 [details]
Improved version of the patch, includes cluster tag.
This second version of my patch leaves out some redundant code accidentally included in the first version, adds a cluster tag as suggested by Andrew Elwell, and stops the current behaviour where acct_gather_profile_p_task_end needlessly contacts influxdb to send it no data when profile=None.
(I suspect this should be spun off as a separate patch/bug rather than confusing this unit of work) One minor niggle with this plugin is that defining ProfileInfluxDBRTPolicy is compulsory > fatal("No ProfileInfluxDBRTPolicy in your acct_gather.conf file. This is required to use the %s plugin", I feel it should be optional, as if it's not definined in the API, it falls back to default see https://docs.influxdata.com/influxdb/v1.7/guides/writing_data/ > When writing points, you must specify an existing database in the db query parameter. > Points will be written to db’s default retention policy if you do not supply a retention policy via the rp query parameter. > See the InfluxDB API Reference documentation for a complete list of the available query parameters. Gentle poke to Tim to see if this can be included in the 20.11 release (given that it's a breaking change to include in a point-relese mid 20.02 cycle) It'd be nice not to have to merge changes in locally at build time :-) Another gentle poke, given that my attempts to re-write the patch for 20.11 "didn't quite work" (tm). (See https://bugs.schedmd.com/show_bug.cgi?id=10344#c2) In my defence, I merely assumed that it would be the changes SchedMD made to the g_job struct that would need need to be rolled forwards: clearly not! Created attachment 16967 [details]
Attempted rewrite of patch for 20.11.0 - does not work
Attaching here after Jason Booth suggested that we do so.
Initially applying the orignal patch saw the compilation
fail, because of changes to the g_job struct.
This patch merely alters the g_job struct members so that
compilation does complete.
When deployed within 20.11.0, we saw failures of jobs to
launch,
Removing the patch saw jobs launch.
Suggestion is that more changes to the patch rae required
for 20.11.0.
HTH
Created attachment 37049 [details]
Patch updated to 23.11.7
Patch updated to 23.11.7.
Also makes ProfileInfluxDBRTPolicy optional.
Also makes the default for ProfileInfluxDBDefault be None rather than All, so matching docs.
Created attachment 37050 [details]
Patch further updated to 24.05.0
|