Ticket 15953

Summary: cli_filter.lua: missing "--time" results in options["time"] == "2982616-04:14:00"
Product: Slurm Reporter: Bjørn-Helge Mevik <b.h.mevik>
Component: User CommandsAssignee: Carlos Tripiana Montes <tripiana>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 22.05.6   
Hardware: Linux   
OS: Linux   
Site: University of Oslo Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: 23.11.0rc1 Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Bjørn-Helge Mevik 2023-02-06 05:55:16 MST
During development of a cli_filter.lua setup, we discovered that if one submits a job with no --time option, then in the cli_filter.lua script, options["time"] equals "2982616-04:14:00".  We would have expected it to be slurm.NO_VAL (or nil or "").

With
   opt_json = slurm.json_cli_options(options)
   slurm.log_debug("slurm_cli_pre_submit: Running with following options set: %s", opt_json)
   slurm.log_debug("time: %s", options["time"])
inside the function slurm_cli_pre_submit function, we get this when submitting a job without --time:

$ sbatch -A nn9999k --wrap='sleep 30' -vv
sbatch: debug:  lua: slurm_cli_pre_submit: Running with following options set: {"account":"nn9999k","verbose":"2","wrap":"sleep 30","argv":[]}
sbatch: debug:  lua: time: 2982616-04:14:00

We see this on both slurm 21.08.7 and 22.05.6.

The time value turns out to equal 4294967294 minutes, which is the value of slurm.NO_VAL.
In the job_submit.lua script, the value for a missing --time option is indeed slurm.NO_VAL, and it would be very nice if it could be so in cli_filter.lua as well.  "2982616-04:14:00" is less intuitive, and more prone to typos. :)
Comment 1 Carlos Tripiana Montes 2023-02-06 08:27:12 MST
Hi,

Reverting 2982616-04:14:00 back to a number is indeed:

2982616 days + 4 hours + 14 mins = 4294967294 mins

Number 4294967294 is 0xFFFFFFFE, which equals to NO_VAL (see slurm.h):

#define NO_VAL     (0xfffffffe)

So, at the end of the day, they're the same, as you said. I guess we lack from delivering just the proper object slurm.NO_VAL, rather than the raw, internal, slurm value.

I'll check how to fix that.

Thanks,
Carlos.
Comment 12 Carlos Tripiana Montes 2023-03-02 08:25:20 MST
Hi!

We have landed commit 9f41a5eb08 in master and 23.02 branches, and now your issue should be away. The fix will be included in next 23.02.1 release.

The idea is that, now, time is exactly what conceptually it is, the empty string. Because we are talking about the string value for the client command flag "--time", which isn't set. And can be easily checked in lua like:

	if options["time"] == ""
	then

Obviously, slurm.NO_VAL, makes more sense when we are talking about the job record, but this is when we are on the controller side, no the client side while handling the parsing of the job's parameters.

Closing now as resolved/fixed.

Regards,
Carlos.
Comment 17 Carlos Tripiana Montes 2023-03-03 01:30:29 MST
Good morning,

First of all, I want to apologise for a last minute change in our commits, as we have finally decided not to include the change in the 23.02 branch. We found a user case in a customer site in which such change would potentially break their scripts. Our policy is to avoid such breaking changes in a maintenance release.

We're internally deciding if we finally opt between "" or nil too. This is a minor difference for a lua script, but this comes from the internals of slurm. And "" was straightforward. Nevertheless, we'll inform you of the final results ASAP.

In the meantime, or for older versions, you can workaround this issue with something like:

  if time == "2982616-04:14:00" then
    time = nil
  end

Have a good day,
Carlos.
Comment 30 Carlos Tripiana Montes 2023-03-14 05:55:25 MDT
Hi!

We have landed commit 21707eb039 in master, and now your issue should be away. The fix will be included in next major release. In the meantime, the workaround explained in Comment 17 should suffice. We have finally decided to opt for nil value rather than "", to be more Lua-ish compliant.

Closing now as resolved/fixed.

Regards,
Carlos.