The vxstrfmt() function is written such that a first pass at the fmt string passed to it is attempted with a mixture of SLURM log-specific sequences (e.g. %M, %t) as well as short-circuiting of some standard printf() sequences (e.g. %s, %d, %%). Any unhandled % sequence is noted, and if present a second pass is made using vsnprintf() using the string produced by the first pass and the remainder of the variable argument list. If the format string introduces additional % sequences into the string produced in the first pass and there are any unhandled % sequences triggering the second pass with vsnprintf(), then the variable argument list will be overrun by vsnprintf() (possibly producing a segfault, etc.) and the resulting string will be garbled. I discovered this while writing a job submission plugin for SGE compatibility (scans the job script for #$ lines and translates to SLURM job_descriptor settings). I was logging the std_out filename produced from #$ -o $HOSTNAME-$USER-$JOB_ID.out The plugin turns this into the SLURM path "%N-%u-%j.out" and my code was logging this via an info() call e.g.: char variant = 'o'; char *path = job_submit_parse_path(s); : info(PLUGIN_SUBTYPE ": stdio path \"%s\" from -%c option", path, variant); The output in the slurmctld log was [2017-08-30T13:54:18.956] udhpc: stdio path "test.%N.111.out" from -^P option The vxstrfmt() function handles the %s but not the %c sequence, so the second pass call to vsnprintf() is evoked as vsnprint(..., "[2017-08-30T13:54:18.956] udhpc: stdio path "test.%N.%u.out" from -%c option", ap); All that remains in the vararg list is the character 'o' (ASCII 111). This bug likely exists in all versions of SLURM.
Created attachment 5183 [details] _xstrdup_vprintf() function rewrite
Created attachment 5186 [details] vxstrfmt() function rewrite
Another problem: since vxstrfmt() itself replaces '%%' sequences with '%', it can introduce additional format sequences into the string it hands to the second pass. E.g. info(PLUGIN_SUBTYPE ": %%stdio path \"%s\" from -%c optio%%n", path, variant); would have the second pass in vxstrfmt() calling vsnprintf() as vsnprint(..., "[2017-08-30T13:54:18.956] udhpc: %stdio path "test.%N.%u.out" from -%c optio%n", ap); This example is especially heinous because the first pass produces a '%n' sequence, which will cause vsnprintf() to write to an arbitrary location in memory. There's no reliable way to pass a partially-expanded format string to vsnprintf(). The only remotely plausible solution would be to handle each unhandled format sequence as it occurs, not en masse at the end. This would also remove the 1024-character limit on log lines that included unhandled format sequences. I've included a rewritten (and tested) vxstrfmt() function on this bug as an attachment. Additionally, the behavior of the _xstrdup_vprintf() function w.r.t. the va_list passed to it is not consistent with the standard behavior for v*printf() functions. Those functions consume arguments from the va_list passed to them and do not call va_end() on it, so the caller can expect that a post condition of calling v*printf() is that the va_list passed is now pointing to the first argument beyond those consumed by v*printf(). I've included a rewritten (and tested) _xstrdup_vprintf() function on this bug as an attachment to address this (also optimized memory consumption for small strings < 100 characters). Is there a reason xstrdup_vprintf() isn't exported outside xstring.c? It seems a natural choice to accompany xstrdup_printf() in the API.
Created attachment 5251 [details] Additional changes to previously-uploaded _xstrdup_vprintf() function Previous rewrite of _xstrdup_vprintf() used xstrdup() to return a copy of strings that fit in the on-stack buffer. Calling xstrdup() requires that function do a strlen() to determine how large to make the allocated buffer. Since _xstrdup_vprintf() already knows how long the string is, it makes sense to allocate and fill-in the copy right inside that function, without calling xstrdup().
Hi Thanks for your report and contribution. After reviewing this code, we changed formatting and fixed some issues. It is committed as https://github.com/SchedMD/slurm/commit/5e5b1159273374b47ab35a8681ee7577cdf32f8b Dominik
Please see https://bugs.schedmd.com/show_bug.cgi?id=4469 to fix the problems in log.c that this patch created. I attached a patch there.