Created attachment 189 [details] propose fix [root@cluster] # scontrol reconfig scontrol: error: Parse error in file /release/bull/automate/slurm_automate.conf line 5: "07,6252-6263,6265-6266,6269,6288-6301,6303-6323,6482-6485,6526-6532,6534-6539]" scontrol: error: "Include" failed in file /etc/slurm/slurm.conf line 168 scontrol: fatal: something wrong with opening/reading conf file slurm_automate.conf file is used for declare the nodesets in all the partitions (production, testing, maintenance...). In a installation/stabilisation period the nodesets can be very larges caused by many holes, and cause the line contains more than 4096 chars.
Created attachment 190 [details] problem config file Customer's config file.
I will take core of this issue. David
I did some performance measures and it looks like the malloc()/free() has indeed little impact. The time is probably spent in parsing the file. Dynamic memory. usec=30700 usec=29435 usec=29078 usec=31375 usec=31386 average=30394.8 4MB static buffer usec=30159 usec=30102 usec=26012 usec=30159 usec=34238 average=30134.0 Perhaps the patch could be a little simpler by avoiding one variable. diff --git a/src/common/parse_config.c b/src/common/parse_config.c index 27f756e..7cb386b 100644 --- a/src/common/parse_config.c +++ b/src/common/parse_config.c @@ -69,8 +69,6 @@ strong_alias(s_p_hashtbl_create, slurm_s_p_hashtbl_create); strong_alias(s_p_hashtbl_destroy, slurm_s_p_hashtbl_destroy); strong_alias(s_p_parse_file, slurm_s_p_parse_file); -#define BUFFER_SIZE 4096 - #define CONF_HASH_LEN 26 static regex_t keyvalue_re; @@ -870,8 +868,8 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename, bool ignore_new) { FILE *f; - char line[BUFFER_SIZE]; char *leftover = NULL; + char *line; int rc = SLURM_SUCCESS; int line_number; int merged_lines; @@ -899,9 +897,10 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename, return SLURM_ERROR; } + line = xmalloc(sizeof(char) * stat_buf.st_size); line_number = 1; while ((merged_lines = _get_next_line( - line, BUFFER_SIZE, hash_val, f)) > 0) { + line, stat_buf.st_size, hash_val, f)) > 0) { /* skip empty lines */ if (line[0] == '\0') { line_number += merged_lines; @@ -937,13 +936,14 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename, line_number += merged_lines; } + xfree(line); fclose(f); return rc; } Thanks, David
Created attachment 191 [details] testing slurm.conf file
I just checked the logic to include other files and it recursively calls s_p_parse_file() (see the function _parse_include_directive()), so I think we should just include this patch in the next release of v2.5 and remove the description of the line size limitations from the man pages for all of the configuration files. Thank you for the patch!
Your patch, plus the man page changes, are in the commit shown below. Thanks again. https://github.com/SchedMD/slurm/commit/97a5fa2fe7509f3847e23bda48f5c8c40b0db695