| Summary: | slurm does not handle config line larger than 4096 chars | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Puenlap Lee <puen-lap.lee> |
| Component: | slurmctld | Assignee: | David Bigagli <david> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | da |
| Version: | 2.5.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | CEA | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: |
propose fix
problem config file testing slurm.conf file |
||
Created attachment 190 [details]
problem config file
Customer's config file.
I will take core of this issue. David
I did some performance measures and it looks like the malloc()/free() has indeed little impact. The time is probably spent in parsing the file.
Dynamic memory.
usec=30700
usec=29435
usec=29078
usec=31375
usec=31386
average=30394.8
4MB static buffer
usec=30159
usec=30102
usec=26012
usec=30159
usec=34238
average=30134.0
Perhaps the patch could be a little simpler by avoiding one variable.
diff --git a/src/common/parse_config.c b/src/common/parse_config.c
index 27f756e..7cb386b 100644
--- a/src/common/parse_config.c
+++ b/src/common/parse_config.c
@@ -69,8 +69,6 @@ strong_alias(s_p_hashtbl_create, slurm_s_p_hashtbl_create);
strong_alias(s_p_hashtbl_destroy, slurm_s_p_hashtbl_destroy);
strong_alias(s_p_parse_file, slurm_s_p_parse_file);
-#define BUFFER_SIZE 4096
-
#define CONF_HASH_LEN 26
static regex_t keyvalue_re;
@@ -870,8 +868,8 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename,
bool ignore_new)
{
FILE *f;
- char line[BUFFER_SIZE];
char *leftover = NULL;
+ char *line;
int rc = SLURM_SUCCESS;
int line_number;
int merged_lines;
@@ -899,9 +897,10 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename,
return SLURM_ERROR;
}
+ line = xmalloc(sizeof(char) * stat_buf.st_size);
line_number = 1;
while ((merged_lines = _get_next_line(
- line, BUFFER_SIZE, hash_val, f)) > 0) {
+ line, stat_buf.st_size, hash_val, f)) > 0) {
/* skip empty lines */
if (line[0] == '\0') {
line_number += merged_lines;
@@ -937,13 +936,14 @@ int s_p_parse_file(s_p_hashtbl_t *hashtbl, uint32_t *hash_val, char *filename,
line_number += merged_lines;
}
+ xfree(line);
fclose(f);
return rc;
}
Thanks,
David
Created attachment 191 [details]
testing slurm.conf file
I just checked the logic to include other files and it recursively calls s_p_parse_file() (see the function _parse_include_directive()), so I think we should just include this patch in the next release of v2.5 and remove the description of the line size limitations from the man pages for all of the configuration files. Thank you for the patch! Your patch, plus the man page changes, are in the commit shown below. Thanks again. https://github.com/SchedMD/slurm/commit/97a5fa2fe7509f3847e23bda48f5c8c40b0db695 |
Created attachment 189 [details] propose fix [root@cluster] # scontrol reconfig scontrol: error: Parse error in file /release/bull/automate/slurm_automate.conf line 5: "07,6252-6263,6265-6266,6269,6288-6301,6303-6323,6482-6485,6526-6532,6534-6539]" scontrol: error: "Include" failed in file /etc/slurm/slurm.conf line 168 scontrol: fatal: something wrong with opening/reading conf file slurm_automate.conf file is used for declare the nodesets in all the partitions (production, testing, maintenance...). In a installation/stabilisation period the nodesets can be very larges caused by many holes, and cause the line contains more than 4096 chars.