Ticket 1099 - Improvement : ask for a way to dump the active configuration with a scontrol command
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 14.11.x
Hardware: Linux Linux
Importance: 5 - Enhancement
Assignee: Moe Jette
QA Contact:
URL:
Duplicates: 1100
Depends on:
Blocks:
 
Reported: 2014-09-11 05:51 MDT by doug.parisek
Modified: 2014-09-18 11:29 MDT

See Also:
Site: Atos/Eviden Sites
Version Fixed: 14.11.0-pre5


Attachments
patch file (25.80 KB, application/octet-stream)
2014-09-11 05:51 MDT, doug.parisek
example of generated slurm.conf (6.33 KB, application/octet-stream)
2014-09-11 05:57 MDT, doug.parisek
Remove colons from the pathname & comment out DefMemPerNode=UNLIMITED (1.79 KB, patch)
2014-09-18 06:15 MDT, doug.parisek

Description doug.parisek 2014-09-11 05:51:14 MDT
Created attachment 1224
patch file

Original text from customer:

The customer (University-StAndrews) had some problems with changes getting lost (see PARM 19311). During discussions
with him about the issue, he requested that SLURM be improved so that there is an scontrol command that creates
a good SLURM configuration file from the running configuration (a bit like in PBS Pro). This would mean that he could
use scontrol to change his configuration on the fly and, when happy, generate good configuration files, rather than
having to create the configuration file and do potentially many restarts to re-read the file (scontrol reconfigure)
before he gets it correct.

===============================================================================
Created new scontrol option:  “scontrol write config”

Uses same slurm functions used by “scontrol show config” for gathering key/value pairs as stored in the slurm_ctl_conf structure.  Also gathers node and partition info from the node_info and partition_info structures respectively.

Uses slurm hostlist functions to generate a hostlist expression for nodes that have like attributes, so that node information can be written compactly to the generated conf file.  I am not writing all the possible node attributes on the node definition line; I have essentially picked certain ones. Comments?   Of course the user can manually edit these after the fact.
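
The grouping idea can be sketched in Python as below. This is only an illustration, not the code in the patch: compress_hostnames() and node_lines() are hypothetical helpers, the range compression is a simplified stand-in for Slurm's hostlist functions (it ignores zero-padded suffixes such as node01), and the choice of CPUs and RealMemory as the written attributes is an assumption.

```python
import re
from collections import defaultdict


def compress_hostnames(names):
    """Collapse names like tux1,tux2,tux3 into tux[1-3] (simplified sketch)."""
    by_prefix = defaultdict(list)
    plain = []
    for name in names:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match:
            by_prefix[match.group(1)].append(int(match.group(2)))
        else:
            plain.append(name)  # no numeric suffix, keep the name as-is

    parts = list(plain)
    for prefix, nums in by_prefix.items():
        nums = sorted(set(nums))
        if len(nums) == 1:
            parts.append(f"{prefix}{nums[0]}")
            continue
        # Walk the sorted numbers and record each run of consecutive values.
        ranges = []
        start = prev = nums[0]
        for num in nums[1:]:
            if num != prev + 1:
                ranges.append((start, prev))
                start = num
            prev = num
        ranges.append((start, prev))
        text = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
        parts.append(f"{prefix}[{text}]")
    return ",".join(parts)


def node_lines(nodes):
    """Emit one NodeName= line per group of nodes with identical attributes."""
    groups = defaultdict(list)
    for name, attrs in nodes.items():
        groups[tuple(sorted(attrs.items()))].append(name)
    lines = []
    for attrs, names in groups.items():
        attr_text = " ".join(f"{key}={value}" for key, value in attrs)
        lines.append(f"NodeName={compress_hostnames(names)} {attr_text}")
    return lines


if __name__ == "__main__":
    example = {
        "tux1": {"CPUs": 8, "RealMemory": 32000},
        "tux2": {"CPUs": 8, "RealMemory": 32000},
        "tux3": {"CPUs": 8, "RealMemory": 32000},
        "tux5": {"CPUs": 16, "RealMemory": 64000},
    }
    for line in node_lines(example):
        print(line)
```

With the example data, the three identical nodes share one line (NodeName=tux[1-3] ...) and tux5 gets its own.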

Many of the values within the key/value pairs are too verbose to be used in an actual slurm.conf, so they are altered.   For example, “UNLIMITED” is changed to “0” (per the slurm.conf man page, but perhaps UNLIMITED is OK?), and SUSPEND_TIME=NONE is changed to SUSPEND_TIME=0.  NULL, empty and N/A values are commented out. KeepAliveTime=SYSTEM_DEFAULT is also commented out (I didn’t know a value for SYSTEM_DEFAULT).   Note:  If “DefMemPerCPU” is set to UNLIMITED in the conf file, the value is stored as “2147483647”. The slurm.conf man page says this value is in megabytes, so that value can cause problems. I am setting it to “0” if it has that value.  Comments?
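
As a rough sketch of the commenting-out rule only (not the patch itself), the Python below writes a key/value pair as a comment when the value is unset. conf_line() and the UNSET_VALUES set are hypothetical, and the exact strings used for unset values (for example "(null)") and the "#Key=value" comment layout are assumptions. UNLIMITED is deliberately passed through unchanged here.

```python
# Values treated as "not configured"; the exact spellings are assumptions.
UNSET_VALUES = {"", "NULL", "(null)", "N/A", "SYSTEM_DEFAULT"}


def conf_line(key, value):
    """Return a slurm.conf-style line, commented out when the value is unset."""
    text = "" if value is None else str(value).strip()
    if text in UNSET_VALUES:
        return f"#{key}={text}"
    return f"{key}={text}"


if __name__ == "__main__":
    for key, value in [
        ("SlurmctldPort", 6817),
        ("Epilog", None),
        ("KeepAliveTime", "SYSTEM_DEFAULT"),
        ("DefMemPerCPU", "UNLIMITED"),
    ]:
        print(conf_line(key, value))
```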

I am grouping key/value pairs in the generated conf file according to perceived groups (accounting, power, logs, etc.)

Attached is a conf file generated by this new command.
Comment 1 doug.parisek 2014-09-11 05:57:35 MDT
Created attachment 1225
example of generated slurm.conf

Example of a slurm.conf generated by 'scontrol write config'
Comment 2 Nancy 2014-09-11 06:06:08 MDT
*** Ticket 1100 has been marked as a duplicate of this ticket. ***
Comment 3 Moe Jette 2014-09-11 08:39:23 MDT
I made some format changes for what I believe is better readability plus replacing spaces in some places with tabs.

The xfree() function does nothing if the input value is NULL, so there is no reason to check first for a non-zero value.

In slurm_write_ctl_conf(), the value tmp_str was xfreed, but not initialized. This was the only significant flaw that I saw in the code.

I also restructured the code in slurm_write_ctl_conf() a bit to eliminate a redundant memory allocation and copy of tmp_str.

Most of the Slurm fields are unsigned, so they should be printed using %u rather than %d.
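
To illustrate why the format specifier matters: the same 32 bits read as signed (what %d expects) and as unsigned (what %u expects) give different numbers once the value exceeds 2147483647. The Python sketch below uses ctypes to reinterpret the bits; the helper names are hypothetical and this is not Slurm code.

```python
import ctypes


def as_signed_32(value):
    """Interpret the low 32 bits of value as a signed int (like C's %d)."""
    return ctypes.c_int32(value).value


def as_unsigned_32(value):
    """Interpret the low 32 bits of value as an unsigned int (like C's %u)."""
    return ctypes.c_uint32(value).value


if __name__ == "__main__":
    raw = 4294967294  # a large unsigned 32-bit value, 0xFFFFFFFE
    print(as_signed_32(raw))    # signed reading
    print(as_unsigned_32(raw))  # unsigned reading
```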

There was a bug in the "scontrol show config" that would report DefMemPerCPU=UNLIMITED as 2147483647. This was a bug due to how INFINITE was being interpreted.
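
The ticket does not spell out the mechanism, but 2147483647 is 0x7FFFFFFF, which is what remains of an all-ones 32-bit value once its top bit is cleared. The Python sketch below shows one way such a number could appear, assuming an INFINITE sentinel of 0xFFFFFFFF and a high-bit flag that marks per-CPU limits. Both constants and all function names here are assumptions for illustration, not taken from the ticket.

```python
INFINITE = 0xFFFFFFFF          # assumed 32-bit "unlimited" sentinel
MEM_PER_CPU_FLAG = 0x80000000  # assumed flag bit marking a per-CPU limit


def strip_flag(raw):
    """Clear the assumed per-CPU flag bit, staying within 32 bits."""
    return raw & ~MEM_PER_CPU_FLAG & 0xFFFFFFFF


def format_without_sentinel_check(raw):
    """Strips the flag without checking for the sentinel first."""
    return str(strip_flag(raw))


def format_with_sentinel_check(raw):
    """Checks for the sentinel before stripping the flag."""
    if raw == INFINITE:
        return "UNLIMITED"
    return str(strip_flag(raw))


if __name__ == "__main__":
    print(format_without_sentinel_check(INFINITE))
    print(format_with_sentinel_check(INFINITE))
    print(format_with_sentinel_check(MEM_PER_CPU_FLAG | 2048))
```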

“UNLIMITED” is valid input for all of the numbers, so I removed your logic which changed those values to "0".

KeepAliveTime=SYSTEM_DEFAULT means the value being used is per the Linux kernel and Slurm does not have a value configured. Commenting it out, as you have done, is probably the best solution.

I made some global functions local since they are not used outside of config_info.c.

I believe that we can add additional fields later if needed, but this seems to cover almost everything now.
Comment 4 Moe Jette 2014-09-11 10:29:35 MDT
The main commit is here:
55766c11db5f82148b0cab6a3199ce64a793e911

My enhancements in commits here:
9cabcea2e420ed8ffd54a898ab54c93956346011
57345149b059dfbf64c1f539bba682064b7c2972
Comment 5 doug.parisek 2014-09-18 06:15:46 MDT
Created attachment 1246
Remove colons from the pathname & comment out DefMemPerNode=UNLIMITED

1. Generated conf filename was in format of "slurm.conf.2014-09-18T12:30:00". The colons in the time portion of the filename prevent MS-Windows from displaying the filename properly (we use a Samba mount on Windows to view the Linux file system). Therefore this patch removes the colons in the filename.

2. Comment out the "DefMemPerNode=UNLIMITED" key/value pair in the generated conf file because we cannot srun any jobs when it is defined. The error is:

srun: Force Terminated job 158
srun: error: Memory specification can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available
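
For item 1, the two filename styles can be sketched in Python as follows. dump_filename() and its keep_colons flag are hypothetical, and the colon-free layout is an assumption based only on the statement that the patch removes the colons.

```python
from datetime import datetime


def dump_filename(base, when, keep_colons=True):
    """Build a timestamped config filename, with or without colons."""
    fmt = "%Y-%m-%dT%H:%M:%S" if keep_colons else "%Y-%m-%dT%H%M%S"
    return f"{base}.{when.strftime(fmt)}"


if __name__ == "__main__":
    stamp = datetime(2014, 9, 18, 12, 30, 0)
    print(dump_filename("slurm.conf", stamp))
    print(dump_filename("slurm.conf", stamp, keep_colons=False))
```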
Comment 6 Danny Auble 2014-09-18 08:46:07 MDT
Thanks Doug. I am wondering about the removal of the :'s.  Many other files we make have this formatted timestamp, with the :'s removed the file name looks sort of strange now.

What does it look like in Windows?  Is this change really relevant to normal users?

The other part of the patch makes sense. In this case UNLIMITED means the parameter wasn't set, so I removed the comment about it not working. It is in commit aa9735b6f92bbf9ef8a4b13d62d737710dbdd41d
Comment 7 Danny Auble 2014-09-18 08:55:17 MDT
Hum, I see, on Windows it doesn't appear you can even make a file with a ':' in it.  That is an interesting limitation I didn't know existed :).  So while I think it looks strange I'll add it.  commit 1b65f61321e894c30430f8e1d001d2182e6e57fe
Comment 8 doug.parisek 2014-09-18 09:01:36 MDT
I thought it looked strange too. At first I thought of changing the colons to dashes, but that didn't look right either, being the same format as the date portion. Not being able to decide, I just went with removing the colons.
Comment 9 Danny Auble 2014-09-18 09:07:45 MDT
Yeah, I thought about a '.' and a '-' as well, but that still didn't look right.  Looks like we came to the same conclusion.

When you create it does it only look bad in Windows or also when you look at it on the linux box?

What I see is no wackiness in either place.  I just made a file in linux on my samba share with normal timestamp and it looks just fine on linux and when I look at it on Windows it looks just fine as well.

What exactly are you experiencing?
Comment 10 doug.parisek 2014-09-18 09:23:57 MDT
Both the Windows Filemanager view and the CommandPrompt view show it as:
	09/18/2014  02:14 PM      6,258 SL0S8~U

On Linux the same file looks like:
	-rw-rw-r-- 1 slurm slurm 6258 Sep 18 14:14 /app/slurm/dhp/install/etc/slurm.conf.2014-09-11T10:09:57
Comment 11 Danny Auble 2014-09-18 09:29:29 MDT
Weird, on my windows 7 box it looks normal in windows explorer or with cmd.  It will not let me rename the file to something that still contains the :'s but it doesn't look strange.  What version of windows is your filesystem running?  I am wondering if this is a samba issue or something.

I am still wondering how big of an issue this is.
Comment 12 doug.parisek 2014-09-18 09:47:50 MDT
I have Windows 7 Pro SP1.  It was reported to me by another user who uses the samba share often.
Comment 13 Danny Auble 2014-09-18 09:49:32 MDT
Have you seen the anomaly yourself?  I am wondering why I don't see the same behavior.
Comment 14 doug.parisek 2014-09-18 09:51:20 MDT
Yes, I see the same issue as the other user.
Comment 15 Danny Auble 2014-09-18 09:54:19 MDT
My guess is an incorrect samba share setting, but we have wasted too much time worrying about a small issue.  Let's just keep your patch, with the file name looking strange, and not waste any more time.  It is clear in my mind that the problem can be resolved in windows/samba somewhere, though, since I don't see the issue at all and we appear to have similar setups.
Comment 16 doug.parisek 2014-09-18 11:08:54 MDT
Danny, I experimented with a samba option I found researching the web. Added:

   mangled names = no

in the [global] section of the samba smb.conf file, restarted samba and now the filenames with colons are displayed correctly!

I vote to undo the colon removal portion of the patch and leave the filename alone.
Comment 17 Danny Auble 2014-09-18 11:16:11 MDT
Fantastic :).  I'll remove the commit and update the comment just in case someone else has issues.
Comment 18 Danny Auble 2014-09-18 11:29:51 MDT
Reverted, thanks for taking the time to figure out the samba settings :).