Ticket 1407

Summary: license scheduling and 'remote licenses'
Product: Slurm
Reporter: Gareth <gareth.williams>
Component: Configuration
Assignee: Brian Christiansen <brian>
Status: RESOLVED FIXED
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
CC: brian, da, rod
Version: 14.03.0   
Hardware: Linux   
OS: Linux   
Site: CSIRO
Version Fixed: 14.11.4

Description Gareth 2015-01-29 15:29:43 MST
See: https://groups.google.com/forum/#!searchin/slurm-devel/license%7Csort:date/slurm-devel/0Uv2n_05ULg/xhF0O5nww_AJ

We can't get the sacctmgr setup described at http://slurm.schedmd.com/licenses.html to work, including dynamically changing what is available.
Comment 1 Brian Christiansen 2015-01-30 06:39:48 MST
The documentation is missing some information. To submit a job that uses remote licenses, you have to add the server name to the license name, in the form name@server.

ex.
brian@compy:~/slurm$ scontrol show lic
LicenseName=stuff
    Total=200 Used=0 Free=200 Remote=no
LicenseName=pdf@slurmdb
    Total=100 Used=0 Free=100 Remote=yes

brian@compy:~/slurm$ sbatch -L pdf@slurmdb:3 ~/jobs/sleep_infini.sh 
Submitted batch job 5157

brian@compy:~/slurm$ scontrol show lic
LicenseName=stuff
    Total=200 Used=0 Free=200 Remote=no
LicenseName=pdf@slurmdb
    Total=100 Used=3 Free=97 Remote=yes
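
The before/after comparison above can be scripted. The following is a minimal sketch, assuming the two-line record layout shown in the "scontrol show lic" output; license_free is a hypothetical helper name, not a Slurm command. It reads the report on stdin, so it can be tried without a running cluster.

```shell
# Print the Free count for one license from `scontrol show lic` output.
# license_free is a hypothetical helper, not part of Slurm.
# Assumes each record is a "LicenseName=..." line followed by a counts line.
license_free() {
    awk -v want="LicenseName=$1" '
        $1 == want { found = 1; next }   # matched the record header
        found {
            # scan the counts line for the Free=<n> field
            for (i = 1; i <= NF; i++)
                if ($i ~ /^Free=/) { sub(/^Free=/, "", $i); print $i; exit }
        }
    '
}

# Usage on a live system:
#   scontrol show lic | license_free pdf@slurmdb
```

Note that a remote license is looked up by its full name, including the @server suffix.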


Regarding "scontrol show lic" not showing the remote licenses: can you verify that the controller is connected to the dbd? You can either look in the controller logs for connection error messages or run "sacctmgr list clusters". The cluster's ControlPort will be 0 if it's not connected.
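
The ControlPort check can be automated along these lines. This is a sketch only: unregistered_clusters is a hypothetical helper name, and it assumes sacctmgr is asked for pipe-separated "Cluster|ControlPort" output as in the usage comment.

```shell
# Print the names of clusters whose ControlPort is 0, i.e. clusters
# that are not currently connected to the dbd.
# unregistered_clusters is a hypothetical helper, not part of Slurm.
# Expects "Cluster|ControlPort" lines on stdin.
unregistered_clusters() {
    awk -F'|' '$2 == 0 { print $1 }'
}

# Usage on a live system (format field names are assumed here):
#   sacctmgr -n -P list clusters format=Cluster,ControlPort | unregistered_clusters
```

An empty result means every listed cluster reported a nonzero ControlPort.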



To reserve licenses so that they can't be used, you can create a reservation like this:
scontrol create reservation user=root starttime=now duration=120 Licenses=stuff:100 flags=license_only
Comment 2 Brian Christiansen 2015-01-30 10:28:07 MST
I believe the licenses aren't showing up because they weren't associated with a cluster.

Could you delete the remote licenses and re-add them? There is a bug, which we are looking into, that doesn't let you associate a license with a cluster after the fact.

ex.
sacctmgr delete resource where name=matlab
sacctmgr delete resource where name=comsol


ex.
brian@compy:~/slurm$ sacctmgr add resource name=matlab count=9 type=license percentallowed=100 cluster=compy
 Adding Resource(s)
  matlab@slurmdb
   Cluster - compy      100%
 Settings
  Name           = matlab
  Server         = slurmdb
  Description    = matlab
  Count          = 9
  Type           = License
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

brian@compy:~/slurm$ sacctmgr show resource
      Name     Server     Type  Count % Allocated ServerType 
---------- ---------- -------- ------ ----------- ---------- 
       pdf    slurmdb  License    100         100            
    matlab    slurmdb  License      9         100            

brian@compy:~/slurm$ sbatch -L matlab@slurmdb ~/jobs/sleep_infini.sh 
Submitted batch job 5169
Comment 3 Gareth 2015-02-02 17:22:29 MST
I can confirm that deleting and redefining the entries with a cluster specified results in the licenses being listed.

Also I can update the count on demand.
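
For reference, an on-demand count update might be wrapped as below. This is a sketch under assumptions, not the exact commands used on this site: set_license_count and DRY_RUN are hypothetical names, and the resource is assumed to be identified by name and server as in the "sacctmgr add resource" output earlier in this ticket.

```shell
# Set the total count of a remote license in the database.
# set_license_count and DRY_RUN are hypothetical, not part of Slurm.
# With DRY_RUN=1 the command is printed instead of executed.
set_license_count() {
    name=$1 server=$2 count=$3
    # reject anything that is not a non-negative integer
    case $count in
        ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
    esac
    # -i commits immediately instead of prompting for confirmation
    set -- sacctmgr -i modify resource "name=$name" "server=$server" set "count=$count"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1 set_license_count matlab slurmdb 12
```

Running it first with DRY_RUN=1 shows the command before anything is changed in the database.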

I've not yet tested that the resources are scheduled as I'd expect but will do so.

Thanks!

PS. I'll post an update on the mailing list sooner or later.
Comment 4 Brian Christiansen 2015-02-03 07:55:03 MST
Great. We've fixed the issues with adding licenses to clusters. The fixes are in the following commits:
https://github.com/SchedMD/slurm/commit/da8409c0615c2f05435df1b129facd96f7165653
https://github.com/SchedMD/slurm/commit/79c310375b759eedce69c85bb29cc589ffbf2fa5

The docs have also been updated:
https://github.com/SchedMD/slurm/commit/055e4e52dceaa4a352dd0e0a4476880f3659f10c
https://github.com/SchedMD/slurm/commit/df531529ebffaaac47f623f7c2dac0cbf9b5805c

Please reopen the ticket if you have any other problems.

Thanks,
Brian