| Summary: | license scheduling and 'remote licenses' | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Gareth <gareth.williams> |
| Component: | Configuration | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | brian, da, rod |
| Version: | 14.03.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | CSIRO | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 14.11.4 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Gareth
2015-01-29 15:29:43 MST
There is some missing information in the documentation. To submit a job that uses remote licenses, you have to add the server name to the license name.
ex.
brian@compy:~/slurm$ scontrol show lic
LicenseName=stuff
Total=200 Used=0 Free=200 Remote=no
LicenseName=pdf@slurmdb
Total=100 Used=0 Free=100 Remote=yes
brian@compy:~/slurm$ sbatch -L pdf@slurmdb:3 ~/jobs/sleep_infini.sh
Submitted batch job 5157
brian@compy:~/slurm$ scontrol show lic
LicenseName=stuff
Total=200 Used=0 Free=200 Remote=no
LicenseName=pdf@slurmdb
Total=100 Used=3 Free=97 Remote=yes
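For scripting against this output, here is a minimal sketch that pulls the free count for one license out of "scontrol show lic". It assumes the Key=Value layout shown in the transcript above, with LicenseName appearing before Free in each record. The helper name license_free is hypothetical and is not part of Slurm.

```shell
# license_free NAME: read `scontrol show lic` output on stdin and print
# the Free= count of the license whose LicenseName equals NAME.
# Assumes Key=Value fields, with LicenseName preceding Free in each record.
license_free() {
    awk -v want="$1" '
        {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                # Remember the most recent license name seen.
                if (kv[1] == "LicenseName") name = kv[2]
                # Print the free count only for the requested license.
                if (kv[1] == "Free" && name == want) print kv[2]
            }
        }'
}

# On a live system (assumes scontrol is on PATH):
#   scontrol show lic | license_free pdf@slurmdb
```

Matching on the full name, including the @server suffix, keeps a local license and a remote license with the same base name apart.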
As for "scontrol show lic" not showing the remote licenses: can you verify that the controller is connected to the dbd? You can either look in the controller logs for connection error messages or run "sacctmgr list clusters". The cluster's ControlPort will be 0 if it's not connected.
To create a reservation on the licenses so that they can't be used, you can create it like:
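The ControlPort check can be scripted. This is a rough sketch, assuming sacctmgr's pipe-delimited output ("-P") with the header suppressed ("-n") and only the Cluster and ControlPort columns requested. The helper name disconnected_clusters is hypothetical.

```shell
# disconnected_clusters: read "cluster|controlport" lines on stdin and
# print the name of every cluster whose ControlPort is 0, which is how
# a cluster that is not connected to the dbd is reported.
disconnected_clusters() {
    awk -F'|' '$2 == "0" { print $1 }'
}

# On a live system (assumes sacctmgr is on PATH and slurmdbd is reachable):
#   sacctmgr -n -P list clusters format=Cluster,ControlPort | disconnected_clusters
```

Empty output means every listed cluster reported a non-zero ControlPort.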
scontrol create reservation user=root starttime=now duration=120 Licenses=stuff:100 flags=license_only
I believe the reason why the licenses aren't showing up is because they weren't associated with a cluster.
Will you delete the remote licenses and re-add them? There is a bug that doesn't let you associate the license with a cluster after the fact -- which we are looking into.
ex.
sacctmgr delete resource where name=matlab
sacctmgr delete resource where name=comsol
ex.
brian@compy:~/slurm$ sacctmgr add resource name=matlab count=9 type=license percentallowed=100 cluster=compy
Adding Resource(s)
matlab@slurmdb
Cluster - compy 100%
Settings
Name = matlab
Server = slurmdb
Description = matlab
Count = 9
Type = License
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
brian@compy:~/slurm$ sacctmgr show resource
Name Server Type Count % Allocated ServerType
---------- ---------- -------- ------ ----------- ----------
pdf slurmdb License 100 100
matlab slurmdb License 9 100
brian@compy:~/slurm$ sbatch -L matlab@slurmdb ~/jobs/sleep_infini.sh
Submitted batch job 5169
I can confirm that deleting and redefining the entries with cluster specified results in the licenses being listed. Also I can update the count on demand. I've not yet tested that the resources are scheduled as I'd expect but will do so. Thanks!
ps. I'll post an update on the mailing list sooner or later.
Great. We've fixed the issues with adding licenses to clusters. They are in the following commits:
https://github.com/SchedMD/slurm/commit/da8409c0615c2f05435df1b129facd96f7165653
https://github.com/SchedMD/slurm/commit/79c310375b759eedce69c85bb29cc589ffbf2fa5
The docs have also been updated:
https://github.com/SchedMD/slurm/commit/055e4e52dceaa4a352dd0e0a4476880f3659f10c
https://github.com/SchedMD/slurm/commit/df531529ebffaaac47f623f7c2dac0cbf9b5805c
Please reopen the ticket if you have any other problems.
Thanks,
Brian |