Ticket 5725 - Setting MaxWall user association has no effect on job duration
Status: RESOLVED DUPLICATE of ticket 4681
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 17.02.9
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
Reported: 2018-09-12 15:25 MDT by doug.parisek
Modified: 2018-09-12 15:39 MDT
Site: Atos/Eviden Sites
Atos/Eviden Sites: Internal


Description doug.parisek 2018-09-12 15:25:19 MDT
I reproduced a problem that was originally reported against version 16.05.5; it also occurs on 17.02.9, as follows:

I created a new account (test) and associated a user (dparisek) with it. I first set MaxWall to 1 minute, and later set MaxWallDurationPerJob to 1 minute to see whether that made a difference. With AccountingStorageEnforce=associations,limits set, user dparisek ran a 2-minute sleep job, and the job kept running for the full 2 minutes.
========================================================================
sacctmgr modify user dparisek set MaxWallDurationPerJob=1
 Modified user associations...
  C = cluster5   A = test                 U = dparisek 
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[trek0] (slurm) dhp> sacctmgr -s show user where user=dparisek format=user,maxw
      User     MaxWall 
---------- ----------- 
  dparisek    00:01:00 


[trek0] (slurm) dhp> scontrol show config | grep Accounting
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits


<< srun sleep 120& >>
Ran entire 2 mins - maxwall not enforced!
========================================================================
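To make the "ran the entire 2 mins" observation checkable, here is a minimal sketch that compares a job's elapsed time with the association limit. The function names to_seconds and exceeds_limit are hypothetical, and the sketch assumes both values use the [D-]HH:MM:SS form shown in the sacctmgr output above. It compares strictly and ignores any grace period a site may have configured.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Slurm.

# Convert a [D-]HH:MM:SS string (e.g. 00:01:00 or 1-00:00:00) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        days = 0; hours = $1
        # An optional "D-" prefix carries the day count.
        if (split($1, d, "-") == 2) { days = d[1]; hours = d[2] }
        print days * 86400 + hours * 3600 + $2 * 60 + $3
    }'
}

# Succeed (exit 0) when the elapsed time is strictly greater than the limit.
exceeds_limit() {
    [ "$(to_seconds "$1")" -gt "$(to_seconds "$2")" ]
}

# Values from the first scenario: a 2-minute job against a 1-minute MaxWall.
# On a live system the elapsed time could be read with something like
#   sacct -j <jobid> -n -X -o Elapsed
# (assumed usage; check it against your Slurm version).
if exceeds_limit 00:02:00 00:01:00; then
    echo "job outlived MaxWall: limit not enforced"
fi
```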

Then I created a new QoS, associated it with the user, and set MaxWall=1 minute on that QoS. This DID work!

[trek0] (slurm) dhp> sacctmgr add qos qosA
 Adding QOS(s)
  qosa
 Settings
  Description    = qosa
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[trek0] (slurm) dhp> sacctmgr modify user dparisek set qos=qosa
 Modified user associations...
  C = cluster5   A = test                 U = dparisek 
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[trek0] (slurm) dhp> sacctmgr -s show user where user=dparisek format=user,maxw,qos
      User     MaxWall                  QOS 
---------- ----------- -------------------- 
  dparisek    00:01:00                 qosa 
[trek0] (slurm) dhp> sacctmgr modify qos set maxwall=1 where user=dparisek

<< srun sleep 120& >>
Maxwall was enforced - job was killed after 1 min
========================================================================
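The QoS workaround above can also be scripted without the interactive commit prompts. This is a sketch only: the run and setup_qos_limit helpers are hypothetical, the script is a dry run unless APPLY=1 is set, and it assumes sacctmgr's -i (immediate commit) option and the "modify qos <name> set ..." form behave the same way on your Slurm version.

```shell
#!/bin/sh
# Hypothetical wrapper: print each command unless APPLY=1 is set.
# Running with APPLY=1 requires Slurm administrator rights.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Second scenario from the ticket: enforce the wall-clock limit through a QoS.
setup_qos_limit() {
    user=$1
    qos=$2
    minutes=$3
    run sacctmgr -i add qos "$qos"
    run sacctmgr -i modify user "$user" set qos="$qos"
    run sacctmgr -i modify qos "$qos" set MaxWall="$minutes"
}

# Names and values taken from the transcript above.
setup_qos_limit dparisek qosa 1
```

In the default dry-run mode the script only prints the three sacctmgr commands, so they can be reviewed before anything is changed in the accounting database.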
Question: Did I miss something in the first scenario, where no QoS was associated? Is associating a QoS the only way to enforce MaxWall (and perhaps other limits)? If so, what is the point of allowing sacctmgr to set the limit without a QoS? Is there a bug here?

Thanks.
Comment 1 Marshall Garey 2018-09-12 15:39:49 MDT
This has already been fixed in bug 4681. Marking as duplicate. Please reopen if it doesn't address your problem.

Specifically:

https://github.com/SchedMD/slurm/commit/9143c7c964

and more work done here:

https://github.com/SchedMD/slurm/commit/2ef56d4b96f93e0854

*** This ticket has been marked as a duplicate of ticket 4681 ***