Ticket 7136 - Jobs are held with reason JobHeldAdmin instead of JobHeldUser
Summary: Jobs are held with reason JobHeldAdmin instead of JobHeldUser
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 18.08.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Dominik Bartkiewicz
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-05-29 04:32 MDT by Renate Dohmen
Modified: 2019-06-05 04:22 MDT

See Also:
Site: Max Planck Computing and Data Facility
Version Fixed: 19.05.1


Attachments
slurmctld log file of the described scenario (3.98 KB, text/x-log)
2019-05-29 04:32 MDT, Renate Dohmen
jobinfo file (3.12 KB, text/plain)
2019-05-29 04:33 MDT, Renate Dohmen

Description Renate Dohmen 2019-05-29 04:32:19 MDT
Created attachment 10415
slurmctld log file of the described scenario

When a user submits jobs with the hold option under a QOS that has a MaxJobsPU limit, the jobs are held with reason JobHeldAdmin instead of JobHeldUser.

This happens in the following situation:

1) The user submits more held jobs than the QOS per-user job limit (MaxJobsPU) allows.
2) The user releases the held jobs; some start to run and the rest stay pending with reason QOSMaxJobsPerUserLimit.
3) At the same moment, the user submits new held jobs. These new jobs are held with reason JobHeldAdmin.
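
The three steps above can be sketched as a small driver script. This is only a rough reproduction aid, not the exact procedure used on the reporting site: the QOS name, the number of extra jobs, and the `sleep 600` payload are placeholders, and `reproduce` and its helpers are hypothetical names. It relies on `sbatch --hold`, `scontrol release` and `squeue --format="%i %r"` (job ID and reason).

```python
import subprocess


def sbatch_held_cmd(qos):
    # Submit one job in held state under the given QOS; --parsable prints the job ID.
    return ["sbatch", "--parsable", "--hold", f"--qos={qos}", "--wrap=sleep 600"]


def release_cmd(job_ids):
    # Release a comma-separated list of held jobs.
    return ["scontrol", "release", ",".join(job_ids)]


def reason_cmd(job_ids):
    # Print "<jobid> <reason>" for each listed job.
    return ["squeue", "--noheader", f"--jobs={','.join(job_ids)}", "--format=%i %r"]


def _run(cmd):
    # Default runner: execute the command and return its stdout.
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def reproduce(qos, max_jobs_pu, run=_run):
    """Run steps 1-3 and return the squeue output for the late-submitted jobs.

    max_jobs_pu is assumed to match the MaxJobsPU value configured on the QOS.
    """
    def submit_held(count):
        # --parsable may print "jobid;cluster", so keep only the job ID.
        return [run(sbatch_held_cmd(qos)).split(";")[0] for _ in range(count)]

    first = submit_held(max_jobs_pu + 2)  # step 1: exceed the per-user limit
    run(release_cmd(first))               # step 2: release them all
    late = submit_held(2)                 # step 3: submit new held jobs right away
    return run(reason_cmd(late))
```

On an affected version, the reason column for the late jobs would be expected to show JobHeldAdmin. Because the problem depends on timing relative to the release, a single run may not always trigger it.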

The attachments contain the slurmctld log for this scenario and the jobinfo for one of the jobs (109015) that was held with reason JobHeldAdmin.
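
To check which hold reason a given job carries, the `Reason=` field of `scontrol show job <jobid>` can be extracted. The helper below is a minimal sketch; `job_reason` is a hypothetical name, and the sample line is illustrative only, not copied from the attached jobinfo file.

```python
import re


def job_reason(scontrol_output):
    """Return the value of the Reason= field from `scontrol show job` output, or None."""
    match = re.search(r"\bReason=(\S+)", scontrol_output)
    return match.group(1) if match else None


# Illustrative line in the key=value layout that scontrol prints (not real data).
sample = "   JobState=PENDING Reason=JobHeldAdmin Dependency=(null)"
reason = job_reason(sample)
print(reason, "(unexpected for a user hold)" if reason == "JobHeldAdmin" else "")
```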
Comment 1 Renate Dohmen 2019-05-29 04:33:05 MDT
Created attachment 10416
jobinfo file
Comment 2 Dominik Bartkiewicz 2019-05-29 05:25:38 MDT
Hi

Thanks for reporting this. I can reproduce it.
I will let you know when the fix is in the repo.

Dominik
Comment 5 Dominik Bartkiewicz 2019-06-05 04:22:26 MDT
Hi

The following commit fixes this issue. It will be in 19.05.1 and above.
https://github.com/SchedMD/slurm/commit/fe8226e72
I'll go ahead and close this. Thank you.
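
For sites that want to check whether an installed release is at or above the fixed version, a simple comparison of the numeric version components could look like the sketch below. The helper names are hypothetical, and the check assumes plain `major.minor.micro` version strings; it cannot tell whether a vendor build has the commit backported.

```python
import re

# Version named in this ticket as the first release containing the fix.
FIXED_IN = (19, 5, 1)


def parse_version(text):
    """Turn a string such as '18.08.6' into a tuple of ints, e.g. (18, 8, 6)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """Return True if the version is 19.05.1 or later."""
    return parse_version(version) >= FIXED_IN
```

For example, `has_fix("18.08.6")` is false for the version this ticket was reported against, while `has_fix("19.05.1")` is true.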

Dominik