| Summary: | Reset jobid | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Amit Kumar <ahkumar> |
| Component: | Accounting | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | ahkumar, brian |
| Version: | 17.02.6 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=4538 | | |
| Site: | SMU | | |
Description
Amit Kumar
2017-07-14 11:30:17 MDT
Amit,

Are you wanting to reset the job IDs back to 1 while there are other jobs still running, or do you plan on doing it with an empty job queue? I'll need to do some investigation to find the easiest way to do this, but having jobs currently running could complicate things.

Also, are you having major outages because of this? Usually, classifying a bug as "Severity 2 - High Impact" means the problem is causing sporadic outages or having an adverse impact on end users' interaction with the system. See our bug classifications listed at the bottom of this page: https://www.schedmd.com/support.php If this isn't the case, can we lower the severity?

Thanks,
Tim

Hi Tim,

We had a maintenance window for the upgrade this morning and hoped this reset would happen by dropping the database tables, which it didn't. The dilemma is that if I have to cancel jobs to reset the counter, I would rather do it now, as I am working through another issue (a segfault in spank-x11) after the upgrade. If we can reset while the jobs are still running, that would be great. We have users waiting for us to let them in and start using the system. I have reduced the severity to medium. I will be happy to read any documentation on how to accomplish this, if you have any pointers.

Please advise.

Thank you,
Amit

Amit,

I know you're on a bit of a time crunch, so I'll share what I have found so far. I've been testing this for a few hours now, and the only way I've been able to reset the job IDs involves deleting the data in StateSaveLocation. Here's how:

First, set a MaxJobId in the slurm.conf file to a number larger than your current job IDs:

```
MaxJobId=999999
```

Next, run `scontrol reconfigure` so the controller picks up the change.

Submit a dummy job that requests that very last job ID (you may need to make sure no other jobs are incoming at this time):

```
sbatch --jobid=999999 --wrap "sleep 15"
```

Wait for it to start running. Now stop the controller:

```
kill `pgrep slurmctld`
```

Remove the data in StateSaveLocation:

```
rm -rf state/*
```

Start the controller back up:

```
slurmctld
```

Now, submit a few test jobs and you should see your job IDs reset:

```
sbatch --wrap "sleep 15"
sbatch --wrap "sleep 15"
sbatch --wrap "sleep 15"
```

I'm not sure whether removing the state is acceptable for you, but I wanted to share the only way I've found so far that works and let you make that determination, especially since it sounds like your maintenance window may be very short. If that won't work, let me know; I can continue investigating, but I'm unsure how long it might take to find another solution, if there is one.

Regards,
Tim

Hi Tim,

I was able to reset the jobid counter with your instructions. Thank you very much for your prompt response. Also, I tried to respond to your update on Friday, but the email bounced back saying the user email was not recognized. I believe it has to do with my registered email address (ahkumar@smu.edu) versus what my phone's mail client used (ahkumar@mail.smu.edu is an alias). I have reduced the importance of this bug.

In addition, I have another question, and I am not sure whether it makes sense to open a new bug or to piggyback on this one; please let me know which is better. Here is my question: we have GPU nodes with one GPU on each node, GRES is correctly defined, and everything works fine if we explicitly request --gres=gpu:1. What we would like to accomplish is this: since we are using these nodes exclusively for GPU development, we want users to be able to request an allocation on these nodes and, just by specifying -n1 or -N1 in the batch script, get the entire node along with the GPU. That way, users do not have to request it explicitly on the command line.
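For reference, the reset steps Tim walks through above could be collected into one script. This is only a sketch under assumptions: the StateSaveLocation path and the MaxJobId value below are placeholders that must match the local slurm.conf, and by default the script runs in dry-run mode, printing each command instead of executing it.

```shell
#!/bin/bash
# Sketch of the jobid-reset procedure from this ticket (Slurm 17.02).
# Assumptions: STATE_DIR and MAX_ID are placeholders; adjust them to
# your slurm.conf. DRY_RUN=1 (the default) only prints the commands.
set -euo pipefail

STATE_DIR="${STATE_DIR:-/var/spool/slurm/state}"  # StateSaveLocation
MAX_ID="${MAX_ID:-999999}"                        # MaxJobId in slurm.conf
DRY_RUN="${DRY_RUN:-1}"                           # 1 = preview only

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Pick up the new MaxJobId (already added to slurm.conf).
run scontrol reconfigure
# 2. Burn the very last job id so the counter wraps around.
run sbatch --jobid="$MAX_ID" --wrap "sleep 15"
# 3. Stop the controller and clear its saved state (destructive!).
run pkill slurmctld
run rm -rf "$STATE_DIR"/*
# 4. Restart the controller; new submissions start from jobid 1 again.
run slurmctld
```

Note that step 3 discards all saved job state, which is exactly the trade-off discussed in this ticket; only run it with DRY_RUN=0 inside a maintenance window.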
If I indicate Shared=EXCLUSIVE and ExclusiveUser=yes, it guarantees that all CPUs are allocated, but not the GPU.

Snippet of slurm.conf:

```
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_Memory,CR_ONE_TASK_PER_CORE
NodeName=DEFAULT Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=256000
PartitionName=gpgpu-1 Nodes=p[001-036],tp001 MaxTime=7-00:00:00 Shared=EXCLUSIVE ExclusiveUser=yes
```

```
[ahkumar@login01 ~]$ srun -p gpgpu-1 -N1 /bin/sleep 200 &
[1] 26670
[ahkumar@login01 ~]$ scontrol show node p031
NodeName=p031 Arch=x86_64 CoresPerSocket=18 CPUAlloc=36 CPUErr=0 CPUTot=36 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:1
   NodeAddr=p031 NodeHostName=p031 OS=Linux
   RealMemory=256000 AllocMem=256000 FreeMem=250697 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=ahkumar(507002) MCS_label=N/A
   Partitions=gpgpu-1
   BootTime=2017-07-15T19:41:01 SlurmdStartTime=2017-07-15T19:42:35
   CfgTRES=cpu=36,mem=250G,gres/gpu=1
   AllocTRES=cpu=36,mem=250G
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

[ahkumar@login01 ~]$ scontrol show part gpgpu-1
PartitionName=gpgpu-1
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=login[01-05] Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=YES GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=p[001-036],tp001
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=1332 TotalNodes=37 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

[ahkumar@login01 ~]$ scontrol show job 100
JobId=100 JobName=sleep
   UserId=ahkumar(507002) GroupId=ahkumar(507002) MCS_label=N/A
   Priority=35002 Nice=0 Account=default QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:01:27 TimeLimit=7-00:00:00 TimeMin=N/A
   SubmitTime=2017-07-17T05:27:34 EligibleTime=2017-07-17T05:27:34
   StartTime=2017-07-17T05:27:34 EndTime=2017-07-24T05:27:34 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=gpgpu-1 AllocNode:Sid=login01:26374
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=p031 BatchHost=p031
   NumNodes=1 NumCPUs=36 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=36,mem=250G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=250G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/sleep
   WorkDir=/users/ahkumar
   Power=
```

I personally prefer --gres=gpu:1, since it makes the intention explicit, but our users would prefer otherwise. Please let me know whether I need to open another ticket for this.

Thank you for your help,
Amit

Correction: if I indicate Shared=EXCLUSIVE and ExclusiveUser=yes on a partition, it guarantees/allocates me all CPUs, but not the GPUs.

Amit,

Glad to hear that worked for you. If you wouldn't mind creating a new bug for this GRES question, that would be great; we prefer each issue to be in a separate bug for classification purposes. Let me know if I'm okay to resolve this bug.

Thanks,
Tim

Resolved! Thank you to Tim for providing this solution. I would be happy to hear if there is a non-disruptive solution to this, if one exists.

Thank you,
Amit
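As a footnote to the GRES question raised above: the answer is not given in this ticket, but one shell-level way to give every job a default GRES is a site wrapper around sbatch. Everything in this sketch is hypothetical (the function name, the policy of injecting --gres=gpu:1 only when the user supplied no GRES); it assembles and prints the command line rather than submitting, so the behavior can be inspected.

```shell
#!/bin/bash
# Hypothetical wrapper around sbatch: add --gres=gpu:1 unless the
# caller already passed a --gres option themselves. build_cmd only
# assembles and prints the command line; it does not submit anything.
build_cmd() {
    local inject=yes arg
    for arg in "$@"; do
        case "$arg" in
            --gres|--gres=*) inject=no ;;  # user was explicit
        esac
    done
    if [ "$inject" = yes ]; then
        echo sbatch --gres=gpu:1 "$@"
    else
        echo sbatch "$@"
    fi
}

# A plain single-node request gains the GRES flag:
build_cmd -p gpgpu-1 -N1 --wrap "sleep 15"
# An explicit GRES request is passed through unchanged:
build_cmd -p gpgpu-1 --gres=gpu:1 -N1 --wrap "sleep 15"
```

In practice a controller-side job_submit plugin is the usual place for this kind of default, since a login-node wrapper can be bypassed; the sketch above is only meant to illustrate the desired behavior.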