Ticket 15166

Summary: Bug in preemption (and gang scheduling) mode
Product: Slurm Reporter: xusx <xusx>
Component: Scheduling Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 21.08.0   
Hardware: Linux   
OS: Linux   
Site: -Other-

Description xusx 2022-10-13 01:26:49 MDT
Hello engineer

Recently I ran into a problem in Slurm related to preemption (and gang scheduling).

config:
PreemptMode = GANG,SUSPEND
PreemptType = preempt/partition_prio
....
PartitionName=debug .... PreemptMode=OFF

Phenomenon:
There are two jobs, 101 and 102, in the debug queue. If job 101 stays in the CG (completing) state for a long time, job 102 is first suspended and then alternates between running and suspended. According to the slurmctld log, job 102 is suspended so that job 101 can run, but slurmctld then detects that job 101 has already ended, which leaves the two jobs in the states CG (101) and suspended (102). This looks consistent with the rules of gang scheduling.
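
The sequence described above can be illustrated with a toy model in Python. This is not Slurm code and makes no claim about how slurmctld implements gang scheduling. The Job class, the state strings, and simulate_timeslices are hypothetical names chosen for illustration, and the round-robin rule is an assumption that simply reproduces the reported pattern.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    state: str  # "RUNNING", "SUSPENDED", or "COMPLETING" (illustrative labels)


def simulate_timeslices(jobs, slices):
    """Toy round-robin: each slice, one job gets the turn and the others yield."""
    history = []
    for tick in range(slices):
        turn = jobs[tick % len(jobs)]
        for job in jobs:
            if job.state == "COMPLETING":
                # Assumption: a completing job can be neither resumed nor suspended,
                # but it still takes its turn in the rotation.
                continue
            job.state = "RUNNING" if job is turn else "SUSPENDED"
        history.append({job.job_id: job.state for job in jobs})
    return history


jobs = [Job(101, "COMPLETING"), Job(102, "RUNNING")]
history = simulate_timeslices(jobs, 4)
for tick, states in enumerate(history):
    print(tick, states)
```

In this model job 102 is suspended on every slice that belongs to job 101, even though job 101 never actually runs, which matches the suspended, running, suspended, running pattern seen on the cluster.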

In practice, once preemption is enabled, a job that stays in the CG state for a long time prevents every other job in the same queue from running normally, until the completing node times out and goes offline. It appears that setting PreemptMode=OFF on an individual queue does not take effect.

My question:
1. With "PreemptMode=OFF" configured on the queue, is the behavior described above expected? Reading the code, I found that gs_part_list actually contains all queues, regardless of whether a queue is configured with PreemptMode=OFF. Is this a bug?
2. Can gang scheduling be used independently of preemption?
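
For reference, a configuration of the kind question 2 asks about might look like the slurm.conf sketch below. It assumes the parameters described in the Slurm gang scheduling guide (PreemptMode=GANG, SchedulerTimeSlice, and OverSubscribe=FORCE on the partition). The node names and values are placeholders, and it has not been verified on 21.08.

```
# slurm.conf sketch (hypothetical values): gang scheduling with no preemption plugin
PreemptType=preempt/none     # no job preempts another
PreemptMode=GANG             # time-slice jobs that share the same resources
SchedulerTimeSlice=30        # length of each time slice, in seconds
PartitionName=debug Nodes=node[01-02] OverSubscribe=FORCE:2 Default=YES
```

Here OverSubscribe=FORCE:2 is what lets two jobs be allocated the same resources, so that the time slicer has something to alternate between.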