Breaking this issue out from bug#7872 comment#49. When a node becomes unresponsive while the Slurm controller is trying to terminate a job, the controller should eventually drain the node: something is wrong with the node or with the network connection to it, which leaves the system in an unknown state. Aiming this change at 20.02 since it changes how Slurm behaves.
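
To make the requested behavior concrete, here is a rough sketch of the decision only. It is not Slurm code: the `NodeState` type, the `check_node` function, and the 300-second limit are all hypothetical names and values chosen for illustration, and the real change would live in the controller's C code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit for illustration; not a real Slurm configuration parameter.
KILL_PENDING_LIMIT_SECONDS = 300.0


@dataclass
class NodeState:
    """Hypothetical, simplified view of what the controller knows about a node."""
    name: str
    responding: bool
    kill_requested_at: Optional[float] = None  # time of the first terminate request
    drained: bool = False
    reason: str = ""


def check_node(node: NodeState, now: float,
               limit: float = KILL_PENDING_LIMIT_SECONDS) -> NodeState:
    """Drain a node that stays unresponsive while a job termination is pending."""
    # Nothing to do if the node is already drained, is responding,
    # or has no outstanding terminate request.
    if node.drained or node.responding or node.kill_requested_at is None:
        return node
    # The terminate request has been outstanding for too long: the node's
    # state is unknown, so take it out of service.
    if now - node.kill_requested_at >= limit:
        node.drained = True
        node.reason = "unresponsive during job termination"
    return node
```

The sketch drains only when both conditions hold (node not responding and a terminate request outstanding past the limit), so a node that recovers before the limit is left in service.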