| Summary: | Purge queue after shutdown | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Davide Vanzo <davide.vanzo> |
| Component: | Scheduling | Assignee: | Dominik Bartkiewicz <bart> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | bart |
| Version: | 15.08.11 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Vanderbilt | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Davide Vanzo
2016-08-04 07:41:08 MDT
Comment 1
Dominik Bartkiewicz

Hi,

You can use the -c option on the initial start of slurmctld (http://slurm.schedmd.com/slurmctld.html):

> -c  Clear all previous slurmctld state from its last checkpoint. With this option, all jobs, including both running and queued, and all node states, will be deleted. Without this option, previously running jobs will be preserved along with node State of DOWN, DRAINED and DRAINING nodes and the associated Reason field for those nodes.

NOTE: It is rare you would ever want to use this in production as all jobs will be killed.

Dominik

Comment 2
Davide Vanzo

Hi Dominik,
Thank you for your quick reply. If I understand correctly, that would not only purge the job queue but also delete all node states. Am I right? That is something we don't want, since it would release nodes that have been set to DOWN for multiple reasons.

DV

Comment 3
Moe Jette

(In reply to Davide Vanzo from comment #2)
> If I understand correctly, that would not only purge the job queue but also
> delete all node states. Am I right? That is something we don't want since
> that would release nodes that have been set as DOWN for multiple reasons.

You are correct. You will need to delete the "job_state" and "job_state.old" files from your configured "StateSaveLocation" directory in order to delete the jobs, but retain all other state information.

Comment 4
Davide Vanzo

Moe, that is what I thought, but I preferred to get confirmation from you before touching anything I should not touch. A couple of related questions:

1) I also have a job_state.new file in the same folder. What is that? Should I delete that one too?
2) Will Slurm update the database accordingly by setting the purged jobs to "CANCELLED", or will they remain in their old state in the database?
DV

Comment 5
Dominik Bartkiewicz

You can safely remove that file too; it is used while the job_state file is being updated.

Dominik

Comment 6
Dominik Bartkiewicz

And yes, all info in the database will be correctly updated.

Dominik

Comment 7
Davide Vanzo

Great, thank you Dominik. You can close this ticket now.

Have a great day,
DV

Comment 8
Dominik Bartkiewicz

Closing as resolved/infogiven. Please reopen if there's anything else I can address.

Dominik

Comment 9
Davide Vanzo

Dominik, your solution worked well, but it had the unexpected and undesired effect of resetting the JOBID counter, so new jobs now start from 1. Is there a way to set the counter to a different value?

Davide

Comment 10
Danny Auble

FirstJobId
http://slurm.schedmd.com/slurm.conf.html#OPT_FirstJobId

Comment 11
Davide Vanzo

That did the trick. Thanks again. You can re-close the ticket now.

Davide
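Danny's pointer is a slurm.conf parameter. A sketch of the fragment, assuming you want new jobs to continue from roughly where the old counter left off (the value below is purely illustrative):

```
# slurm.conf (fragment) -- restore the job ID counter after a state purge.
# FirstJobId is the ID assigned to the next newly submitted job; pick a
# value above your last pre-purge job ID. Restart slurmctld to apply.
FirstJobId=100000
```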
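The state-file purge Moe describes can be sketched as a shell session. This is a minimal demo on a throw-away directory standing in for StateSaveLocation (the real path is whatever your slurm.conf configures; the file names are the ones named in the thread), and slurmctld must be stopped before touching the files:

```shell
# Demo of purging job state while preserving node state.
# A temporary directory stands in for the real StateSaveLocation
# (check slurm.conf for the actual path). On a real system, stop the
# controller first, e.g. "systemctl stop slurmctld".
STATE_SAVE_DIR=$(mktemp -d)

# Fake checkpoint files as they would appear on a live controller.
touch "$STATE_SAVE_DIR/job_state" "$STATE_SAVE_DIR/job_state.old" \
      "$STATE_SAVE_DIR/job_state.new" "$STATE_SAVE_DIR/node_state"

# Remove only the job checkpoints. Per comment 5, job_state.new is a
# scratch file used while job_state is rewritten, so it may go as well.
rm -f "$STATE_SAVE_DIR/job_state" \
      "$STATE_SAVE_DIR/job_state.old" \
      "$STATE_SAVE_DIR/job_state.new"

# node_state survives, so DOWN/DRAINED nodes keep their state and Reason.
ls "$STATE_SAVE_DIR"
```

On restart, slurmctld finds no job checkpoint and comes up with an empty queue, while node state is read back from the untouched node_state file.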