| Summary: | How to monitor slurm jobs that are blocking the slurm queue | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Hjalti Sveinsson <hjalti.sveinsson> |
| Component: | Scheduling | Assignee: | Carlos Tripiana Montes <tripiana> |
| Status: | RESOLVED TIMEDOUT | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | deCODE | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Hjalti Sveinsson
2021-03-19 04:32:49 MDT
Hi Hjalti,

For checking the cluster status in terms of jobs and nodes, squeue and sinfo are the best options. You can automate a procedure that checks the output from these commands.

Regarding the 2nd question, a copy of the slurm.conf would be much appreciated. Also, from our records, we have some conf files already provided by deCODE for "hpc-sequor", "lhpc", "ru-hpc-test". Is this issue related to any of these?

Thanks.

Hi Hjalti,

Whenever you have time, please take a look at my previous answer and tell me if this is what you are looking for. Also, please provide the info I requested, if possible.

Thanks.

Going to close the issue as timed out. Please feel free to reopen it if necessary.

Thanks.
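
As a rough illustration of the suggestion above to automate checks on squeue output, here is a minimal sketch in Python. It is not taken from this ticket: the function names (`parse_pending`, `summarize_reasons`, `fetch_pending`) are hypothetical, and the choice of fields (`%i` job ID, `%u` user, `%P` partition, `%r` reason) is just one assumption about what is useful to watch. It lists pending jobs and counts the reasons Slurm reports for them, which may help spot what is holding up the queue.

```python
import subprocess
from collections import Counter

# Assumed squeue invocation: pending jobs only, no header,
# pipe-separated fields: job ID | user | partition | reason.
SQUEUE_CMD = ["squeue", "--noheader", "--states=PENDING", "--format=%i|%u|%P|%r"]


def parse_pending(text):
    """Parse pipe-separated squeue output into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split("|", 3)]
        if len(parts) != 4:
            continue  # skip lines that do not match the expected layout
        job_id, user, partition, reason = parts
        jobs.append(
            {"job_id": job_id, "user": user, "partition": partition, "reason": reason}
        )
    return jobs


def summarize_reasons(jobs):
    """Count how many pending jobs report each reason (e.g. Resources, Priority)."""
    return Counter(job["reason"] for job in jobs)


def fetch_pending():
    """Run squeue (must be on PATH) and return the parsed pending jobs."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_pending(result.stdout)
```

On a host with the Slurm client tools installed, something like `summarize_reasons(fetch_pending()).most_common()` could be run from cron or a monitoring agent. A similar wrapper around sinfo would cover the node side.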