| Summary: | squeue: multi-word pending reasons make parsing difficult | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | User Commands | Assignee: | Dominik Bartkiewicz <bart> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | Ben Roberts <ben> |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 19.05.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Stanford | Alineos Sites: | --- |
Hi
You can set a column separator in squeue.
E.g.:
root@mx1:~# squeue -o '%.18i|%.9P|%.8j|%.8u|%.2t|%.10M|%.6D|%R'
             JOBID|PARTITION|    NAME|    USER|ST|      TIME| NODES|NODELIST(REASON)
    20886_[4-10%3]|     fast|    wrap|    root|PD|      0:00|     1|(JobArrayTaskLimit)
           20886_1|     fast|    wrap|    root| R|      0:03|     1|centos03
           20886_2|     fast|    wrap|    root| R|      0:03|     1|centos01
           20886_3|     fast|    wrap|    root| R|      0:03|     1|centos04
Another workaround:
squeue also has a "Reason" field, which is always one word.
Let me know if any of this works for you.
I am also checking if we can handle this in another way.
Dominik
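
For readers who want to consume the pipe-separated output above in a script, here is a minimal sketch. It assumes the exact -o format string from this comment, so that the job state is field 5 and the reason is field 8. The function name parse_squeue is made up for illustration, and the sample lines are copied from the output above rather than read from a live squeue.

```shell
#!/bin/sh
# Sample lines copied from the comment above. In real use they would come from
#   squeue -o '%.18i|%.9P|%.8j|%.8u|%.2t|%.10M|%.6D|%R'
sample='    20886_[4-10%3]|     fast|    wrap|    root|PD|      0:00|     1|(JobArrayTaskLimit)
           20886_1|     fast|    wrap|    root| R|      0:03|     1|centos03'

# parse_squeue (hypothetical helper): split each line on "|", trim the padding
# added by the width specifiers, and print job ID, state and reason separated
# by tabs so that spaces inside a field cannot be mistaken for separators.
parse_squeue() {
    awk -F'|' '{
        for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
        print $1 "\t" $5 "\t" $8
    }'
}

printf '%s\n' "$sample" | parse_squeue
```

Because the split is on "|" rather than on whitespace, a multi-word reason stays in a single field. The sketch would break if a job name itself contained a "|".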
(In reply to Dominik Bartkiewicz from comment #1)
> You can set column separator in squeue.
> eg.:
> root@mx1:~# squeue -o '%.18i|%.9P|%.8j|%.8u|%.2t|%.10M|%.6D|%R'

Ah true, but that doesn't work with -O, right?

$ squeue -j 65199233 -O "jobid|reason"
squeue: error: Invalid job format specification: jobid|reason

Unfortunately, -o doesn't have any of the TRES-related fields, as far as I can tell, and doesn't seem to offer any way to list GPU information with cons_tres.

> another workaround:
> Squeue has also "Reason" field which is always one word.

I can't find it :( All the combinations I tried are multiple words:

$ squeue -j 65199233 -o %r
REASON
ReqNodeNotAvail, UnavailableNodes:sh02-01n60
$ squeue -j 65199233 -O reasonlist:60
NODELIST(REASON)
(ReqNodeNotAvail, UnavailableNodes:sh02-01n60)
$ squeue -j 65199233 -O reason:60
REASON
ReqNodeNotAvail, UnavailableNodes:sh02-01n60

scontrol is the only one that replaces spaces with underscores in the reason field:

$ scontrol show job 65199233| grep -i reason
JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:sh02-01n60 Dependency=(null)

> I am also checking if we can handle this in another way.

That'd be great, thanks!

Cheers,
--
Kilian

Hi

Sorry for the late response.
I couldn't find any easy fix for this.
Maybe as a workaround, you can use "scontrol --oneliner show job"?
If you need specifically formatted output, you can write a script or program that uses one of the available APIs (C/Perl/REST/Python).

Dominik

Hi

I'm going to close this ticket as info given.
Feel free to reopen if you have any more questions.

Dominik

(In reply to Dominik Bartkiewicz from comment #4)
> I'm going to close this ticket as info given.
> Feel free to reopen if you have any more questions.

Thanks Dominik. Sorry, I somehow missed your 07-20 response.

"scontrol --oneliner show job" somehow works, but it's much slower (and heavier) than squeue, and doesn't allow filtering on specific display options either.

Can't the pending reason strings that squeue displays just be cleaned up, so they use underscores instead of spaces, like in scontrol? The fact that they're displayed differently also seems to indicate that they're either duplicated in the code, or that they're somehow converted before being displayed in one tool but not in the other. I'm just wondering if the same treatment could be applied to both squeue and scontrol, so the strings are consistent across the tools.

Thanks
--
Kilian

Hi

scontrol displays job records with code that is completely separate from squeue's.
scontrol uses slurm_sprint_job_info(), while squeue builds its output line by line depending on the requested format.
Escaping of spaces was added to slurm_sprint_job_info() in Slurm 1.4.

squeue is a user tool, and changing the default behavior just for easier parsing is not sufficient reason.
We can add another format (e.g. ReasonNoSpace) or an option to squeue to handle this.
But personally, I don't think this is a good idea.

Dominik

Hi

I have talked with Tim about this ticket.
Would adding a "--delimiter" option (like in sacct) solve this issue?
If yes, we can add this easily to 20.11.

Dominik

Hi Dominik,

(In reply to Dominik Bartkiewicz from comment #7)
> I have talked with Tim about this ticket.
> Does adding "--delimiter" (like in sacct) option will solve this issue?
> If yes, we can add this easily to 20.11.

Adding a --delimiter option would certainly make it possible to work around the issue, but it would still leave two of the Slurm tools (squeue and scontrol) with inconsistent outputs.

> and changing the default behavior just for easier parsing is not sufficient reason.

Easier parsing is one reason, but the main one is really consistency.

Having to remember that various tools use different options to specify the same query (like how a nodelist is requested with -w in squeue, with -N in sacct and with -n in sinfo) is already something and, granted, hard to change. Having the same job property displayed differently in each tool is not really helping.

What would be the downside of escaping spaces for job pending reasons in squeue, like it's already done in scontrol? If the concern is that it could break existing user tools that are parsing the output of squeue, yes maybe, but that would also most certainly simplify them, and make them easier to maintain for their authors. And that's a change that they would probably welcome.

Cheers,
--
Kilian

Hi

I started implementing this, and I realized that we already support syntax like:
squeue -O 'Reason:|,tres-per-job:|'

I am sorry that you needed to wait so long to get this answer.
Let me know if we can close this bug now.

Dominik

(In reply to Dominik Bartkiewicz from comment #9)
> Hi
>
> I started implementing this, and I realized that we already support syntax
> like:
> squeue -O 'Reason:|,tres-per-job:|'

Aaah, that's very interesting! That would certainly make a great addition to the documentation.

> I am sorry that you needed to wait so long to get this answer.
> Let me know if we can close this bug now.

Well, that's a nice workaround, but that doesn't really resolve the initial problem, which was inconsistent outputs between squeue and scontrol. Having multi-word, space-separated contents in a tabular output format is rarely a good thing, IMHO.

So, although that separator workaround could be useful, it still requires users to add a second level of processing (squeue with separator to get proper column alignment, then remove the separator to process the results), and I still feel the best way to fix this would be to replace spaces with underscores in the pending reasons displayed by squeue, for both consistency and proper alignment.

I really don't see what the advantages are in keeping the current behavior in squeue, I'm afraid :(

Thanks!
--
Kilian

(In reply to Kilian Cavalotti from comment #11)
> (In reply to Dominik Bartkiewicz from comment #9)
> > Hi
> >
> > I started implementing this, and I realized that we already support syntax
> > like:
> > squeue -O 'Reason:|,tres-per-job:|'
>
> Aaah, that's very interesting! That would certainly make a great addition in
> the documentation.

I will document this feature.

> > I am sorry that you needed to wait so long to get this answer.
> > Let me know if we can close this bug now.
>
> Well, that's a nice workaround, but that doesn't really resolve the initial
> problem, which was inconsistent outputs between squeue and scontrol. Having
> multi-word, space-separated contents in a tabular output format is rarely a
> good thing, IMHO.

I wouldn't say I like this behavior of scontrol. We have other potential multi-word fields in squeue and scontrol, but now we do this just for scontrol+Reason. This was introduced in 2008 by d1b76b35688, and that's why we can't revert it now.

> So, although that separator workaround could be useful, it still requires
> users to add a second level of processing (squeue with separator to get
> proper column alignment, then remove the separator to process the results),
> and I still feel the best way to fix this would be to replace spaces with
> underscores in the pending reasons displayed by squeue, for both consistency
> and proper alignment.
>
> I really don't see what the advantages are in keeping the current behavior
> in squeue, I'm afraid :(

squeue is mostly a tool for humans. My personal opinion is that this looks nicer and is easier to read.

Dominik

> Thanks!
> --
> Kilian

Hi

We added info about the suffix option in -O to the sinfo and squeue man pages.
Let me know if it's ok to close.

Dominik

(In reply to Dominik Bartkiewicz from comment #20)
> Hi
>
> We added info about suffix option in -O to sinfo and squeue man.

Thanks.

> Let me know if it's ok to close.

Well, if there's no chance we could get consistent output between scontrol and squeue, then, yes, there's no point in keeping this open.

Thanks for trying to find workarounds. The separator will definitely be useful, even though I don't think it addresses the original issue. :)

Cheers,
--
Kilian

Hi

I am closing the issue then.

Dominik
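
The "second level of processing" discussed in this thread could look roughly like the sketch below, for output produced with a "|" suffix on each -O field as in squeue -O 'Reason:|,tres-per-job:|'. The helper name normalize_fields is hypothetical. The sample input is hand-written for illustration: the reason text is taken from this ticket, but the tres-per-job value and the padding are assumptions, not captured squeue output.

```shell
#!/bin/sh
# Hand-written sample line (NOT real squeue output): two fields, each followed
# by the "|" suffix, with arbitrary padding before the second separator.
sample='ReqNodeNotAvail, UnavailableNodes:sh02-01n60|gpu:1               |'

# normalize_fields (hypothetical helper): split on "|", trim padding, replace
# spaces inside a field with underscores (as scontrol does for Reason), and
# print the fields separated by single spaces. Empty fields become "-" so the
# column count stays stable; the empty field after the last "|" is dropped.
normalize_fields() {
    awk -F'|' '{
        out = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/^ +| +$/, "", f)
            if (i == NF && f == "") continue
            if (f == "") f = "-"
            gsub(/ /, "_", f)
            out = (i == 1 ? f : out " " f)
        }
        print out
    }'
}

printf '%s\n' "$sample" | normalize_fields
```

The result is whitespace-delimited with one word per column, which is the shape requested for squeue itself in this ticket, at the cost of the extra pass.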
Hi SchedMD!

Some of the pending reasons in squeue are displayed as multi-word strings, separated by spaces. That makes formatting and parsing columns quite complicated. For instance:

$ squeue -j 65045326
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
65045326 bigmem May005_b btolar1 PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)

That line has 22 columns, instead of the regular 8. Because custom column separators are not supported in the output, there's no real way around it.

I suspect that this has already been considered for scontrol, because the same reason is displayed as an underscore-separated string to ensure a consistent number of fields in the output:

# scontrol show job 65045326 | grep Reason
JobState=PENDING Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions Dependency=(null)

Would it be possible to have squeue use the same mechanism and display long pending reasons as a single word too?

Thanks!
--
Kilian
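
To make the column-count problem in this report concrete, here is a small sketch that counts the whitespace-separated fields of the squeue line quoted above and then rejoins everything from column 8 onward with underscores. It assumes the default eight-column layout with the reason last, and that no earlier column (for example the job name) contains spaces. The helper names count_fields and join_reason are hypothetical.

```shell
#!/bin/sh
# The job line quoted in the report above.
line='65045326 bigmem May005_b btolar1 PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)'

# count_fields (hypothetical helper): number of whitespace-separated fields.
count_fields() {
    awk '{ print NF }'
}

# join_reason (hypothetical helper): keep columns 1-7 and glue columns 8..NF
# together with underscores. Assumes the reason is the last column and that
# columns 1-7 contain no spaces.
join_reason() {
    awk '{
        reason = $8
        for (i = 9; i <= NF; i++) reason = reason "_" $i
        print $1, $2, $3, $4, $5, $6, $7, reason
    }'
}

printf '%s\n' "$line" | count_fields                 # prints 22
printf '%s\n' "$line" | join_reason | count_fields   # prints 8
```

This only works while the reason is the final column, so it does not generalize to arbitrary -o or -O formats.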