| Summary: | Incorrect CPU_IDs reported by scontrol show job for hyperthreaded nodes | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Martin Perry <martin.perry> |
| Component: | Other | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | bill.brophy, da |
| Version: | 14.11.x | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Atos/Eviden Sites | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 14.03.4 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | Patch to fix bug; Patch, Version 2 | | |
Description: Martin Perry, 2014-06-02 08:44:46 MDT
As you say, this will involve a lot of code change. It is also only relevant if you want to report CPU IDs rather than "abstract core IDs" for the "scontrol -d show job" command. Have you considered changing scontrol to get the node information when needed (i.e. when "scontrol -d show job" is run)? That is probably much simpler, and probably provides better overall system scalability, than adding a ThreadsPerCore array to the job_resources_t struct.

---

job_resources_t already has arrays for sockets_per_node and cores_per_socket. That's why I thought it would make sense to add threads_per_core. But since that would mean a lot of code change, and it's not really very efficient, I'll look into changing scontrol to get the node info dynamically. Thanks.

---

(In reply to Martin Perry from comment #2)

> job_resources_t already has arrays for sockets_per_node and
> cores_per_socket. That's why I thought it would make sense to add
> threads_per_core. But since that would mean a lot of code change, and it's
> not really very efficient, I'll look into changing scontrol to get the node
> info dynamically. Thanks.

Those are needed to interpret the core_bitmap. It's easy enough to add the threads_per_core field too, but it is a lot more work and arguably not needed in that data structure. Either design should be fine, so, your choice...

---

Created attachment 922 [details]
Patch to fix bug
Issuing an RPC for each node is not a very scalable solution. Is there some reason you don't just get the full node table and pick out the information about specific nodes of interest? That offloads most of the work to the squeue command.

---

Moe,

Yes, I should make it more scalable. So I need to call slurm_load_node instead of slurm_load_node_single, and then pick out the nodes of interest from the node_array, right?

---

(In reply to Martin Perry from comment #6)

> Moe,
> Yes, I should make it more scalable. So I need to call slurm_load_node
> instead of slurm_load_node_single, and then pick out the nodes of interest
> from the node_array, right?

Correct. Load the full table and then search it by node name as needed. It's pretty simple and there are many examples (see sinfo, sview, scontrol, etc.).

---

Created attachment 928 [details]
Patch, Version 2
Version 2 of the patch is attached, addressing the scalability issue in the original version.
Martin, this patch still makes the call for each job, which should be avoided. I am not sure how to make the current API work better. Perhaps change the slurm_sprint_job_info() and slurm_print_job_info() function calls to accept a node_info_msg_t which is filled in beforehand. Calling slurm_load_*node in the function should be avoided. This is just one idea; perhaps there are others?

---

I was thinking "scontrol show job" is relatively rare, so doing the slurm_load_node each time would be OK. But I'll look into making the call outside the function. The other option Moe and I discussed previously is adding a threads_per_core array to job_resources_t, but that would impact many more files. I have a few other things that need my attention, so I probably won't get back to this today.

---

Patch, version 2 is here:
https://github.com/SchedMD/slurm/commit/83d626ca45032d71f3847a9a92e9a21fac5e8031

I moved around the code a bit so the node table only needs to be loaded once, for better performance:
https://github.com/SchedMD/slurm/commit/04921aa2d2662cc4236aaa4d6e86f99f7caeb19f

These changes will be in version 14.03.4.

---

Thanks!