| Summary: | slurmctld segfault (_job_alloc) | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | slurmctld | Assignee: | Carlos Tripiana Montes <tripiana> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | tripiana |
| Version: | 22.05.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Stanford | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | gdb 't a a bt' | ||
Same applies here. As we have a good reproducer of the issue in Bug 14885, I can follow the code while debugging to see how things behave when hitting the other two segfaults with that patched version. I guess if it doesn't segfault, it's because it's already fixed :).

Marking as duplicate now.

Cheers,
Carlos.

*** This ticket has been marked as a duplicate of ticket 14885 ***
Created attachment 26893 [details]
gdb 't a a bt'

Sorry the bug summaries are not very creative, but we just hit another segfault, with yet another different backtrace (although I suspect 15007 and this one may be related to 14885?).

From core.24193:

```
(gdb) bt
#0  bit_size (b=0x0) at bitstring.c:286
#1  0x000000000044345c in _job_alloc (gres_state_job=gres_state_job@entry=0x49b04f0, job_gres_list_alloc=0x49b0580,
    gres_state_node=<optimized out>, node_cnt=node_cnt@entry=1, node_index=node_index@entry=719,
    node_offset=node_offset@entry=0, job_id=job_id@entry=62849174,
    node_name=node_name@entry=0x7f6a98084c10 "sh02-12n04", core_bitmap=core_bitmap@entry=0x7f6a99266310,
    new_alloc=new_alloc@entry=false) at gres_ctld.c:460
#2  0x0000000000444c72 in gres_ctld_job_alloc (job_gres_list=<optimized out>,
    job_gres_list_alloc=job_gres_list_alloc@entry=0x49afdc8, node_gres_list=node_gres_list@entry=0x2418130,
    node_cnt=1, node_index=node_index@entry=719, node_offset=node_offset@entry=0, job_id=62849174,
    node_name=0x7f6a98084c10 "sh02-12n04", core_bitmap=0x7f6a99266310, new_alloc=new_alloc@entry=false)
    at gres_ctld.c:951
#3  0x00007f6d2e573339 in job_res_add_job (job_ptr=job_ptr@entry=0x49afcb0,
    action=action@entry=JOB_RES_ACTION_NORMAL) at job_resources.c:328
#4  0x00007f6d2e568b5f in select_p_select_nodeinfo_set (job_ptr=0x49afcb0) at cons_common.c:1892
#5  0x00007f6d2fe99bc9 in select_g_select_nodeinfo_set (job_ptr=job_ptr@entry=0x49afcb0) at select.c:812
#6  0x00000000004aa6f8 in _sync_jobs_to_conf () at read_config.c:1382
#7  0x00000000004ad0de in read_slurm_conf (recover=recover@entry=1, reconfig=reconfig@entry=true)
    at read_config.c:1694
#8  0x00000000004a66f1 in _slurm_rpc_reconfigure_controller (msg=0x7f6a981c8190) at proc_req.c:3324
#9  0x00000000004a8230 in slurmctld_req (msg=msg@entry=0x7f6a981c8190) at proc_req.c:6676
#10 0x000000000042de35 in _service_connection (arg=0x0) at controller.c:1380
#11 0x00007f6d2f99aea5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f6d2f1a7b0d in clone () from /lib64/libc.so.6
```

"t a a bt" attached.

Cheers,
--
Kilian
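Frame #0 shows the immediate cause: `bit_size()` was called with `b=0x0`, i.e. `_job_alloc()` passed a NULL GRES bitmap during reconfigure, and `bit_size()` dereferences its argument unconditionally. A minimal C sketch of that failure mode and a defensive NULL guard follows; the `bitstr_t` stand-in and `safe_bit_size()` are hypothetical illustrations, not Slurm's real `bitstring.h` types or the actual fix (which landed via ticket 14885):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Slurm's bitstr_t (the real definition lives in
 * src/common/bitstring.h); here word 0 simply stores the bit count. */
typedef int64_t bitstr_t;

/* bit_size(b=0x0) in frame #0 means the function read through a NULL
 * pointer. A guarded wrapper of the kind a caller could use to avoid
 * the segfault when a job's GRES bitmap was never allocated: */
static size_t safe_bit_size(const bitstr_t *b)
{
    if (b == NULL)          /* the core dump's b=0x0 case */
        return 0;           /* treat a missing bitmap as zero bits */
    return (size_t) b[0];   /* stand-in for reading the stored size */
}
```

With the guard, a missing bitmap degrades to a zero-length result instead of crashing slurmctld mid-reconfigure.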