Ticket 16822 - slurm23.02-2 breaks sshare because of a missing symbol in priority_basic plugin
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 23.02.2
Hardware: Linux
Severity: C - Contributions
Assignee: Tim Wickberg
Reported: 2023-05-25 05:05 MDT by Taras Shapovalov
Modified: 2023-05-25 11:01 MDT
Site: -Other-
Version Fixed: 23.02.3

Attachments
patch with the fix (519 bytes, patch)
2023-05-25 05:05 MDT, Taras Shapovalov

Description Taras Shapovalov 2023-05-25 05:05:21 MDT
Created attachment 30451: patch with the fix

The sshare command fails to start when the priority/basic plugin is in use:

# sshare
sshare: error: plugin_load_from_file: dlopen(/cm/shared/apps/slurm/23.02.2/lib64/slurm/priority_basic.so): /cm/shared/apps/slurm/23.02.2/lib64/slurm/priority_basic.so: undefined symbol: job_list
sshare: error: Couldn't load specified plugin name for priority/basic: Dlopen of plugin file failed
sshare: error: cannot create priority context for priority/basic
sshare: fatal: failed to initialize priority plugin

The symbol is undefined in priority_basic.so, whereas priority_multifactor.so defines it:

# nm -gD /cm/shared/apps/slurm/23.02.2/lib64/slurm/priority_basic.so | grep job_list
                U job_list
# nm -gD /cm/shared/apps/slurm/23.02.2/lib64/slurm/priority_multifactor.so | grep job_list
000000000020b330 B job_list

The problem does not occur with slurm23.02-1. It was introduced by this commit:
https://github.com/SchedMD/slurm/commit/1473e5c287fa4d9e32264123a474ece1731e6fa1

It can be fixed by defining the missing symbol in priority_basic.c, much the same way it is already done in priority_multifactor.c.

A patch with the fix is attached.
Comment 1 Jason Booth 2023-05-25 09:38:52 MDT
Hi Taras! I hope you are doing well. If you file patches like this with us as contributions, the tickets are routed correctly. For now, I moved this one over manually.
Comment 2 Tim Wickberg 2023-05-25 11:01:35 MDT
Thanks Taras. This fix will be in 23.02.3 when released:

commit b968136d8e762ca143fe8f7a020ffb17e1a2b4d1
Author:     Taras Shapovalov <taras.shapovalov@brightcomputing.com>
AuthorDate: Thu May 25 10:58:44 2023 -0600

    Fix sshare with priority/basic.
    
    Needs a weak symbol for job_list, otherwise the plugin will not load
    outside of slurmctld.
    
    Bug 16822.
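
For readers unfamiliar with the technique named in the commit message, the following is a minimal sketch of a weak fallback definition in C. It is not the attached patch or the committed change. The type `demo_list_t` and the helper `demo_job_count` are hypothetical names made up for illustration, and the `__attribute__((weak))` syntax assumes GCC or Clang. The idea is that the shared object carries its own `job_list` definition, so loading it no longer depends on the host program (slurmctld in this ticket) exporting that symbol.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list type; Slurm's real type is declared
 * in its own headers and is not reproduced here. */
typedef struct demo_list {
    int count;
} demo_list_t;

/*
 * Weak fallback definition (assumption: GCC/Clang attribute syntax).
 * With this in place the object no longer has an undefined reference
 * to job_list, so a program that does not provide the symbol can still
 * load it. A program that does provide a strong definition is expected
 * to take precedence over this fallback.
 */
__attribute__((weak)) demo_list_t *job_list = NULL;

/* Hypothetical helper: code using the fallback must tolerate NULL,
 * since outside the daemon there is no real job list to walk. */
int demo_job_count(void)
{
    return job_list ? job_list->count : 0;
}
```

Under these assumptions, a client tool would see `job_list` as NULL and `demo_job_count()` would return 0, while code running inside a host that sets the list would see the real count. Any plugin code path that dereferences the symbol has to guard against the NULL case.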