Hi,

We've set up a Slurm-powered compute cluster as an integrated part of a Linux estate with thousands of users, any of whom should be free to use it. In practice, however, we expect only a few hundred to actually use it, and perhaps only tens on a day-to-day basis. Because we enforce associations, every user of the system has to be registered in the Slurm database before they can submit work. This leaves us with a lot of unnecessary dead weight, and we have to run a cron job to maintain the list. The last point seems particularly common among Slurm users.

Having a very large majority of inactive users badly skews our fair-share calculations (whichever algorithm we choose).

We would like a solution that adds a user to the accounting database at the time of their first submit. This avoids having to use a cron job (or other clunky software) to synchronise the users, so that only active users are present in the database for calculations like fair-share.

We've thought about using a fairly nasty job_submit.lua plugin to achieve this, but it has to shell out to sacctmgr to check whether a user exists, which takes 0.2 seconds per submit. Since submits are handled serially, this is an obvious bottleneck and doesn't scale particularly well.

Is there a way to check this within the current API (the slurmctld's global data structures)? Could we have an extension to the API to check this from a job submit plugin, or configuration for a hook command that could be run if the user does not exist in the accounting database? Is there a better existing way to achieve this?

Adding a user on login is not an option for us, since we expect to add all desktops and servers as 'login nodes' (Slurm installed, munge running, no Slurm daemons).

job_submit.lua example:

```
-- io, os and string are built-in Lua libraries; no require is needed.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Resolve the submitting UID to a username.
    local f = io.popen("getent passwd " .. submit_uid)
    local getent_answer = f:read()
    f:close()
    if not getent_answer then
        slurm.log_error("failed to lookup uid " .. submit_uid)
        return slurm.FAILURE
    end
    local username = string.match(getent_answer, "^(%w+):")

    -- If the user has no accounting record, add one on the fly.
    f = io.popen("sacctmgr show -n user " .. username)
    if not f:read() then
        -- Lua 5.1: os.execute returns the command's exit status.
        local rc = os.execute("sacctmgr add -i user " .. username ..
                              " DefaultAccount=normalexpress" ..
                              " Account=whenever")
        if rc ~= 0 then
            slurm.log_error("could not add user " .. username)
            return slurm.FAILURE
        end
        slurm.log_info("added user " .. username)
    end
    f:close()
    return slurm.SUCCESS
end
```

Cheers,
Ben Fitzpatrick
Our "5 - Enhancement" severity doesn't get tracked closely; we re-evaluate those requests as part of the release planning process and don't respond quickly to most of them. Our support staff triage and respond promptly to Sev. 1-4 bugs (which is how our support contracts are structured).

- Are you still on the 14.11 release, as when you submitted this, or have you had a chance to update to 15.08? We'd encourage you to stay as current as you can; a lot of accounting and resource management features have been reworked in this new release.

- Which priority plugin are you using? The fair tree implementation should address most priority issues caused by having unused users as part of an account.

- For what you're trying to accomplish, a job_submit plugin may be the best option. If you added a check whether job_desc.account is set before you shell out to see if the user has any account entries, you'd avoid most of the overhead of your current approach. We don't have a way for the submit plugin to directly query the accounting info (and definitely not a way to modify it) at present, although this may be something to look into for a future release.

- Tim
Looking through the Lua plugin further: checking that "default_account" is set would be sufficient to ensure a user has a record in the accounting database. "if not job_desc.default_account then" could replace the "getent" and "sacctmgr" calls that currently run on every job submission. You'd still need to shell out to add the user in Lua (in a C plugin you'd be able to do this directly).
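As a sketch of how the two suggestions above could combine with Ben's original plugin - the cheap default_account check up front, shelling out only on a user's first-ever submit - here is an untested example; the account names "normalexpress" and "whenever" are carried over from the original, and the exact behaviour of job_desc.default_account for unknown users is an assumption worth verifying on your release:

```
-- Sketch of a revised job_submit.lua (untested): skip all shelling out
-- unless the submitting user appears to have no accounting record yet.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Fast path: user already has a default account, so an association
    -- exists; nothing to do on this (i.e. almost every) submit.
    if job_desc.default_account then
        return slurm.SUCCESS
    end

    -- Slow path, first submit only: resolve the UID and add the user.
    local f = io.popen("getent passwd " .. submit_uid)
    local getent_answer = f:read()
    f:close()
    if not getent_answer then
        slurm.log_error("failed to lookup uid " .. submit_uid)
        return slurm.FAILURE
    end
    local username = string.match(getent_answer, "^(%w+):")

    -- Account names carried over from the original example.
    local rc = os.execute("sacctmgr add -i user " .. username ..
                          " DefaultAccount=normalexpress" ..
                          " Account=whenever")
    if rc ~= 0 then
        slurm.log_error("could not add user " .. username)
        return slurm.FAILURE
    end
    slurm.log_info("added user " .. username)
    return slurm.SUCCESS
end
```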
Ben - I'm going to go ahead and mark this as Resolved/Infogiven for now. If you have any further questions on this please re-open. cheers, - Tim
Thanks Tim,

I'll try the default_account field. This looks like it wouldn't add much overhead, as we wouldn't need to shell out very often. Thanks for your help.

Other points:

- We are still on the 14.11 release, but plan to update.

- We tried all the algorithms, but all gave non-optimal numbers, including fair tree. Fair tree sets all equal-usage users to the same fair-share rank, but still increments the rank for each user. This means that for 4 active users and 1996 inactive users, our users' ranks, in order of increasing usage, go something like: 2000, 2000, 2000, 2000, 2000, ..., 2000, 4, 3, 2, 1. Fair-share scores then go: 1, 1, 1, 1, 1, ..., 1, 4/2000, 3/2000, 2/2000, 1/2000. Here's an example with the names changed:

  User*      | Usage    | Fair Share
  telcontar  |        0 | 1.000000
  grey       |        0 | 1.000000
  ...
  bombadil   |        0 | 1.000000
  brandybuck |   533128 | 0.001016
  took       |   839251 | 0.000762
  gamgee     |  1043430 | 0.000508
  baggins    | 12094456 | 0.000254

  (* Association, really)

  These small gaps between the active users (compared to the large gaps between them and the inactive users) make it very difficult to set weights against a competing priority mechanism such as job age (e.g. trying to balance a newly submitted high-fair-share job against a longstanding low-fair-share job).

- We're now using more-or-less pure fair share, so this problem matters less to us.
On 11/26/2015 08:06 AM, bugs@schedmd.com wrote:
> - We are still on the 14.11 release, but plan to update.

Let me know if you need any specific advice on this; the update should be fairly straightforward. There were some changes to the database layouts that can take some time to process, so I'd recommend updating slurmdbd first - potentially a few days ahead of updating slurmctld and the slurmd's across the cluster - to give it a chance to finish. 15.08.4 should be stable at this point; there aren't any major issues we're currently tracking, unlike those that led to the .3 and then .4 releases in quick succession.

> These small gaps between the active users (compared to large gaps between them
> and the inactive users) makes it very difficult to set weights with a competing
> priority mechanism such as job age (e.g. trying to balance a newly submitted
> high fair share job with a longstanding low fair share job).
>
> We're now using more-or-less pure fair share, so this problem matters less to
> us.

The final calculated priority value is a 32-bit unsigned int, so it can go up to 4 billion and change. If you want some of that granularity back, you can set PriorityWeightFairshare to 100,000,000 and still be well within that range. The resulting values may look somewhat odd if your users are used to the old ones, but a priority only matters relative to other priority values within your cluster anyway.

If you've no other open questions, I'll go ahead and mark this as Resolved/Infogiven when at work next week.

- Tim
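A minimal slurm.conf sketch of that suggestion; only the PriorityWeightFairshare value is from this thread, and the age weight is an illustrative assumption:

```
# Hypothetical slurm.conf fragment: scale the fair-share weight up so that
# small score differences (e.g. 0.000254 vs 0.000508 in the table above)
# still separate jobs by tens of thousands of priority points, while the
# total stays well inside the uint32 priority range.
PriorityType=priority/multifactor
PriorityWeightFairshare=100000000   # value suggested in this thread
PriorityWeightAge=1000              # illustrative value, not from the thread
```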