Ticket 1766 - Add a user's associations on submit
Summary: Add a user's associations on submit
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 14.11.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Tim Wickberg
Reported: 2015-06-25 20:10 MDT by Ben Fitzpatrick
Modified: 2015-11-26 04:30 MST

Site: Met Office


Description Ben Fitzpatrick 2015-06-25 20:10:55 MDT
Hi,

We've set up a Slurm-powered compute cluster as an integrated part of a Linux estate with thousands of users, any of whom should be free to use it. However, in practice, we expect only a few hundred to actually use it, and maybe only tens on a day-to-day basis.

We are enforcing associations, so every user of the system has to be registered in the Slurm accounting database before they can submit work. This means that we carry a lot of unnecessary dead weight, and we have to run a cron job to keep the list in sync. That cron-job approach seems to be common practice among Slurm sites. Having a very large majority of inactive users also means that our fair-share calculations are badly skewed (whichever algorithm we choose).

We would like a solution that allows a user to be added to the accounting database at the time of their first submit. This avoids having to use a cron job (or other clunky software) to synchronise the users. This way, only active users are present in the database for calculations like fair-share.

We've thought about using a fairly nasty job_submit.lua plugin to achieve this, but it has to shell out to sacctmgr to check whether a user exists, which takes 0.2 seconds per submission. This creates a bottleneck and doesn't scale well, since submissions are processed serially. Is there a way to check this within the current API ('the slurmctld's global data structures')? Could the API be extended so that a job submit plugin can check this, or could there be a configurable hook command that runs when the user does not exist in the accounting database?
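For scale: if every submission waits on the 0.2 s sacctmgr call and submissions are handled one at a time, as described above, the plugin alone caps throughput at roughly:

```latex
\frac{1\ \text{submission}}{0.2\ \text{s}} = 5\ \text{submissions/s} = 18\,000\ \text{submissions/hour}
```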

Is there a better existing way to achieve this?

Adding a user on login is not an option for us, since we expect to add all desktops and servers as 'login nodes' (Slurm installed, munge running, no Slurm daemons).

job_submit.lua example:

```
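-- io, os and string are standard libraries and are already loaded.

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Look up the submitting user's name from their uid.
    local f = io.popen(string.format("getent passwd %d", submit_uid))
    local getent_answer = f:read()
    f:close()
    if not getent_answer then
        slurm.log_error("failed to lookup uid " .. submit_uid)
        return slurm.FAILURE
    end
    -- Usernames may contain characters outside %w (e.g. "_", "-", ".").
    local username = string.match(getent_answer, "^([^:]+):")

    -- Ask the accounting database whether the user already exists
    -- (this shell-out runs on every submission).
    f = io.popen("sacctmgr show -n user " .. username)
    local exists = f:read()
    f:close()
    if not exists then
        local rc = os.execute("sacctmgr add -i user " .. username ..
                              " DefaultAccount=normalexpress" ..
                              " Account=whenever")
        -- Lua 5.1 returns the exit status; Lua 5.2+ returns true on success.
        if rc ~= 0 and rc ~= true then
            slurm.log_error("could not add user " .. username)
            return slurm.FAILURE
        end
        slurm.log_info("added user " .. username)
    end
    return slurm.SUCCESS
end

-- job_submit/lua also expects slurm_job_modify to be defined.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end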
```

Cheers,

Ben Fitzpatrick
Comment 1 Tim Wickberg 2015-11-12 04:58:26 MST
Our "5 - Enhancement" severity doesn't get tracked closely; we re-evaluate those requests as part of the release planning process and don't respond quickly to most of them. Our support staff triage and respond promptly to Sev. 1-4 bugs (which is how our support contracts are structured).

- Are you still on the 14.11 release you were running when you submitted this, or have you had a chance to update to 15.08? We'd encourage you to stay as current as you can; a lot of the accounting and resource management features have been reworked in this new release.

- Which priority plugin are you using? The Fair Tree implementation should address most of the priority issues caused by having unused users in an account.

- For what you're trying to accomplish, a job_submit plugin may be the best option. 

If you check whether job_desc.account is set before shelling out to see whether the user has any account entries, you'd avoid most of the overhead of your current approach. At present we don't have a way for the submit plugin to query the accounting information directly (and definitely don't have a way to modify it), although this may be something to look into for a future release.

- Tim
Comment 2 Tim Wickberg 2015-11-12 05:20:55 MST
Looking through the Lua plugin further, checking that "default_account" is non-nil would be sufficient to ensure that a user has a record in the accounting database: "if not job_desc.default_account then" could replace the getent and sacctmgr calls that currently run on every job submission.

You'd still need to shell out for adding them in Lua (in a C plugin you'd be able to do this directly).
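A minimal sketch of the default_account suggestion, under stated assumptions: it assumes job_desc.default_account is nil exactly when the submitting user has no association (as described above); the helpers uid_to_name and add_user_for_uid are hypothetical names; the account names are copied from the original example; and the small slurm stub exists only so the file can be loaded and tested outside slurmctld. With this structure only the first submission from a new user pays for the getent and sacctmgr calls.

```lua
-- job_submit.lua sketch: create a user's association on first submit,
-- shelling out only when slurmctld has no default account for the user.

-- Stub so the file can be loaded outside slurmctld (e.g. for testing);
-- inside slurmctld the real "slurm" table is used unchanged.
slurm = slurm or {
    SUCCESS = 0,
    FAILURE = -1,
    log_info = print,
    log_error = print,
}

-- Hypothetical helper: map a uid to a username via getent.
local function uid_to_name(uid)
    local f = io.popen(string.format("getent passwd %d", uid))
    if not f then
        return nil
    end
    local line = f:read("*l")
    f:close()
    return line and string.match(line, "^([^:]+):")
end

-- Hypothetical helper, global so a test can replace it: add the user
-- with the accounts from the original example.
function add_user_for_uid(uid)
    local name = uid_to_name(uid)
    if not name then
        slurm.log_error("failed to look up uid " .. tostring(uid))
        return false
    end
    local ok = os.execute("sacctmgr add -i user " .. name ..
                          " DefaultAccount=normalexpress Account=whenever")
    -- Lua 5.1 returns the exit status; Lua 5.2+ returns true on success.
    if ok ~= 0 and ok ~= true then
        slurm.log_error("could not add user " .. name)
        return false
    end
    slurm.log_info("added user " .. name)
    return true
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Fast path: a default account means the user already has an
    -- association, so no external command is run.
    if job_desc.default_account then
        return slurm.SUCCESS
    end
    if add_user_for_uid(submit_uid) then
        return slurm.SUCCESS
    end
    return slurm.FAILURE
end

-- job_submit/lua also expects this function to be defined.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Keeping the slow path in a separate function makes the per-submit cost easy to see: the common case is a single table lookup.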
Comment 3 Tim Wickberg 2015-11-24 01:38:04 MST
Ben -

I'm going to go ahead and mark this as Resolved/Infogiven for now. If you have any further questions on this please re-open.

cheers,
- Tim
Comment 4 Ben Fitzpatrick 2015-11-25 23:06:43 MST
Thanks Tim, I'll try the default_account field. This looks like it wouldn't add much overhead, as we wouldn't need to shell out very often. Thanks for your help.

Other points:

- We are still on the 14.11 release, but plan to update.

- We tried all the algorithms, but all of them, including Fair Tree, gave non-optimal numbers. Fair Tree gives all equal-usage users the same fair-share rank, but each tied user still uses up one rank. This means that for 4 active users and 1996 inactive users, our users' ranks in order of increasing usage go something like this: 2000, 2000, 2000, 2000, 2000, ..., 2000, 4, 3, 2, 1. The fair-share scores (rank divided by the number of users) go like this: 1, 1, 1, 1, 1, ..., 1, 4/2000, 3/2000, 2/2000, 1/2000. Here's an example, with the names changed:

  User*    |   Usage  | Fair Share
telcontar  |        0 | 1.000000 
grey       |        0 | 1.000000
...
bombadil   |        0 | 1.000000
brandybuck |   533128 | 0.001016 
took       |   839251 | 0.000762 
gamgee     |  1043430 | 0.000508 
baggins    | 12094456 | 0.000254

(* Association, really)
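The ranking described above can be reproduced with a short sketch. It is a simplification, not Slurm's Fair Tree code: it ranks users directly by raw usage (real Fair Tree compares Level FS values within the association tree), gives tied users the same rank while still consuming one rank per user, and divides the rank by the number of users. The function name fair_tree_scores is hypothetical. The factors in the table are consistent with this rank/N form for N of roughly 3,937 associations (1/3937 ≈ 0.000254, 4/3937 ≈ 0.001016).

```lua
-- Simplified Fair Tree-style scoring: lowest usage ranks highest, ties
-- share a rank, and each tied user still consumes one rank.
-- Returns scores in the same order as the input usage array.
function fair_tree_scores(usage)
    local n = #usage
    local order = {}
    for i = 1, n do order[i] = i end
    -- indices sorted by increasing usage
    table.sort(order, function(a, b) return usage[a] < usage[b] end)

    local scores = {}
    local rank = n          -- rank given to the current tie group
    local i = 1
    while i <= n do
        -- find the end of the tie group starting at sorted position i
        local j = i
        while j < n and usage[order[j + 1]] == usage[order[i]] do
            j = j + 1
        end
        for k = i, j do
            scores[order[k]] = rank / n
        end
        -- the rank counter still advances once per user in the group
        rank = rank - (j - i + 1)
        i = j + 1
    end
    return scores
end

-- Example: 1996 idle users plus the four active users from the table.
local usage = {}
for i = 1, 1996 do usage[i] = 0 end
for _, u in ipairs({ 533128, 839251, 1043430, 12094456 }) do
    usage[#usage + 1] = u
end
local scores = fair_tree_scores(usage)
for i = 1997, 2000 do
    print(string.format("usage %8d  fair share %.6f", usage[i], scores[i]))
end
```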

These small gaps between the active users (compared to the large gap between them and the inactive users) make it very difficult to set weights for a competing priority factor such as job age (e.g. trying to balance a newly submitted job with a high fair-share score against a long-waiting job with a low one).
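To put a number on that difficulty, here is a sketch, assuming the multifactor plugin's age factor grows linearly to 1.0 at PriorityMaxAge (7 days is assumed here) and that the age and fair-share weights are equal (the value 1000 is hypothetical). Under those assumptions, the entire fair-share spread between brandybuck and baggins in the table above is worth only about 461 seconds of queue time, so job age quickly swamps fair share among the active users. The function name fairshare_gap_as_wait is hypothetical.

```lua
-- Queue time (seconds) whose age contribution equals the fair-share
-- contribution gap between factors fs_a and fs_b. Assumes the age
-- factor is age / max_age_s and has not yet saturated at 1.0.
function fairshare_gap_as_wait(fs_a, fs_b, weight_fs, weight_age, max_age_s)
    return (fs_a - fs_b) * weight_fs * max_age_s / weight_age
end

local max_age_s = 7 * 24 * 3600     -- assumed PriorityMaxAge of 7 days
local weight = 1000                 -- hypothetical equal weights
local wait_s = fairshare_gap_as_wait(0.001016, 0.000254,
                                     weight, weight, max_age_s)
print(string.format("%.0f seconds (~%.1f minutes)", wait_s, wait_s / 60))
```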

We're now using more-or-less pure fair share, so this problem matters less to us.
Comment 5 Tim Wickberg 2015-11-26 04:30:57 MST
On 11/26/2015 08:06 AM, bugs@schedmd.com wrote:
> - We are still on the 14.11 release, but plan to update.

Let me know if you need any specific advice on this; the update should
be fairly straightforward.

There were some changes to the database layout that can take some time
to process. I'd recommend updating slurmdbd first, potentially a few
days ahead of updating slurmctld and the slurmd daemons across the
cluster, to give the conversion a chance to finish. 15.08.4 should be
stable at this point; we aren't currently tracking any major issues like
those that led to the .3 and then .4 releases in quick succession.


> These small gaps between the active users (compared to large gaps between them
> and the inactive users) makes it very difficult to set weights with a competing
> priority mechanism such as job age (e.g. trying to balance a newly submitted
> high fair share job with a longstanding low fair share job).
>
> We're now using more-or-less pure fair share, so this problem matters less to
> us.

The final calculated priority value is a 32-bit unsigned int, so it can
go up to 4,294,967,295 (4 billion and change). If you want some of the
granularity back, you can set PriorityWeightFairshare to 100,000,000 and
still be well within that range. If your users are used to looking at
the values it may seem somewhat odd to them, but the priority only
matters relative to the other priority values within your cluster anyway.
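A quick check of the suggested weight, as a sketch that takes the fair-share factors from the table in Comment 4 and rounds only for display; the function name fairshare_contribution is hypothetical. With PriorityWeightFairshare=100000000, adjacent active users end up 25,400 priority points apart, and the weight uses less than 3% of the 32-bit range, leaving about 4.19 billion for all other weights combined.

```lua
local UINT32_MAX = 4294967295       -- largest 32-bit unsigned priority

-- The multifactor priority adds weight * factor for each factor; the
-- result is rounded to the nearest integer here for display.
function fairshare_contribution(weight, factor)
    return math.floor(weight * factor + 0.5)
end

local weight_fs = 100000000         -- PriorityWeightFairshare suggested above
local users = {
    { "brandybuck", 0.001016 }, { "took", 0.000762 },
    { "gamgee", 0.000508 },     { "baggins", 0.000254 },
}
for _, u in ipairs(users) do
    print(string.format("%-10s %d", u[1], fairshare_contribution(weight_fs, u[2])))
end
-- Headroom left under the 32-bit limit for all other weights combined.
print(string.format("headroom: %.0f", UINT32_MAX - weight_fs))
```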

If you've no other open questions, I'll go ahead and mark this as 
Resolved/Infogiven when at work next week.

- Tim