Summary: | GrpSubmitJobs and job arrays | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ryan Cox <ryan_cox> |
Component: | Limits | Assignee: | Unassigned Developer <dev-unassigned> |
Status: | OPEN --- | QA Contact: | |
Severity: | 5 - Enhancement | ||
Priority: | --- | ||
Version: | 14.11.4 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | BYU - Brigham Young University | ||
Description
Ryan Cox
2015-05-12 08:48:22 MDT
I agree that we probably need better controls with job arrays, but there are some issues to consider. Here's a good example of a problem. Say a user submits a job array with 1000 tasks; that's all one job record. Then say the user changes the time limit on each odd-numbered task in the array, which is simple with a regular expression. Suddenly we've got an extra 500 job records. (Each changed element in the job array spits out a new job record.) We wouldn't want to partly perform the update request. Should we just reject the entire request, or perhaps force the user to operate on the whole job array as a single entity (which, as I recall, would not create new job records)?

That's a very good example and I didn't realize that it's even an option. Personally I would be okay denying that request but I imagine that someone is interested in allowing that behavior. I'm not quite sure what to do about it but I can see how that job update would really mess things up. My simple idea doesn't seem so simple anymore...

(In reply to Ryan Cox from comment #2)
> That's a very good example and I didn't realize that it's even an option.
> Personally I would be okay denying that request but I imagine that someone
> is interested in allowing that behavior. I'm not quite sure what to do
> about it but I can see how that job update would really mess things up. My
> simple idea doesn't seem so simple anymore...

Partial job array updates are the only serious problem that I can think of offhand, and I suspect it's a rare event.

I would be okay with making updates an all-or-nothing operation for the pending portion of an array. Someone somewhere might depend on partial array updates but it doesn't make much sense to me. The point of an array job is that it's homogeneous, right? If partial updates are blocked, I assume that the entire pending portion of an array could still be updated even with some tasks running already? I don't see why not since the running jobs would have separate job records at that point.

(In reply to Ryan Cox from comment #4)
> I would be okay with making updates an all-or-nothing operation for the
> pending portion of an array. Someone somewhere might depend on partial
> array updates but it doesn't make much sense to me. The point of an array
> job is that it's homogeneous, right?

You might think so, but that is definitely not the mode of operation at some sites. One site in particular manages each task in a job array very much independently, and they work with really large job arrays.

> If partial updates are blocked, I
> assume that the entire pending portion of an array could still be updated
> even with some tasks running already? I don't see why not since the running
> jobs would have separate job records at that point.

That's not a problem.
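To make the record arithmetic in this thread concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how slurmctld actually stores job arrays: `ArrayJob`, `update_time_limit`, `LimitExceeded`, `other_records` and `record_limit` (standing in for a GrpSubmitJobs-style cap on job records) are all hypothetical names. Following the discussion, it assumes that a job array is one record until individual tasks are modified or start running, that each task touched by a partial update is split into its own record, and that updating the whole pending portion at once creates no new records. The update is all-or-nothing: the projected record count is checked before anything is modified.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class LimitExceeded(Exception):
    """Raised when an update would push the record count past the (hypothetical) cap."""


@dataclass
class ArrayJob:
    """Toy model of one job array's records (hypothetical, not Slurm internals)."""
    job_id: int
    pending: set[int]                  # task IDs still folded into the single array record
    time_limit: int = 60               # minutes, shared by every task in `pending`
    split: dict[int, dict] = field(default_factory=dict)  # task ID -> its own record

    def record_count(self) -> int:
        # One record for the array while any task is still folded into it,
        # plus one record per task that has been split out (updated or running).
        return (1 if self.pending else 0) + len(self.split)

    def start(self, task: int) -> None:
        # Assumption: a task that starts running gets its own record.
        self.pending.remove(task)
        self.split[task] = {"state": "RUNNING", "time_limit": self.time_limit}


def update_time_limit(job: ArrayJob, tasks: set[int], minutes: int,
                      other_records: int, record_limit: int) -> None:
    """All-or-nothing time-limit update on pending tasks under a record cap."""
    if not tasks <= job.pending:
        raise ValueError("only pending tasks still in the array record can be updated")
    if tasks == job.pending:
        # The whole pending portion is updated as one entity: no new records.
        job.time_limit = minutes
        return
    # A partial update splits every touched task into its own record.
    projected = other_records + job.record_count() + len(tasks)
    if projected > record_limit:
        # Reject the entire request before modifying anything.
        raise LimitExceeded(f"{projected} records would exceed limit {record_limit}")
    for task in tasks:
        job.pending.remove(task)
        job.split[task] = {"state": "PENDING", "time_limit": minutes}


job = ArrayJob(job_id=100, pending=set(range(1000)))
odd = {t for t in job.pending if t % 2 == 1}
try:
    update_time_limit(job, odd, 120, other_records=0, record_limit=100)
except LimitExceeded as err:
    print("rejected:", err)     # rejected: 501 records would exceed limit 100
print(job.record_count())       # 1: the rejected request changed nothing

job.start(0)                    # task 0 runs and gets its own record
update_time_limit(job, set(job.pending), 90, other_records=0, record_limit=2)
print(job.record_count(), job.time_limit)   # 2 90
```

In this model, changing the time limit on the 500 odd-numbered tasks of a 1000-task array would take the user from 1 record to 501, so under a cap of 100 the request is rejected outright rather than partly applied. Updating the whole pending portion still succeeds after a task has started, because the running task already has its own record and no new records are created. Checking the projected count before mutating anything is what makes the operation all-or-nothing.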