I have jobs submitted and running on all clusters in the federation, is there a way to remove my jobs from a single cluster without having to list them individually?
Do you want to cancel or requeue these jobs?
cancel. It appears that I can cancel all jobs or one at a time, but not at the cluster level.
scancel has a cluster option: "-M or --clusters=", which would send the scancel to only that cluster, then you could add the username option to cancel all of your jobs.
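For reference, the suggested invocation might look like this (assuming a cluster named icrm1 and the username from this ticket; adjust both to your site):

```shell
# Send the cancel request to cluster icrm1 only, for all of my jobs.
# Long and short forms are equivalent:
scancel --clusters=icrm1 --user=eckert
scancel -M icrm1 -u eckert
```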
I used that, but it cancelled all jobs, not just the cluster I requested.
To be more clear, I used: scancel -u eckert -M icrm1 and it removed jobs from all the clusters.
(In reply to Phil Eckert from comment #4) > I used that, but it cancelled all jobs, not just the cluster I requested. OK. I'm working on this and will get back to you as soon as possible.
A guess, because I like to guess ;-) would be that scancel might be working from the VIABLE_SIBLINGS list when canceling, rather than the ACTIVE_SIBLINGS. Another thought: if a job is not active, I would think that scancel would only act on jobs on the origin host, since I'm sure this is going to be confusing to users as it is. I'm still fairly convinced that jobs should only be viable on the host they are submitted on, and that if a user desires multiple hosts, they would request them.

/g/g0/eckert[22] squeue
   JOBID CLUSTER ST ORIGIN VIABLE_SIBLINGS   ACTIVE_SIBLINGS TIME NODES REASON    NODELIST
67108914 icrm3   PD icrm1  icrm1,icrm2,icrm3 NA              0:00 20    Resources
67108915 icrm3   PD icrm1  icrm1,icrm2,icrm3 NA              0:00 20    Priority
67108916 icrm3   PD icrm1  icrm1,icrm2,icrm3 NA              0:00 20    Priority
67108910 icrm2   R  icrm1  icrm1,icrm2,icrm3 icrm2           1:44 20    None      icrm-2-host[1-20]
67108913 icrm3   R  icrm1  icrm1,icrm2,icrm3 icrm3           1:41 20    None      icrm-3-host[21-40]
67108912 icrm3   R  icrm1  icrm1,icrm2,icrm3 icrm3           1:42 20    None      icrm-3-host[1-20]
67108911 icrm1   R  icrm1  icrm1,icrm2,icrm3 icrm1           1:42 20    None      icrm-1-host[21-40]
67108909 icrm1   R  icrm1  icrm1,icrm2,icrm3 icrm1           1:45 20    None      icrm-1-host[1-20]
So, looking at test37.10 in the testsuite, it seems that you are experiencing the expected behavior, since scancels are propagated throughout the federation, regardless of which cluster you send them to. I will research further as to whether this behavior should be modified and/or if there is an alternative, preferable method to cancel jobs only on targeted clusters. I'll take a look at your suggestions and respond to those as I am able.
Here is what I have been told:

The federation was designed to connect multiple homogeneous clusters together and make it feel largely like one cluster. Each cluster independently schedules each of the sibling jobs, coordinating with the origin cluster. When jobs are submitted to the federation, the origin cluster (the cluster that receives the job submit request) submits sibling jobs to each viable cluster — where viable clusters are: (all clusters in the federation || the subset of clusters requested with --clusters and/or --cluster-constraint). The active clusters are the clusters that have an actual job (a viable cluster could have rejected the job and thus not be an active cluster).

Currently only the origin job knows about the active siblings (you can see this by doing squeue from the origin cluster). If the job is revoked on the origin cluster (meaning it isn't a viable cluster or the federated job was started on a sibling cluster) you can see it with the -a option with an squeue to the origin cluster (or an squeue -a --sibling from any cluster in the federation).

The origin job needs to stay around to handle requests like starting, updates, cancellations, etc. to the federated job. Even though the clusters each have a copy of the federated job, they act as one because they are tied to the origin job on the cluster. So if you scancel a federated job all of them are removed. The scancel --sibling= option removes the job from the active siblings. If the job is requeued it would still be eligible to run on the viable siblings. If you want to modify the active siblings you can use "scontrol update jobid=<jobid> clusters=<clusters>" or "scontrol update clusterfeatures=<features>" (test37.6).

I hope this clarifies how the federation was designed to work. I am going to mark this ticket as resolved, seeing as your question has been addressed and no immediate action is going to be taken, though this feature may be added in the future.
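To illustrate the options mentioned above with the jobids from the squeue output earlier in this ticket (the jobid and cluster names are just examples):

```shell
# Remove job 67108914 only from the sibling icrm2; if the job is later
# requeued, it remains eligible to run on all of its viable siblings.
scancel --sibling=icrm2 67108914

# Restrict which clusters the pending federated job may run on.
scontrol update jobid=67108914 clusters=icrm1,icrm3
```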
Should you have anything further, feel free to post it here and we will respond.
Unfortunately, the way the federation works is very contrary to how we operate. We have many clusters, with different capabilities, different users with different accounts. We previously had the Moab grid installed, where jobs would run on the cluster they were submitted on unless a list of clusters was provided, in which case the submission host was not considered unless it was in the list. I mention Moab only because that is what our users were accustomed to.

Something that I think would make the federation more friendly to our needs would be for a job submitted without a cluster option or list to only be considered for the submitting host; then we would have expected behavior.

Also, a point that I think is important regardless is how to deal with a cluster in the federation going down or becoming unavailable. If a federation setup is viewed as a "single" cluster, then a user may have no idea which individual cluster in the larger cluster their job is running on. If that individual cluster goes down, the user will not be able to discern what has happened since the "federation cluster" is still up. Somewhere there needs to be a means of finding out the "last known state" of the jobs, to avoid the confusion that could be caused.
Squeue can be formatted to show which cluster a job is running on: when a job is running, the cluster named in the active siblings column is the cluster running that job. I am unfamiliar with the mechanisms (if they exist) that are in place to handle a federated cluster failure, but I will do some research and get you an answer as I am able. The federation is the result of sponsored development aimed at satisfying a specific set of needs. Having met those, we are open to enhancement requests. I will reopen this bug as a severity 5.
(In reply to Phil Eckert from comment #11) > Unfortunately, the way the federation works is very contrary to how we > operate. We have many cluster, with different capabilities, different users > with different accounts. We previously had the Moab grid installed, which > jobs would run on the cluster they were submitted on, unless a list of > clusters were provided, of which the submission host was not considered > unless it was in the list. I mention Moab, only because that is what our > users were accustomed to. > > Something that I think would make the federation more friendly to our needs > would be for a job submitted without a cluster option or list to only be > considered for the submitting host, and then we would have expected behavior. This might be accomplished with the job_submit plugin. You could catch any job without an explicit cluster list and add the -M or --clusters=<submission host> option to them, allowing them to run on only that cluster.
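A minimal sketch of such a job_submit/lua rule, assuming the local cluster is named icrm1 (a placeholder; since the script runs on the controller, the name can be hard-coded to the cluster that controller serves):

```lua
-- job_submit.lua: pin jobs that arrive without an explicit cluster list
-- to this cluster, as if the user had passed -M icrm1.
-- "icrm1" is a placeholder; substitute the local cluster's name.
local LOCAL_CLUSTER = "icrm1"

function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.clusters == nil then
        job_desc.clusters = LOCAL_CLUSTER
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Jobs submitted with an explicit -M/--clusters list are left untouched, so users who want multiple clusters can still request them.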
While we could write a plugin, it would be more manageable to have a slurm.conf setting (hint hint). That way it would be part of the slurm.conf documentation, I say this because I believe that more sites than just LLNL would desire this option.
Understood. The bug has been reopened as an enhancement request and will be addressed at some future point. Until then I hope the plugin solution proves sufficient.
For the lua plugin, could you please tell me which structure elements contain the "--clusters=<name> and -M <name>" data? Thank you, Phil
e.g.

function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc ~= nil and job_desc.clusters ~= nil then
        slurm.log_user("Clusters: " .. job_desc.clusters .. "\n");
    end
    return slurm.SUCCESS
end

c1$ sbatch --wrap="hostname" -Mc1,c2
sbatch: Clusters: c1,c2
Submitted batch job 67317899 on cluster c1
I thought I sent this yesterday, but I am not seeing it. How do I determine the cluster name using lua?
I don't see it exported to the lua interface. But since the script is being run by the controller, the cluster name could be hard-coded in the lua script to the cluster associated with that controller.