Low-priority support request: We have an increasing number of Jenkins-managed tasks that require some level of integration with Slurm. This includes both admin- and user-managed tasks, for example a software build that needs to take place on a compute node. Looking around the world, this appears to be an increasingly common integration in HPCland, so I was surprised to come up empty on a quick Google search for an open source Slurm plugin for Jenkins. Has the SchedMD team come across any interesting or compelling patterns or approaches for this? I.e. something beyond "ssh to a login node, issue an sbatch command, and parse the output or return values". Slurm has a C API with a "slurm_submit_batch_job" method, but I'd guess this needs to be invoked from a login node, so it would only be part of a solution. Also, we're not in the habit of writing Jenkins jobs or integration code in C, or thinking about things like memory leaks either :)
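For reference, the baseline "SSH to a login node, run sbatch, parse the result" pattern can be sketched in a few lines of Python. This is a minimal sketch, not an official or recommended tool. It assumes key-based SSH from the Jenkins agent to a login node and a batch script that already exists on that node; the host name "login01" and all helper names are hypothetical. It relies on sbatch's --parsable option, which prints only the job ID (followed by ";cluster" when a cluster name is present), so there is no free-text output to scrape.

```python
import shlex
import subprocess


def build_sbatch_command(login_node, script_path, extra_args=()):
    """Build an ssh argv that runs 'sbatch --parsable' on a login node."""
    remote = ["sbatch", "--parsable", *extra_args, script_path]
    # ssh concatenates its trailing arguments into one string for the remote
    # shell, so quote each piece to survive word splitting on the far side.
    remote_cmd = " ".join(shlex.quote(arg) for arg in remote)
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", login_node, remote_cmd]


def parse_job_id(sbatch_stdout):
    """Return the job ID from 'sbatch --parsable' output ("123" or "123;cluster")."""
    lines = [line for line in sbatch_stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("sbatch produced no output")
    job_id = lines[-1].strip().split(";", 1)[0]
    if not job_id.isdigit():
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(job_id)


def submit(login_node, script_path, extra_args=()):
    """Submit a batch script over SSH and return its Slurm job ID."""
    argv = build_sbatch_command(login_node, script_path, extra_args)
    # check=True raises CalledProcessError if ssh or sbatch exits non-zero.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

This gets a job ID back into Jenkins, but on its own it says nothing about whether the job later succeeded; that still needs either a blocking launch or a status query.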
If possible, please copy jones.clyde@gene.com on any responses to this ticket as well.
(In reply to Slaton Lipscomb from comment #1)
> If possible, please copy jones.clyde@gene.com on any responses to this
> ticket as well.

He'd need to set up an account within our Bugzilla instance; once that's done he can add himself as a CC, or you can add him yourself.
> Has the SchedMD team come across any interesting or compelling patterns or
> approaches for this?

I've seen a few people run this way, but I don't know of any that have published their scripts.

> I.e. something beyond "ssh to a login node, issue a sbatch syscall, and
> parse the output or return values".

I'd suggest using srun to launch the tasks directly (possibly through SSH to a login node). This way you'd get status back into Jenkins immediately, rather than needing to build something to monitor the job status. Unless you're willing to trust your Jenkins systems with the MUNGE credential, you'd likely need to go through a login node over SSH.

> Slurm has a C API with a "slurm_submit_batch_job" method, but I'd guess this
> needs to be invoked from a login node, so it would only be part of a
> solution. Also, we're not in the habit of writing Jenkins jobs or
> integration code in C, or thinking about things like memleak either :)

You could certainly do it that way as well, although I wouldn't suggest trying to replace the srun command that way; there are a lot of tricky pieces related to shipping I/O around within the cluster that are hard to get right. But if you were just submitting jobs and then querying their status, coding against the API shouldn't be too difficult.
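A minimal Python sketch of both routes follows. It is an illustration, not a SchedMD-provided tool, and it assumes key-based SSH from the Jenkins agent to a login node (the name "login01" and every helper name are hypothetical). run_with_srun blocks until the job step finishes and returns srun's exit status, which is what lets Jenkins pass or fail a build without separate monitoring. wait_for_job covers the submit-then-query route; it additionally assumes Slurm accounting (slurmdbd) is enabled so sacct can report the job, and that the job is not an array job (sacct would then return one line per task, and the sketch only looks at the first).

```python
import shlex
import subprocess
import time

# sacct states for which the job is still queued or in progress.
NON_TERMINAL_STATES = {"PENDING", "RUNNING", "SUSPENDED", "REQUEUED", "RESIZING"}


def ssh_wrap(login_node, remote_argv):
    """Build an ssh argv that runs remote_argv, shell-quoted, on login_node."""
    remote_cmd = " ".join(shlex.quote(arg) for arg in remote_argv)
    return ["ssh", "-o", "BatchMode=yes", login_node, remote_cmd]


def run_with_srun(login_node, command, srun_args=()):
    """Run command as a job step via srun and block until it finishes.

    stdout/stderr are inherited, so the step's output streams straight into
    the Jenkins console log. The return value is ssh's exit status, which is
    srun's exit status (or 255 if ssh itself failed).
    """
    argv = ssh_wrap(login_node, ["srun", *srun_args, *command])
    return subprocess.run(argv).returncode


def parse_sacct_state(output):
    """Return the job-level state from 'sacct -n -X -P -o State' output.

    Returns None if accounting has no record of the job yet. States such as
    "CANCELLED by 1000" are reduced to their first word.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    return lines[0].split()[0] if lines else None


def is_terminal(state):
    """True once the job has left the queued/running states."""
    return state is not None and state not in NON_TERMINAL_STATES


def wait_for_job(login_node, job_id, poll_seconds=30):
    """Poll sacct until the job reaches a terminal state, then return it."""
    # -n: no header, -X: job allocation only (no steps), -P: '|'-delimited.
    argv = ssh_wrap(login_node,
                    ["sacct", "-n", "-X", "-P", "-o", "State", "-j", str(job_id)])
    while True:
        out = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
        state = parse_sacct_state(out)
        if is_terminal(state):
            return state
        time.sleep(poll_seconds)
```

In a Jenkins shell step this might be called as sys.exit(run_with_srun("login01", ["make", "-j8"], ["-N1"])) so the build result follows srun's exit code. If a blocking batch submission is preferred instead, sbatch's --wait option does not return until the job terminates and exits with the job's exit code, which avoids polling entirely. Note that wait_for_job has no timeout, so a job that never appears in accounting would poll forever.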
Marking this as resolved/infogiven. Please reopen if you have any further questions. - Tim