Ticket 5782

Summary: sbatch with 2000 srun fails jobs
Product: Slurm Reporter: Sohrab <sohrab1982>
Component: slurmd Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: error log

Description Sohrab 2018-09-27 05:16:05 MDT
Created attachment 7905 [details]
error log

Hi,

I have a batch script that calls srun 2000 times. Each srun command simply runs a Python script that sleeps for 2 minutes and then prints a message. Some of the jobs finish and some don't (seemingly at random). I get the errors shown in the attached log.
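
For illustration only, a task script of the kind described might look roughly like the sketch below. The actual script was not attached to this ticket, so the file name `task.py`, the function `run_task`, and the task-id argument are all hypothetical; only the 2-minute sleep followed by a print comes from the description.

```python
import sys
import time


def run_task(task_id: int, sleep_seconds: float = 120.0) -> str:
    """Sleep for the given duration, then print and return a completion message.

    The 120-second default mirrors the "2 minutes sleep" in the description;
    the task id and the message text are assumptions made for this sketch.
    """
    time.sleep(sleep_seconds)
    message = f"task {task_id} finished"
    # Flush so the line shows up promptly in the job's output file.
    print(message, flush=True)
    return message


if __name__ == "__main__":
    # Hypothetical command line: python task.py TASK_ID
    if len(sys.argv) >= 2 and sys.argv[1].isdigit():
        run_task(int(sys.argv[1]))
    else:
        print("usage: task.py TASK_ID")
```

Under these assumptions, each of the 2000 srun steps in the batch script would launch something like `python task.py 7`, so a step that ran to completion leaves one "finished" line in the output and a failed step leaves none.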

Best Regards,
Sohrab
Comment 1 Jacob Jenson 2018-09-27 08:56:55 MDT
Sohrab,

Could you please tell me which site you are from? Our system couldn't match
your gmail address with a Slurm support contract. Once we know which site
you are from we can either route this ticket to the SchedMD Slurm support
team if a current contract exists or discuss Slurm support options.

Thanks,
Jacob
