Ticket 5782 - sbatch with 2000 srun fails jobs
Summary: sbatch with 2000 srun fails jobs
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-09-27 05:16 MDT by Sohrab
Modified: 2018-09-27 08:56 MDT
CC List: 0 users

See Also:
Site: -Other-


Attachments
error log (322.75 KB, text/plain)
2018-09-27 05:16 MDT, Sohrab

Description Sohrab 2018-09-27 05:16:05 MDT
Created attachment 7905
error log

Hi,

I have a batch script which calls srun 2000 times. Each srun command simply calls a Python script with a 2-minute sleep and a print command. Some of the jobs finish and some don't (pretty random). I get errors as attached!
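
The actual scripts are not shown in this ticket, so the following is only a minimal sketch of the kind of setup described above. The names `worker`, `build_batch_script`, and `worker.py` are hypothetical, and running the srun steps one after another (rather than in the background) is an assumption.

```python
import time


def worker(task_id: int, sleep_seconds: float = 120.0) -> str:
    """Hypothetical worker: sleep (2 minutes by default), then print a message."""
    time.sleep(sleep_seconds)
    message = f"task {task_id} done"
    print(message)
    return message


def build_batch_script(n_steps: int = 2000, worker_path: str = "worker.py") -> str:
    """Build the text of a batch script that launches n_steps srun job steps.

    Assumption: each step is a plain `srun python <worker_path> <index>` line;
    the real script may have used different options or backgrounded the steps.
    """
    lines = ["#!/bin/bash"]
    for i in range(n_steps):
        lines.append(f"srun python {worker_path} {i}")
    return "\n".join(lines) + "\n"
```

Writing the output of `build_batch_script()` to a file and submitting that file with sbatch would approximate the workload described, under the assumptions above.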

Best Regards,
Sohrab
Comment 1 Jacob Jenson 2018-09-27 08:56:55 MDT
Sohrab,

Could you please tell me which site you are from? Our system couldn't match
your gmail address with a Slurm support contract. Once we know which site
you are from we can either route this ticket to the SchedMD Slurm support
team if a current contract exists or discuss Slurm support options.

Thanks,
Jacob
