Ticket 14061

Summary: Slurm torque wrapper not submitting the jobs to scheduler
Product: Slurm Reporter: Manikanta_Eluri
Component: Scheduling Assignee: Marshall Garey <marshall>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 2 - High Impact    
Priority: --- CC: jacob, jess
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
Site: AMAT Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: image001.png
slurm1.zip

Description Manikanta_Eluri 2022-05-12 10:14:40 MDT
As Synopsys TCAD doesn't support Slurm natively, we use the Slurm Torque wrappers and the Perl API so that jobs submitted to Torque are passed on to Slurm.

But it's not working as expected. The libraries seem to be loading fine. Please assist.

Thanks,
Mani
Comment 2 Manikanta_Eluri 2022-05-13 11:33:26 MDT
Hello Team - Following up to see if there is any update.
Thanks,
Mani
Comment 3 Jason Booth 2022-05-13 13:36:48 MDT
Mani, our system is having trouble associating you with a valid support contract. AMAT has a list of designated technical contacts. One of our account representatives has reached out to the site to validate this issue.

To avoid a delay with support, it would be helpful to have one of the technical contacts from AMAT reach out to us and validate the issue logged here.
Comment 4 Manikanta_Eluri 2022-05-16 12:30:40 MDT
Hi Jacob/Jess,

I am Mani from the HPC app support team. I have raised the subject bug with Slurm Support, but it looks like it needs confirmation from you.

Please let me know if any other information is needed from me and help expedite the resolution.

Thank you.

Regards,
Manikanta Eluri
HPC-Apps | Applied Materials
E-mail: HPC_Unified_Support@amat.com

The content of this message is APPLIED MATERIALS CONFIDENTIAL. If you are not the intended recipient, please notify me, delete this email and do not use or distribute this email.
Comment 5 Manikanta_Eluri 2022-05-16 12:37:13 MDT
<<Looping Souvik and Shraddha >>
Comment 6 Jason Booth 2022-05-16 14:04:54 MDT
Hi Mani - Jess has authorized us to look into this issue while we sort out the issue with your authorized contacts. I will have Marshall follow up with you shortly. In the meantime, would you please paste the errors you are seeing and any options you are using with the wrapper?
Comment 7 Manikanta_Eluri 2022-05-16 14:08:30 MDT
Created attachment 25043 [details]
image001.png

Hello Jason,

Thank you. Since TCAD doesn’t support SLURM directly but does support Torque, we use these wrappers to submit the jobs. Below is the error we are getting now:

[Screenshot: image001.png, attachment 25043]


Thank you.

Regards,
Manikanta Eluri
HPC-Apps | Applied Materials
E-mail: HPC_Unified_Support@amat.com

Comment 8 Marshall Garey 2022-05-16 14:41:09 MDT
Thanks. Can you also upload your slurm.conf file?
Comment 9 Manikanta_Eluri 2022-05-16 15:19:17 MDT
Created attachment 25046 [details]
slurm1.zip

Hello Team,
Please find attached.

Thank you.

Regards,
Manikanta Eluri
HPC-Apps | Applied Materials
E-mail: HPC_Unified_Support@amat.com

Comment 10 Marshall Garey 2022-05-16 16:19:51 MDT
Thanks.

I cannot reproduce this. As far as I can tell, this error does not come from Slurm. Is tcad generating this error?

> Multithreaded jobs are not supported on Torque scheduler.

You will need to identify what tcad does not like (in this case, some option related to multithreading).

What is your reproducer? Can you reproduce this without using the tcad tool?

For example, I just ran this successfully:

$ cat script.sh
#!/bin/sh
hostname
$ qsub script.sh
757
$ cat script.sh.o757 
voyager
Comment 11 Manikanta_Eluri 2022-05-16 17:15:06 MDT
Hello Team,

This is the error that pops up when submitting jobs to Torque via tcad. I am not sure if this is specific to tcad or to Torque itself, but from the error it sounded like it's related to Torque, which doesn’t support multithreading.

Is multithreading supported by default in torque?

Thank you.

Regards,
Manikanta Eluri
HPC-Apps | Applied Materials
E-mail: HPC_Unified_Support@amat.com

Comment 12 Jason Booth 2022-05-17 10:05:53 MDT
Mani -

> It’s the error pops up when submitting the jobs to Torque via tcad. I am not 
> sure if this is specific to tcad or Torque itself, but with the error it sounded 
> like its related to  Torque which doesn’t support multi threading.
>
> Is multithreading supported by default in torque?

This error is not something that is part of the Slurm codebase or the wrapper scripts. Can you send us the following:

1. What switches are being called with qsub.
2. Please let us know what server tcad is configured against.
3. Please verify that tcad is calling the qsub wrapper and not a Linux binary file.
> $ file /path/to/torque/qsub.pl
> /path/to/torque/qsub.pl: Perl script text executable
4. Please also upload the slurmctld.log from the server that spans the time when these jobs are being submitted.
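
For item 3, a rough shell sketch along these lines may help when the `file` utility is not at hand. It is only an illustration: the helper name `qsub_kind` is hypothetical and not part of Slurm, and it judges a file purely by its first line (the shebang), so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): classify a qsub candidate by its
# first line only. A Perl wrapper normally starts with a perl shebang, while a
# compiled binary has no shebang at all.
qsub_kind() {
    # Read the first line; drop NUL bytes in case the file is a binary.
    first_line=$(head -n 1 "$1" 2>/dev/null | tr -d '\000')
    case "$first_line" in
        '#!'*perl*) echo "perl-script" ;;
        '#!'*)      echo "other-script" ;;
        *)          echo "not-a-script" ;;
    esac
}

# Report which qsub the current PATH resolves to, if any.
qsub_path=$(command -v qsub || true)
if [ -n "$qsub_path" ]; then
    echo "qsub resolves to: $qsub_path ($(qsub_kind "$qsub_path"))"
else
    echo "qsub not found on PATH"
fi
```

Running this in the same environment that tcad uses to launch jobs matters, since a different PATH there could resolve `qsub` to a different file than in an interactive shell.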
Comment 13 Marshall Garey 2022-05-17 16:12:34 MDT
I'm marking this as a duplicate of bug 14088, which was opened by your colleague. We'll continue the discussion on 14088.

*** This ticket has been marked as a duplicate of ticket 14088 ***