| Summary: | Slurm torque wrapper not submitting the jobs to scheduler | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Manikanta_Eluri |
| Component: | Scheduling | Assignee: | Marshall Garey <marshall> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | jacob, jess |
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | AMAT | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | image001.png, slurm1.zip |
||
|
Description
Manikanta_Eluri
2022-05-12 10:14:40 MDT
Hello Team - Following up to see if there is any update. Thanks, Mani

Jason Booth (comment 3):
Mani, our system is having trouble associating you with a valid support contract. AMAT has a list of designated technical contacts. One of our account representatives has reached out to the site to validate this issue. To avoid a delay with support, it would be helpful to have one of the technical contacts from AMAT reach out to us and validate the issue logged here.

Manikanta Eluri:
Hi Jacob/Jess, I am Mani from the HPC app support team. I have raised the subject bug with Slurm Support, but it looks like it needs confirmation from you. Please let me know if any other information is needed from me, and please help expedite the resolution. Thank you.
Regards, Manikanta Eluri
HPC-Apps | Applied Materials
E-mail: HPC_Unified_Support@amat.com
The content of this message is APPLIED MATERIALS CONFIDENTIAL. If you are not the intended recipient, please notify me, delete this email and do not use or distribute this email.

Manikanta Eluri:
Looping in Souvik and Shraddha.

Jason Booth (comment 6):
Hi Mani - Jess has authorized us to look into this issue while we sort out the issue with your authorized contacts. I will have Marshall follow up with you shortly. In the meantime, would you please paste the errors you are seeing and any options you are using with the wrapper?

Manikanta Eluri:
Created attachment 25043: image001.png
Hello Jason, Thank you. TCAD doesn’t support Slurm directly, but it does support Torque, so we use these wrappers to submit the jobs. The error we are getting now is shown in the attached image001.png. Thank you.

Marshall Garey (comment 8):
Thanks. Can you also upload your slurm.conf file?

Manikanta Eluri:
Created attachment 25046: slurm1.zip
Hello Team, Please find it attached. Thank you.

Marshall Garey (comment 10):
Thanks.
I cannot reproduce this. As far as I can tell, this error does not come from Slurm. Is tcad generating this error?
> Multithreaded jobs are not supported on Torque scheduler.
You will need to identify what tcad does not like (in this case, some option related to multithreading).
What is your reproducer? Can you reproduce this without using the tcad tool?
For example, I just ran this successfully:
$ cat script.sh
#!/bin/sh
hostname
$ qsub script.sh
757
$ cat script.sh.o757
voyager
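Before running a reproducer like the one above, it may help to confirm which `qsub` the shell actually resolves and whether it is a script or a compiled binary. The following is only a minimal sketch, assuming a POSIX shell; the function name `is_perl_script` is hypothetical (not part of Slurm), and the check is a heuristic that only inspects the shebang line.

```shell
#!/bin/sh
# Heuristic sketch: does the given file start with a perl shebang?
# is_perl_script is a hypothetical helper, not a Slurm command.
is_perl_script() {
    # Read the first line, dropping NUL bytes in case the file is a binary.
    first_line=$(head -n 1 "$1" 2>/dev/null | tr -d '\000')
    case "$first_line" in
        '#!'*perl*) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve qsub the same way the shell would when a tool invokes it by name.
qsub_path=$(command -v qsub || true)

if [ -z "$qsub_path" ]; then
    echo "qsub not found on PATH"
elif is_perl_script "$qsub_path"; then
    echo "$qsub_path looks like a Perl script"
else
    echo "$qsub_path does not look like a Perl script"
fi
```

Note that this only reflects the `PATH` of the shell it runs in; a tool launched from a different environment may resolve a different `qsub`.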
Manikanta Eluri:
Hello Team, It’s the error that pops up when submitting the jobs to Torque via tcad. I am not sure if this is specific to tcad or Torque itself, but from the error it sounded like it is related to Torque, which doesn’t support multithreading. Is multithreading supported by default in Torque? Thank you.

Mani -
> It’s the error that pops up when submitting the jobs to Torque via tcad. I am not
> sure if this is specific to tcad or Torque itself, but from the error it sounded
> like it is related to Torque, which doesn’t support multithreading.
>
> Is multithreading supported by default in Torque?
This error is not something that is part of the Slurm codebase or the wrapper scripts. Can you send us the following:
1. What switches are being called with qsub.
2. Please let us know what server tcad is configured against.
3. Please verify that tcad is calling the qsub wrapper and not a Linux binary file.
> $ file /path/to/torque/qsub.pl
> /path/to/torque/qsub.pl: Perl script text executable
4. Please also upload the slurmctld.log from the server that spans the time when these jobs are being submitted.

I'm marking this as a duplicate of bug 14088, which was opened by your colleague. We'll continue the discussion on 14088.
*** This ticket has been marked as a duplicate of ticket 14088 *** |