Ticket 14088

Summary: Bug 14061 - Slurm torque wrapper not submitting the jobs to scheduler
Product: Slurm Reporter: Shraddha Kiran <Shraddha_Kiran>
Component: Scheduling Assignee: Marshall Garey <marshall>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: Manikanta_Eluri, Shraddha_Kiran
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
Site: AMAT
Attachments: Slurm logs

Description Shraddha Kiran 2022-05-17 14:51:48 MDT
Hello,

Raising another case for Bug 14061 (Slurm torque wrapper not submitting jobs to the scheduler), as my team member doesn't currently have access to raise one.

Could you please assist?

Thank you

Shraddha
Comment 1 Jason Booth 2022-05-17 15:00:08 MDT
This is a duplicate of bug#14061. In the interest of time, we started working on that issue while Mani's access was being sorted out.

Please see the latest inquiry in bug#14061 comment#12, copied below for your convenience.

Mani -

> It’s the error that pops up when submitting jobs to Torque via tcad. I am not
> sure if this is specific to tcad or Torque itself, but from the error it sounded
> like it’s related to Torque, which doesn’t support multi-threading.
>
> Is multithreading supported by default in Torque?


This error is not something that is part of the Slurm codebase or the wrapper scripts. Can you send us the following:

1. What switches are being passed to qsub?
2. Please let us know which server tcad is configured against.
3. Please verify that tcad is calling the qsub wrapper and not a Linux binary file:
> $ file /path/to/torque/qsub.pl
> /path/to/torque/qsub.pl: Perl script text executable

4. Please also upload the slurmctld.log from the server that spans the time when these jobs are being submitted.
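For reference, the wrapper check in item 3 can be scripted. This is a minimal sketch assuming a POSIX shell and the standard file(1) utility; the temporary file it creates is only a stand-in for the real qsub path on a cluster, which you would check instead with `file "$(command -v qsub)"`:

```shell
#!/bin/sh
# Sketch of the check in item 3: confirm that qsub resolves to the Perl
# wrapper script rather than a compiled Torque binary. The temp file
# below is a stand-in for the real wrapper path on a cluster.
tmpdir=$(mktemp -d)
qsub_path="$tmpdir/qsub"

# Create a stand-in Perl wrapper for demonstration purposes.
printf '#!/usr/bin/perl\nprint "wrapper\\n";\n' > "$qsub_path"
chmod +x "$qsub_path"

# file(1) identifies scripts by their shebang line; a compiled Torque
# qsub would instead report something like "ELF 64-bit LSB executable".
if file "$qsub_path" | grep -q 'Perl script'; then
    echo "qsub is the Perl wrapper"
else
    echo "qsub does not look like the wrapper - check your PATH"
fi

rm -rf "$tmpdir"
```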
Comment 2 Marshall Garey 2022-05-17 16:12:34 MDT
*** Ticket 14061 has been marked as a duplicate of this ticket. ***
Comment 3 Manikanta_Eluri 2022-05-18 08:03:43 MDT
Created attachment 25089 [details]
Slurm logs

Hi Jason,
Please see my answers inline:

1. What switches are being passed to qsub?
Yes, it's called from the correct directory: /cm/shared/apps/slurm/19.05.7/bin/qsub

2. Please let us know which server tcad is configured against.
TCAD is installed on network storage that is accessible across the cluster nodes.


3. Please verify that tcad is calling the qsub wrapper and not a Linux binary file.
> $ file /path/to/torque/qsub.pl
> /path/to/torque/qsub.pl: Perl script text executable


[root@dcalph000 bin]# file /cm/shared/apps/slurm/19.05.7/bin/qsub
/cm/shared/apps/slurm/19.05.7/bin/qsub: Perl script, ASCII text executable
[root@dcalph000 bin]#


4. Please also upload the slurmctld.log from the server that spans the time when these jobs are being submitted.

Logs are attached.
Comment 5 Jason Booth 2022-05-18 12:11:08 MDT
The pre-processing is happening in your application "sptopo3d" before the job is submitted to the qsub wrapper. 

I suggest you contact synopsys/tcad support regarding this issue. There is probably an option in the application to deselect multithreaded support before submission, so that the pre-processing done by synopsys/tcad does not error out.
Comment 6 Marshall Garey 2022-06-01 13:19:11 MDT
As Jason pointed out, this error does not come from Slurm. Have you been able to resolve this with synopsys/tcad support?
Comment 7 Shraddha Kiran 2022-06-02 05:20:26 MDT
(In reply to Marshall Garey from comment #6)
> As Jason pointed out, this error does not come from Slurm. Have you been
> able to resolve this with synopsys/tcad support?

Hello 

Not yet, we are still trying to fix this. We shall let you know of any updates.

Thank you
Shraddha
Comment 8 Marshall Garey 2022-06-02 13:16:47 MDT
I'm closing this as infogiven. If you have any Slurm-related questions about this issue, feel free to re-open this ticket.