Ticket 12714

Summary: slurm/v0.0.37/job/submit causes the slurmrestd daemon to crash.
Product: Slurm Reporter: brown kestrel <qing.na>
Component: slurmrestd    Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: nate
Version: 21.08.1   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description brown kestrel 2021-10-21 00:21:52 MDT
Everything works fine when I use slurmrestd, but as soon as I add the argv parameter to the request body, slurmrestd terminates with a segmentation fault.
Is this a bug?

Request URL: http://{{slurmUrl}}/slurm/v0.0.37/job/submit

Request Body (working):
{
    "job": {
        "name":"demo_test",
        "current_working_directory":"/gfs/jobs",
        "tasks": 1,
        "nodes": [1,2],
        "environment": {
            "PATH":"/bin:/usr/bin/:/usr/local/bin/",
            "LD_LIBRARY_PATH":"/lib/:/lib64/:/usr/local/lib"
        },
        "standard_output": "demo.%j.out"
    },
    "script":"#!/bin/bash\n sbatch demo.sh"
}
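For context, a request like the one above is POSTed to the slurmrestd endpoint with user-name/token headers. A minimal Python sketch of building that request, assuming JWT authentication; the host, port, user name, and token below are placeholders, not values from this report:

```python
import json
import urllib.request

# Placeholder endpoint and credentials; adjust for your site. With JWT auth
# enabled, a token can typically be obtained via `scontrol token`.
SLURM_URL = "http://localhost:6820"
SLURM_JWT = "placeholder-token"

# The working request body from the report.
payload = {
    "job": {
        "name": "demo_test",
        "current_working_directory": "/gfs/jobs",
        "tasks": 1,
        "nodes": [1, 2],
        "environment": {
            "PATH": "/bin:/usr/bin/:/usr/local/bin/",
            "LD_LIBRARY_PATH": "/lib/:/lib64/:/usr/local/lib",
        },
        "standard_output": "demo.%j.out",
    },
    "script": "#!/bin/bash\n sbatch demo.sh",
}

req = urllib.request.Request(
    f"{SLURM_URL}/slurm/v0.0.37/job/submit",
    data=json.dumps(payload).encode(),
    headers={
        "X-SLURM-USER-NAME": "demo",  # placeholder user
        "X-SLURM-USER-TOKEN": SLURM_JWT,
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually submit
print(req.full_url)
```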


Request body that causes slurmrestd to crash:

{
    "job": {
        "name":"demo_test",
        "current_working_directory":"/gfs/jobs",
        "tasks": 1,
        "nodes": [1,2],
        "environment": {
            "PATH":"/bin:/usr/bin/:/usr/local/bin/",
            "LD_LIBRARY_PATH":"/lib/:/lib64/:/usr/local/lib"
        },
        "standard_output": "demo.%j.out",
        "argv": [
            "hello"
        ]
    },
    "script":"#!/bin/bash\n sbatch demo.sh"
}
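To make the trigger explicit, the two request bodies can be compared programmatically; a small sketch confirming that the only difference between the working and crashing payloads is the "argv" field in the "job" object:

```python
# Shared fields of the "job" object from both request bodies in the report.
base_job = {
    "name": "demo_test",
    "current_working_directory": "/gfs/jobs",
    "tasks": 1,
    "nodes": [1, 2],
    "environment": {
        "PATH": "/bin:/usr/bin/:/usr/local/bin/",
        "LD_LIBRARY_PATH": "/lib/:/lib64/:/usr/local/lib",
    },
    "standard_output": "demo.%j.out",
}
script = "#!/bin/bash\n sbatch demo.sh"

working = {"job": dict(base_job), "script": script}
crashing = {"job": {**base_job, "argv": ["hello"]}, "script": script}

# The minimal delta: only "argv" is added in the crashing payload.
extra_keys = set(crashing["job"]) - set(working["job"])
print(extra_keys)  # {'argv'}
```

This confirms the report's claim: the segfault is triggered purely by the presence of the argv field.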


slurmrestd log

=======
21 14:14:08 10-23-145-163 kernel: slurmrestd[1327]: segfault at 2f ip 00007feaf6ef9f35 sp 00007feaf52e9440 error 4 in libslurmfull.so[7feaf6e3a000+1d0000]
21 14:14:08 10-23-145-163 systemd[1]: slurmrestd.service: main process exited, code=killed, status=11/SEGV
21 14:14:08 10-23-145-163 systemd[1]: Unit slurmrestd.service entered failed state.
21 14:14:08 10-23-145-163 systemd[1]: slurmrestd.service failed.

Can anyone help with this?
Comment 2 Jacob Jenson 2021-10-21 09:31:37 MDT
This has been verified as a bug. If ucloud.cn purchases Slurm support, our professional support team can work with you to resolve this bug.

Thank you,
Jacob
Comment 3 brown kestrel 2021-10-21 19:59:51 MDT
(In reply to Jacob Jenson from comment #2)
> This has been verified as a bug. If ucloud.cn purchases Slurm support,
> our professional support team can work with you to resolve this bug.
> 
> Thank you,
> Jacob


Hi Jacob,

Thanks for your kind reply.

I'm evaluating/learning Slurm at the moment and hope to run some demo-level computing jobs.

I understand that SchedMD does not provide non-commercial support, since time is precious for everyone.

I have read through the Slurm contribution guide (https://github.com/SchedMD/slurm/blob/master/CONTRIBUTING.md); it seems I could attach a patch here if I find the solution myself, right?

Of course, if I find that commercial support is necessary during the evaluation, we will get one.

Best regards,
kestrel
Comment 4 Jason Booth 2021-10-22 09:31:17 MDT
> I have read through the Slurm contribution guide
> (https://github.com/SchedMD/slurm/blob/master/CONTRIBUTING.md);
> it seems I could attach a patch here if I find the solution
> myself, right?

Correct, this is the procedure you would use to submit a patch to us for inclusion in Slurm.