Ticket 1500 - scontrol fails to update a partition for a job
Summary: scontrol fails to update a partition for a job
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 14.11.1
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: David Bigagli
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-03-02 22:43 MST by Gianluca Castellani
Modified: 2015-03-04 04:29 MST

See Also:
Site: KAUST


Description Gianluca Castellani 2015-03-02 22:43:56 MST
[root@slurm01 ~]# scontrol update job=24885 partition=smp
Invalid user id for job 24885
[root@slurm01 ~]# scontrol update job=24885 partition=smp
[root@slurm01 ~]#

Hello,
I am trying to change a job's partition from the Slurm master node. It looks like I need to run the command twice to make it work.


Best,
Gianluca
Comment 1 David Bigagli 2015-03-03 08:45:20 MST
Hi Gianluca,
I tried to reproduce it but did not see this error.
Who owns the job 24885? Could you please send the output of
'scontrol show job 24885' or any other jobid you reproduce this
problem with. Is there any error logged in the slurmctld log file?

David
Comment 2 Gianluca Castellani 2015-03-03 15:28:43 MST
Hi,
That job is over, but the same thing just happened.

I retried with another one.
[root@slurm01 ~]# scontrol update job=24854 partition=defaultq
Invalid user id for job 24854

Although the message complains about an invalid user id, the job's partition has
been altered:
[root@slurm01 ~]# scontrol show job  24854
JobId=24854 JobName=BTTT4PCBM-wB97XD-opt01403TDroot19
   UserId=zhany0d(135420) GroupId=noor-users(1001)
   Priority=27347 Nice=0 Account=idle QOS=default
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:11 TimeLimit=8-23:55:00 TimeMin=N/A
   SubmitTime=2015-03-03T11:28:38 EligibleTime=2015-03-03T11:28:38
   StartTime=2015-03-04T08:17:33 EndTime=2015-03-13T08:12:33
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=defaultq AllocNode:Sid=rcfen02:27197
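
To confirm the result of an update without reading the whole job record, the Partition field can be pulled out of the `scontrol show job` output. The following is a minimal sketch; `job_partition` is a hypothetical helper name, not a Slurm command, and it assumes the `Partition=<name>` field layout shown in the output above.

```shell
#!/bin/sh
# Print the value of the first Partition= field found on stdin.
# job_partition is a hypothetical helper, not part of Slurm.
job_partition() {
    grep -o 'Partition=[^ ]*' | head -n 1 | cut -d= -f2
}

# Example, on a host where Slurm is installed:
#   scontrol show job 24854 | job_partition
```

Running this right after the `scontrol update` that printed the error would show whether the partition was changed despite the message.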

Looking at the log file:
[root@slurm01 ~]# grep 24854 /var/log/slurm/slurmctld.log
[2015-03-03T11:28:38.106] _slurm_rpc_submit_batch_job JobId=24854 usec=391
[2015-03-04T08:17:19.032] job_submit.lua: *slurm_job_modify: for job 24854 from uid 0, setting default comment value: ***TEST_COMMENT****
[2015-03-04T08:17:19.032] update_job: setting partition to defaultq for job_id 24854
[2015-03-04T08:17:33.302] sched: Allocate JobId=24854 NodeList=ca119 #CPUs=16

I am guessing that the culprit is my job_submit.lua (I did not modify the
slurm_job_modify function):

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    if job_desc.comment == nil then
        local comment = "****TEST_COMMENT****"
        slurm.log_info("*slurm_job_modify: for job %u from uid %u, setting default comment value: %s*",
                       job_rec.job_id, modify_uid, comment)
        job_desc.comment = comment
    end

    return slurm.SUCCESS
end

I think that you can close the bug.

Best,
Gianluca







Comment 3 David Bigagli 2015-03-04 04:29:03 MST
OK, go ahead. You can try without the submit plugin and let us know the results.

Cheers,
David
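
One way to test without the submit plugin is to run slurmctld with the `JobSubmitPlugins` line in slurm.conf commented out. The sketch below only prints a modified copy of a config file and leaves the original untouched. `disable_job_submit` is a hypothetical helper name, the config path in the example is an assumption, and that this site sets `JobSubmitPlugins=lua` is inferred from the job_submit.lua file mentioned in the ticket.

```shell
#!/bin/sh
# Print a copy of a slurm.conf-style file with any JobSubmitPlugins line
# commented out. The input file itself is not modified.
# disable_job_submit is a hypothetical helper, not part of Slurm.
# Assumes the parameter name is spelled exactly "JobSubmitPlugins".
disable_job_submit() {
    sed -E 's/^([[:space:]]*JobSubmitPlugins[[:space:]]*=)/#\1/' "$1"
}

# Example (the path is an assumption; adjust to the local install):
#   disable_job_submit /etc/slurm/slurm.conf > /tmp/slurm.conf.noplugin
```

After reviewing the output and putting it in place, slurmctld has to pick up the new configuration (probably by restarting it) before `scontrol update job=<id> partition=<name>` is retried.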