Ticket 245 - red herring errors
Summary: red herring errors
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Bluegene select plugin
Version: 2.5.x
Hardware: IBM BlueGene Linux
Severity: 4 - Minor Issue
Assignee: Danny Auble
Reported: 2013-02-25 06:12 MST by Don Lipari
Modified: 2013-04-18 10:13 MDT (History)
Site: LLNL


Description Don Lipari 2013-02-25 06:12:09 MST
In chasing down a problem with cancelling a job step on Sequoia, I was distracted by the following errors in the slurmd.log:

not distracting:
[2013-02-25T10:56:19-08:00] debug:  Sending signal 9 to step 394271.1
distracting:
[2013-02-25T10:56:19-08:00] debug:  _step_connect: connect: No such file or directory
[2013-02-25T10:56:19-08:00] debug:  signal for nonexistant 394271.1 stepd_connect failed: No such file or directory

The signal got through even though _signal_jobstep() returns early due to the -1 return status of stepd_connect().

The errors are reported due to the absence of nodename_job.step directories in the SlurmdSpoolDir on BlueGene systems.

Don
Comment 1 Don Lipari 2013-02-25 06:16:15 MST
(In reply to comment #0)
[...]
> The errors are reported due to the absence of nodename_job.step directories
> in the SlurmdSpoolDir on BlueGene systems.

correction:  nodename_job.step "sockets"
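For anyone wanting to reproduce the message outside of slurmd, here is a minimal Python sketch of the failure mode, not Slurm's actual code. It assumes the step socket is a Unix domain socket named nodename_job.step inside the spool directory, as described above. The helper names `step_socket_path` and `try_step_connect` are hypothetical, and a temporary directory stands in for SlurmdSpoolDir. Connecting to a socket path that was never created should fail with ENOENT, which is the "No such file or directory" text seen in the log.

```python
import errno
import os
import socket
import tempfile


def step_socket_path(spool_dir, nodename, job_id, step_id):
    # Hypothetical helper: builds the "nodename_job.step" name from the ticket.
    return os.path.join(spool_dir, f"{nodename}_{job_id}.{step_id}")


def try_step_connect(spool_dir, nodename, job_id, step_id):
    """Try to connect to the step socket; return (connected, errno_value)."""
    path = step_socket_path(spool_dir, nodename, job_id, step_id)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True, 0
    except OSError as exc:
        # A missing socket file is reported here as ENOENT.
        return False, exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    # Empty temporary directory standing in for SlurmdSpoolDir (assumption).
    with tempfile.TemporaryDirectory() as spool:
        ok, err = try_step_connect(spool, "node1", 394271, 1)
        print(ok, errno.errorcode.get(err), os.strerror(err))
```

A caller that knows the socket is never created on a given platform could treat ENOENT as expected and skip the log message. Whether the eventual fix takes that approach is not stated in this ticket.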
Comment 2 Danny Auble 2013-04-18 10:13:24 MDT
This should be fixed in commit ac7c76ab6ffa0e1d049a42463752c1d6f9cb6d6d.