Ticket 7715 - different errors in log files
Summary: different errors in log files
Status: RESOLVED DUPLICATE of ticket 7468
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 19.05.2
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Albert Gil
 
Reported: 2019-09-10 02:24 MDT by Ahmed Essam ElMazaty
Modified: 2019-09-30 06:38 MDT

Site: KAUST


Description Ahmed Essam ElMazaty 2019-09-10 02:24:45 MDT
Hello,
I see some errors that are flooding the slurmctld logs.
Can you please explain what they indicate and what their impact on the system is?
The first one is:
error: gres/gpu: job 6457286 dealloc node dgpu501-14 type gtx1080ti gres count underflow (0 1)
It appears for different jobs and GPU nodes.
The second error message is:
error: select/cons_res: node cn603-15-r memory is under-allocated (61440-81920) for JobId=6426888
This also appears for different jobs and nodes.

Thanks,
Ahmed
Comment 3 Albert Gil 2019-09-11 10:45:16 MDT
Hi Ahmed,

Neither of these errors should appear; both indicate an internal malfunction that we are working to fix.

> The first one is 
> error: gres/gpu: job 6457286 dealloc node dgpu501-14 type gtx1080ti gres
> count underflow (0 1)
> It appears for different jobs and GPU nodes

This one is not fixed yet, but we are aware of it and are working on it in bug 7468.
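
For context, here is a minimal C sketch of the kind of consistency guard that produces this message; the struct, field, and function names are hypothetical and this is not Slurm's actual source. The "(0 1)" pair means the node's tracked GRES count is already 0 while the job's deallocation still tries to return 1 unit:

#include <stdio.h>

/* Hypothetical per-node GRES bookkeeping (names invented for illustration). */
typedef struct {
    const char *node;        /* node name, e.g. "dgpu501-14" */
    const char *type;        /* GRES type, e.g. "gtx1080ti" */
    unsigned long alloc_cnt; /* GRES units currently marked allocated */
} gres_node_state_t;

static void gres_dealloc(gres_node_state_t *ns, unsigned long job_cnt,
                         unsigned int job_id)
{
    if (ns->alloc_cnt < job_cnt) {
        /* Bookkeeping is out of sync: log the inconsistency and clamp
         * at zero instead of letting the counter wrap around. */
        fprintf(stderr,
                "error: gres/gpu: job %u dealloc node %s type %s "
                "gres count underflow (%lu %lu)\n",
                job_id, ns->node, ns->type, ns->alloc_cnt, job_cnt);
        ns->alloc_cnt = 0;
    } else {
        ns->alloc_cnt -= job_cnt;
    }
}

int main(void)
{
    gres_node_state_t ns = { "dgpu501-14", "gtx1080ti", 0 };
    gres_dealloc(&ns, 1, 6457286); /* reproduces the "(0 1)" case */
    return 0;
}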

> The second error message is
> error: select/cons_res: node cn603-15-r memory is under-allocated
> (61440-81920) for JobId=6426888
> This also appears for different jobs and nodes

This one is already fixed in the slurm-19.05 branch and will be released as part of 19.05.3.
See bug 6769 comment 41 for details.
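
To illustrate what the "(61440-81920)" pair means: the node's tracked allocated memory (61440 MB) is smaller than the amount the job is expected to release (81920 MB). A minimal, hypothetical C sketch of that check (invented names, not Slurm's actual source):

#include <stdio.h>

/* Hypothetical per-node memory bookkeeping (names invented for illustration). */
typedef struct {
    const char *name;       /* node name, e.g. "cn603-15-r" */
    unsigned long alloc_mem; /* MB currently marked allocated on the node */
} node_use_t;

static void job_mem_dealloc(node_use_t *node, unsigned long job_mem,
                            unsigned int job_id)
{
    if (node->alloc_mem < job_mem) {
        /* Tracked usage is lower than what the job should free:
         * report the inconsistency and clamp at zero. */
        fprintf(stderr,
                "error: select/cons_res: node %s memory is "
                "under-allocated (%lu-%lu) for JobId=%u\n",
                node->name, node->alloc_mem, job_mem, job_id);
        node->alloc_mem = 0;
    } else {
        node->alloc_mem -= job_mem;
    }
}

int main(void)
{
    node_use_t node = { "cn603-15-r", 61440 };
    job_mem_dealloc(&node, 81920, 6426888); /* reproduces "(61440-81920)" */
    return 0;
}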

Regards,
Albert
Comment 4 Albert Gil 2019-09-30 06:38:07 MDT
Hi Ahmed,

If this is OK with you, I'm closing this ticket as a duplicate of bug 7468.

Regards,
Albert

*** This ticket has been marked as a duplicate of ticket 7468 ***