Ticket 7715

Summary: different errors in log files
Product: Slurm Reporter: Ahmed Essam ElMazaty <ahmed.mazaty>
Component: Other Assignee: Albert Gil <albert.gil>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 3 - Medium Impact    
Priority: ---    
Version: 19.05.2   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=7468
https://bugs.schedmd.com/show_bug.cgi?id=6769
Site: KAUST

Description Ahmed Essam ElMazaty 2019-09-10 02:24:45 MDT
Hello,
I see some errors that are flooding the slurmctld logs.
Can you please explain what they indicate and what their impact on the system is?
The first one is 
error: gres/gpu: job 6457286 dealloc node dgpu501-14 type gtx1080ti gres count underflow (0 1)
It appears for different jobs and GPU nodes
The second error message is
error: select/cons_res: node cn603-15-r memory is under-allocated (61440-81920) for JobId=6426888
This also appears for different jobs and nodes

Thanks,
Ahmed
Comment 3 Albert Gil 2019-09-11 10:45:16 MDT
Hi Ahmed,

Neither error should appear; both indicate an internal malfunction that we are working to fix.

> The first one is 
> error: gres/gpu: job 6457286 dealloc node dgpu501-14 type gtx1080ti gres
> count underflow (0 1)
> It appears for different jobs and GPU nodes

This one is not fixed yet, but we are aware of it and are working on it in bug 7468.

> The second error message is
> error: select/cons_res: node cn603-15-r memory is under-allocated
> (61440-81920) for JobId=6426888
> This also appears for different jobs and nodes

This is already fixed in branch slurm-19.05 and will be released as part of 19.05.3.
See bug 6769 comment 41 for details.

Regards,
Albert
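
Since both messages are reported for many different jobs and nodes, a per-node tally can help show how widespread each one is. The following is only a rough sketch: the regular expressions are derived from the two sample lines quoted in this ticket, other Slurm versions may word the messages differently, and the function name `tally_errors` and the category labels are made up for illustration.

```python
import re
from collections import Counter

# Patterns are based solely on the two example lines in this ticket
# (an assumption; the exact wording may differ between Slurm versions).
GRES_UNDERFLOW = re.compile(
    r"gres/gpu: job (?P<job>\d+) dealloc node (?P<node>\S+) "
    r"type (?P<type>\S+) gres count underflow"
)
MEM_UNDER_ALLOC = re.compile(
    r"select/cons_res: node (?P<node>\S+) memory is under-allocated "
    r"\(\d+-\d+\) for JobId=(?P<job>\d+)"
)


def tally_errors(lines):
    """Count how often each of the two errors appears per node.

    `lines` is any iterable of log lines, e.g. an open slurmctld log file.
    The category names are hypothetical labels, not Slurm terminology.
    """
    counts = {"gres_underflow": Counter(), "mem_under_allocated": Counter()}
    for line in lines:
        match = GRES_UNDERFLOW.search(line)
        if match:
            counts["gres_underflow"][match.group("node")] += 1
            continue
        match = MEM_UNDER_ALLOC.search(line)
        if match:
            counts["mem_under_allocated"][match.group("node")] += 1
    return counts


if __name__ == "__main__":
    # The two sample lines from the description of this ticket.
    sample = [
        "error: gres/gpu: job 6457286 dealloc node dgpu501-14 "
        "type gtx1080ti gres count underflow (0 1)",
        "error: select/cons_res: node cn603-15-r memory is "
        "under-allocated (61440-81920) for JobId=6426888",
    ]
    for kind, per_node in tally_errors(sample).items():
        for node, n in per_node.most_common():
            print(f"{kind}\t{node}\t{n}")
```

To run it against a real log, pass an open file object to `tally_errors` instead of the sample list; the log path depends on the `SlurmctldLogFile` setting at each site.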
Comment 4 Albert Gil 2019-09-30 06:38:07 MDT
Hi Ahmed,

If this is OK with you, I'm closing this bug as a duplicate of bug 7468.

Regards,
Albert

*** This ticket has been marked as a duplicate of ticket 7468 ***