Created attachment 12741 [details]
Valgrind output

On a small cluster (10 nodes), the memory leak can easily reach 1 GB per day.
Created attachment 12742 [details] slurm config
Created attachment 12743 [details] slurm log
Hi,

I think I have found the cause of this leak. I will provide a patch soon. Could you send me the output of "scontrol show res"?

Dominik
Dear Dominik,

scontrol show res
ReservationName=bbres StartTime=2019-12-13T16:01:14 EndTime=2020-12-12T16:01:14 Duration=365-00:00:00
   Nodes=node3307.joltik.os NodeCnt=1 CoreCnt=8 Features=(null) PartitionName=joltik Flags=
     NodeName=node3307.joltik.os CoreIDs=8-15
   TRES=cpu=8
   Users=vsc43020 Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a

ReservationName=tier2q1maintenance StartTime=2020-01-31T08:00:00 EndTime=2021-01-30T08:00:00 Duration=365-00:00:00
   Nodes=node3300.joltik.os,node3301.joltik.os,node3302.joltik.os,node3303.joltik.os,node3304.joltik.os,node3305.joltik.os,node3306.joltik.os,node3307.joltik.os,node3308.joltik.os,node3309.joltik.os NodeCnt=10 CoreCnt=320 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES
   TRES=cpu=320
   Users=(null) Accounts=gvo00002 Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
Created attachment 12744 [details]
patch proposal

This patch should fix the memory leak. Sorry that it took so long, but I ran additional tests to reduce the probability of introducing new issues.
Hi,

Did you have a chance to test this patch?

Dominik
Dear Dominik,

Thanks for the patch, it solved the memory leak.

Balazs
Hi,

These commits fix the leak in _core_bitmap_to_array() and three smaller leaks in reservation-related code. All of these fixes will be included in 19.05.6 and later.

https://github.com/SchedMD/slurm/commit/ffb20605
https://github.com/SchedMD/slurm/commit/0713c41a
https://github.com/SchedMD/slurm/commit/164bafcc

I'll go ahead and close this out.

Dominik