A user with a correctly set DefaultAccount has jobs rejected with an invalid qos/partition error when sbatch or salloc is invoked without --account=.... If slurmctld is restarted, these submissions use the value defined in DefaultAccount and succeed, as expected.

This appears to be an issue only on a second cluster which shares a database with a primary cluster, with slurm version 20.02.1.

On a newly-installed database and cluster1:
1. Create cluster1, a partition, qos, account, default account, and user
   (qos=normal, account=default, defaultaccount=default).
2. Submit test jobs.
==> all is good on cluster1

On a second cluster (cluster2) which uses the same slurmdbd:
3. Create cluster2, partition, qos, account, default account, and user
   (identical values & configuration as above).
4. sbatch or salloc test jobs (invoked by a root test driver):

   [works] # sbatch --chdir=/tmp --qos=normal --account=default --uid=user --wrap=date
   [fails] # sbatch --chdir=/tmp --qos=normal --uid=user --wrap=date

If the slurmctld, which runs on a separate node, is restarted ("systemctl restart slurmctld"), then the test job submission works.

----

Is there an scontrol command to cause slurmctld to be restarted? (Or a hack such as 'scontrol takeover' without a backup controller, causing the primary to be restarted?) That is, could an appropriately privileged user (=~ an automated test suite) invoke this to cause slurmctld to flush its cache & restart, rather than forcing a human to invoke something like 'ssh slurmctld-node systemctl restart slurmctld'? Automating ssh access in our environment is discouraged by security policy.
----
[root@vxlogin slurm]# sinfo --version
slurm 20.02.1
[root@vxlogin slurm]# uname -a
Linux vxlogin 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@vxlogin slurm]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
---
This is filed against the slurmctld component, but the problem may be in slurmdbd.
Hi Steven,

> A user with a DefaultAccount set correctly has jobs rejected with invalid
> qos/partition using sbatch or salloc if --account=... is not used for
> submission. If slurmctld is restarted these use the value defined in
> DefaultAccount and succeed, as expected.
>
> This appears to be an issue only on a second cluster which shares a data
> base with a primary cluster and with slurm version 20.02.1.
>
> On a newly-installed data base and cluster1:
> 1. create a cluster1, partition, qos, account, default account and user.
> qos=normal, account=default, defaultaccount=default
> 2. submit test jobs
> ==> all is good on cluster1
>
> On a 2nd cluster which uses the same slurmdbd and cluster2:
> 3. create cluster2, partition, qos, account, default account and user
> (identical values & configuration as above)
> 4. sbatch or salloc test jobs (invoked by root test driver)
> ex. [works] # sbatch --chdir=/tmp --qos=normal --account=default
> --uid=user --wrap=date
>
> ex [fails] # sbatch --chdir=/tmp --qos=normal --uid=user --wrap=date
>
> If the slurmctld is restarted ("systemctl restart slurmctld") where it is
> running on a separate node, then the test job submission works.

Thanks for the detailed explanation. Unfortunately I've not been able to reproduce the issue so far.

Could you detail a bit more the order that you follow to create/add the new cluster in the db, when you start the new slurmctld, and when you create the new account and user of the new cluster in the db? Did you start the new slurmctld with slurmdbd down?

Could you also attach the logs from both slurmctlds and from the shared slurmdbd? And both slurm.conf files and the single slurmdbd.conf?

Once it works for the first time, I guess that you are not able to reproduce it either, right?

> Is there an scontrol command to cause slurmctld to be restarted? (or hack
> such as 'scontrol takeover' without a backup controller causing the primary
> to be restarted?)
Wait a minute, does that mean that you are using multiple SlurmctldHost entries (the old BackupController/Addr)? If this is the case, then there are no "multiple clusters", but only one cluster with high availability.

So far I understood that you have two clusters, meaning two independent slurmctlds with two independent slurm.conf files, both sharing the slurmdbd but each one with its own cluster in the database. Not multiple SlurmctldHost entries in a single slurm.conf. Was I right?

> That is, could this be invoked by an appropriately
> privileged user (=~ automated test suite) to cause slurmctld to flush its
> cache & restart rather than forcing a human to invoke something like 'ssh
> slurmctlr-node systemctl restart slurmctld'?
>
> Automating ssh access in our environment is discouraged by security policy.

We don't have such a command, mainly because slurmctld is designed (and needs) to refresh its cache from the database whenever necessary. I'm not sure if this could become an enhancement.

Regards,
Albert
Created attachment 13779 [details] cluster1 ("vc") slurm.conf
(In reply to Albert Gil from comment #2)
> Could you detail a bit more the order that you follow to create/add the new
> cluster in the db, when do you start the new slurmctld, and when you create
> the new account and user of the new cluster in the db. Did you start the new
> slurmctld with slurmdbd down?

After cluster1 is fully up (slurmdbd included) & has run its own test jobs, the cluster2 nodes are provisioned. The first node is the cluster2 slurmctld scheduler node. It spins up its own slurmctld instance, which is configured to point to cluster1's dbd. That's the point where the 'add cluster cluster2' is invoked, which succeeds. The relevant tables are populated in mysql, and the clustername shown in the clusters table matches what is retrieved from cluster2's slurm.conf. So, no, the slurmdbd was definitely up when the new cluster was created.

After the cluster2 compute nodes are up & running slurmd, the front-end/login node is provisioned. As a late stage in its provisioning, dependent on seeing the socket on the compute nodes, the users and their associations are created. The default account is set as a separate call to sacctmgr. (All of these calls return 0.)

> And could you attach the logs from both slurmctlds and also from the shared
> slurmdbd?
> And both slurm.conf and the one slurmdbd.conf?

[done: note vc = cluster1, vx = cluster2]

> Once it works for the first time, I guess that you are not able to reproduce
> it neither, right?

Yes, once slurmctld has been restarted the problem does not recur.

> Wait a minute, those that mean that you are using multiple SlurmctldHost
> (old BackupController/Addr)?

No. This was just an extrapolation that the takeover logic could be triggered to cause a cache flush, since I didn't see any other mechanism besides restarting cluster2's slurmctld to accomplish that.

> If this is the case, then there is no "multiple clusters", but only one with
> high-availability.
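For concreteness, a hedged sketch of what those provisioning calls presumably look like. The entity names (vx, default, normal, sts) come from elsewhere in this report; the exact sacctmgr invocations are assumptions, not the actual hpc-collab recipe:

```shell
# On the cluster2 sched node, whose slurm.conf points at cluster1's dbd:
sacctmgr -i add cluster vx

# Later, during front-end/login node provisioning, the users and their
# associations are created:
sacctmgr -i add account default cluster=vx qos=normal
sacctmgr -i add user sts account=default cluster=vx

# The default account is set as a separate call:
sacctmgr -i modify user sts set defaultaccount=default
```

(-i answers "yes" to sacctmgr's confirmation prompts, which an unattended provisioning script would need.)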
This was just speculation on my part about possible mechanisms. We do not use a secondary controller, nor are these clusters configured to do so.

> So far I understood that you have two clusters, meaning two independent
> slurmctld with two independent slurm.conf, both sharing the slurmdbd but
> each one with its own cluster on the database.

Yes. [Attaching the slurm.conf & slurmdbd.conf.]

> Not multiple SlurmctldHost in a single slurm.conf.
> Was I right?

You are right. There are not multiple SlurmctldHosts. That was just speculation.

> We don't have such command, mainly because slurmctld is designed (and needs)
> to refresh its cache whenever is necessary from the database.

I was just speculating about existing logic that could be triggered to provoke a slurmctld cache flush without requiring external agency (systemctl restart slurmctld, in this case).
Created attachment 13780 [details] cluster1 ("vc") slurmdbd.conf
Created attachment 13781 [details] cluster2 ("vx") slurm.conf
Created attachment 13782 [details] cluster2 ("vx") slurmdbd.conf
I am reconfiguring the two clusters & will regenerate the logs & attach them as requested. This will take ~1 hour.
Created attachment 13784 [details] cluster1 ("vc") slurmctld.log
Created attachment 13785 [details] cluster2 ("vx") slurmctld.log
Created attachment 13786 [details] common cluster1 ("vc") slurmdbd.log
User|Def Acct|Def WCKey|Admin|Cluster|Account|Partition|Share|Priority|MaxJobs|MaxNodes|MaxCPUs|MaxSubmit|MaxWall|MaxCPUMins|QOS|Def QOS|
sts|default||Administrator|vc|default|compile|1||||||||normal||
sts|default||Administrator|vc|default|login|1||||||||normal||
sts|default||Administrator|vc|default|exclusive|1||||||||normal||
sts|default||Administrator|vc|default|shared|1||||||||normal||
sts|default||Administrator|vx|default|login|1||||||||normal||
sts|default||Administrator|vx|default|exclusive|1||||||||normal||
sts|default||Administrator|vx|default|shared|1||||||||normal||
---
% id sts
uid=24800(sts) gid=24800(sts) groups=24800(sts),1000(vagrant)
---
% tail /var/log/slurm/slurmctld.vxsched.log
[2020-04-14T13:23:57.116] error: User 24800 not found
[2020-04-14T13:23:57.120] _job_create: invalid account or partition for user 24800, account '(null)', and partition 'login'
[2020-04-14T13:23:57.150] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
---
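The pipe-separated listing above looks like parsable sacctmgr output; presumably something along these lines (an assumption about the exact invocation used):

```shell
# -P produces the pipe-delimited output shown above; WithAssoc adds the
# per-cluster/account/partition association columns to the user listing
sacctmgr -P show user sts WithAssoc
```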
Created attachment 13788 [details] test job submission script
Created attachment 13789 [details]
remediation script

Explicitly setting the DefaultAccount does not seem necessary; causing the slurmctld to be restarted remediates the problem.
Hi Steven,

Sorry for the delay. I'm still not able to reproduce the issue, but I have some clues to follow:

1) About the cluster registration:

> After cluster1 is fully up (slurmdbd included) & has run its own test jobs,
> the cluster2 nodes are provisioned. The first node is the cluster2 slurmctld
> scheduler node. It spins up its own slurmctld instance, which is configured
> to point to the cluster1's dbd. That's the point where the 'add cluster
> cluster2' is invoked, which succeeds.

I would like to know more details about the 'add cluster cluster2/vx' that you mentioned. Do you run a sacctmgr command in a provision/loader/shload.sh file or similar to "sacctmgr add cluster", or is this done automatically when vxsched is started? Or maybe you do both? In which order?

I think that it is just the slurmctld of both clusters that does the registration when started, but from your logs and comments I cannot be certain:

vc:
[2020-04-14T12:03:54.329] Registering slurmctld at port 6817 with slurmdbd
vx:
[2020-04-14T13:00:57.375] Registering slurmctld at port 6817 with slurmdbd

Could you increase the debug level of the slurmdbd to debug3? That would confirm how cluster2 was registered to vcdb.

2) About a couple of unexpected lines in the slurmctld logs:

vc:
[2020-04-14T12:03:54.864] killing old slurmctld[8076]
vx:
[2020-04-14T13:00:57.956] killing old slurmctld[8267]

It looks like on those freshly provisioned nodes there were already slurmctlds running? Do you see any reason why this could be? Maybe systemd is starting them before your provisioning script, or similar?

3) About the --uid:

The failing sbatch commands are run by root and use --uid; could you try to avoid --uid as a test?
The code path of --uid may be different from that of a normal submission, and in newer versions of slurm 20.02.x we made some fixes related to --uid:

https://github.com/SchedMD/slurm/blob/master/NEWS#L82
https://github.com/SchedMD/slurm/blob/master/NEWS#L86

I don't think the issue is related to it, but I would like to rule that out. Could you try to use sudo or something similar instead?

4) Small double-check question: the current workaround is restarting the slurmctld of *only* vx, right? The vc one is not restarted, right? Neither is vcdb, right?

Thanks,
Albert
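To make the suggestion concrete, a sudo-based variant of the failing reproduction command (flags copied from the report; whether it actually avoids the --uid code path difference is exactly what the test would show):

```shell
# original (run by root, fails until slurmctld is restarted):
#   sbatch --chdir=/tmp --qos=normal --uid=user --wrap=date
# variant submitting directly as the user, without --uid:
sudo -u user sbatch --chdir=/tmp --qos=normal --wrap=date
```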
> 4) Small double-check question: the current workaround is restarting the
> slurmctld of *only* vx, right? The vc is not restarted, right? Neither vcdb,
> right?

Yes. The current work-around is to restart the cluster2/vx slurmctld only.

> The failing sbatch commands are run by root and using --uid, could you try
> to avoid --uid as a test?
> Could you increase the debug level of the slurmdbd to debug3?

I will rework the tests to use sudo rather than the '--uid' mechanism, and set the debug level to 'debug3'. This stage of the automated test is a final system verification as a (pseudo-)random set of user job submissions.

> 2) About a couple of non expected lines in the slurmctld logs:
> vc:
> [2020-04-14T12:03:54.864] killing old slurmctld[8076]
> vx:
> [2020-04-14T13:00:57.956] killing old slurmctld[8267]

This is a consequence of the automated construction of the cluster:
The initial sched node starts a munge instance so that sacct commands may be run.
slurmctld is manually started.
QoS and other global db tables are populated.
The preliminary munge and slurmctld processes are stopped.
File system and directory permissions are set to what the systemd service files expect and require.
munge is started using systemd.
slurmctld is started using systemd.

This occurs on the vcsched or vxsched nodes, which do not have slurmd installed. The compute nodes do not have the slurmctld rpms installed.

If you have a well-provisioned (wrt. RAM and disk) machine, you can reproduce this from https://github.com/hpc/hpc-collab. The vc and vx recipes are there. (Feedback appreciated.)
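That construction sequence, as a shell sketch (the daemon names are real; the paths, the pkill-based shutdown, and the example sacctmgr call are assumptions rather than the actual hpc-collab recipe):

```shell
# 1. Preliminary daemons so db-loading commands can run:
munged
slurmctld

# 2. Populate QoS and other global db tables (illustrative call only):
sacctmgr -i add qos normal

# 3. Stop the preliminary instances:
pkill slurmctld
pkill munged

# 4. Set ownership/permissions expected by the systemd unit files:
chown -R slurm:slurm /var/spool/slurmctld /var/log/slurm

# 5. Start the production-like daemons under systemd:
systemctl start munge
systemctl start slurmctld
```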
Hi Steven,

> > The failing sbatch commands are run by root and using --uid, could you try to avoid --uid as a test?
> > Could you increase the debug level of the slurmdbd to debug3?
>
> I will rework the tests to use sudo rather than the '--uid' mechanism, and
> set the debug level to 'debug3'.

Great, thanks!

> > 2) About a couple of non expected lines in the slurmctld logs:
> > vc:
> > [2020-04-14T12:03:54.864] killing old slurmctld[8076]
> > vx:
> > [2020-04-14T13:00:57.956] killing old slurmctld[8267]
>
> This is a consequence of the automated construction of the cluster:
> The initial sched node starts a munge instance so that sacct commands may be
> run.

I guess that you mean the sacctmgr command, right? And I guess that those commands are the addition of (def)accounts, users, QOSes, and also the cluster(s)?

> slurmctld is manually started.

From your other comments and the logs I understand that this means both the vc and vx slurmctlds. Do you know why you need to start them at this stage? Note that most of the sacctmgr commands don't need the slurmctld to be running, only slurmdbd.

> QoS and other global db tables are populated.

I assume that that's through the sacctmgr commands mentioned above.

> The preliminary munge and slurmctld processes are stopped.
> File system and directory permissions are set to what the systemd service
> files expect and require.
> munge is started using systemd.
> slurmctld is started using systemd.

I can see the point of the manual part for munge and slurmdbd (if the sacctmgr commands are not run on the vcdb host), but I'm not certain about the need for the manual slurmctld. Anyway, it shouldn't be the source of the problem. I just mention it in case it helps you simplify the scripts, and also in case of uncontrolled daemons being launched. I'll note it when trying to get a reproducer.

> This occurs on the vcsched or vxsched nodes, which do not have slurmd
> installed. The compute nodes do not have the slurmctld rpms installed.
Ok, that matches what I saw in the logs.

> If you have a well-provisioned (wrt. RAM and disk) you can reproduce this
> from https://github.com/hpc/hpc-collab. The vc and vx recipes are there.
> (Feedback appreciated.)

Great! I'll play with it to see if it helps me reproduce the issue.

Thanks,
Albert
(In reply to Albert Gil from comment #21)
> From your other comments and the logs I understand that boths vc and vx
> slurmctld.
> Do you know why you need to start them at this stage?

The main reason is as an additional validation step. Since each cluster node has automated verification of its capabilities before successor nodes (such as the compute nodes) are built, we use it to validate at each stage.

Feel free to file an issue at https://github.com/hpc/hpc-collab/issues, or send direct messages, or code. Your (and others') feedback would be highly appreciated. As mentioned in that README, part of the motivation for this was to have similar test platforms to generate reproducers of problems, scenarios, alternate configurations, etc.
Hi Steven,

> The main reason is as an additional validation step. Since each cluster node
> has automated verification of its capabilities, before successor nodes, such
> as the compute nodes, we use it to validate at each stage.

Ok, that makes perfect sense! I'm starting to wonder if maybe the issue is actually related to that small detail of starting a new controller while one is already running... at least it's a clue to follow! ;-)

> Feel free to put in an issue at https://github.com/hpc/hpc-collab/issues or
> via direct messages, or code.
>
> Your (and other's) feedback would be highly appreciated. As mentioned in that
> README, part of the motivation for this was to have similar test platforms to
> generate reproducers of problems, scenarios, alternate configurations etc.

Noted. Some months ago we made public a similar tool that we use for a very similar purpose, jfyi:
https://gitlab.com/SchedMD/training/docker-scale-out

Regards,
Albert
(In reply to Albert Gil from comment #23)
> I'm starting wondering if maybe the issue is actually related to that small
> detail of starting a new controller while there is one opened... at least
> it's clue to follow! ;-)

There shouldn't be another controller running in each cluster. We only start them early to validate & test the commands. Then we stop that instance and start them "normally" via systemd, so that we have confidence that these test instances are as production-like as possible. I'll investigate further, though, to prove that my assumption about what is happening matches what should be happening.
What slurmdbd.conf DebugFlags should be set to go with DebugLevel=debug3?
Hi Steven,

Thanks for updating the version information.

> There shouldn't be another controller running in each cluster. We only start
> them early to validate & test the commands. Then we stop that instance and
> start them "normally" via systemd, so that we have confidence that these
> test instances are as production-like as possible. I'll investigate further,
> though, to prove that my assumption about what is happening matches what
> should be happening.

Ok. I think that there are slurmctlds still running once systemd starts the new ones, because of these couple of lines:

vc:
[2020-04-14T12:03:54.864] killing old slurmctld[8076]
vx:
[2020-04-14T13:00:57.956] killing old slurmctld[8267]

These log traces are only printed if an already running slurmctld is found:
https://github.com/SchedMD/slurm/blob/slurm-20.02/src/slurmctld/controller.c#L3007

> What slurmdbd.conf DebugFlags should be set to go with DebugLevel=debug3?

I was only interested in the debug3 level of slurmdbd, but now that you mention DebugFlags, maybe these could help us:

In slurmctld: DebugFlags=Agent
In slurmdbd.conf: DebugFlags=DB_EVENT,DB_ASSOC,DB_QUERY

Thanks,
Albert
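Collected into config fragments, the suggested settings would presumably look like this (only the debug-related lines are shown; note the level and flag parameters live in different files):

```ini
# slurm.conf (vx controller)
DebugFlags=Agent

# slurmdbd.conf (shared dbd)
DebugLevel=debug3
DebugFlags=DB_EVENT,DB_ASSOC,DB_QUERY
```

Both daemons re-read these on restart; slurmctld can also pick up slurm.conf changes via 'scontrol reconfigure'.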
Created attachment 15170 [details] slurmctld log
Created attachment 15171 [details] slurmctld core dump
Created attachment 15172 [details] slurmdbd log
Updated logs & slurmctld with flags set as requested & attached.
The slurmctld core dump (2020-07-25 23:29, https://bugs.schedmd.com/attachment.cgi?id=15171) is probably not relevant. It appears to be from a broken slurm-spank-lua plugin.
*** Ticket 9793 has been marked as a duplicate of this ticket. ***
This does not seem reproducible with the latest release.
Hi Steven,

Sorry for the (too) long delay on this one.

> This does not seem reproducible with the latest release.

I guess that this is good news. You mean 20.11.8, right? Have you done any other updates on the system besides Slurm?

I'm thinking of closing the bug as cannotreproduce. Would that be ok for you too?

Regards,
Albert
As I speak today it is 20.11.7, because due to our configuration we didn't need to rush to 20.11.8. Yes, please feel free to close this.

Thank you,
-Steve Senator

________________________________________
From: bugs@schedmd.com <bugs@schedmd.com>
Sent: Wednesday, June 2, 2021 7:01:44 AM
To: Senator, Steven Terry
Subject: [EXTERNAL] [Bug 8849] slurmctld cache's DefaultAccount incorrectly; only reset with slurmctld restart
(In reply to S Senator from comment #35)
> As I speak today it is 20.11.7,

Thanks for the clarification.

> because due to our configuration we didn't need to rush to 20.11.8.

Good!

> Yes, please feel free to close this.

Ok. I'm marking it as cannotreproduce, not fixed though. Please feel free to reopen it if we run out of luck and the issue is reproduced again.

Thanks Steve!