Ticket 2378

Summary: scontrol show burst displaying incorrect sizes
Product: Slurm
Reporter: David Paul <dpaul>
Component: Burst Buffers
Assignee: Moe Jette <jette>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
CC: djbard, dmjacobsen, dpaul, tim
Version: 15.08.7
Hardware: Cray XC
OS: Linux
Site: NERSC
Machine Name: Cori
Version Fixed: 15.08.9

Description David Paul 2016-01-25 10:30:51 MST
At least 3 persistent reservations display a size mismatch between dwstat (which shows the correct creation size) and the output of scontrol show burst.

Last week we changed the wlm_pool granularity from 400GB to 200GB, updated to 15.08.7, and added some Cray DataWarp patches.

Here is one example: dwstat = 5.2TiB vs. Slurm = 2720M:

<dwstat all snipped>

sess state             token creator owner             created expiration nodes
1293 CA---            presv1     CLI 15448 2016-01-19T14:59:21      never     0
inst state sess     bytes nodes             created expiration intact             label public confs
1114 CA--- 1293    5.2TiB    25 2016-01-19T14:59:21      never   true            presv1   true     1
conf state inst    type access_type activs
1368 CA--- 1114 scratch      stripe      0
 frag state inst  capacity gran     node
29347  CA-- 1114 212.91GiB 4MiB nid00913
29348  CA-- 1114 212.91GiB 4MiB nid02062
29349  CA-- 1114 212.91GiB 4MiB nid01418
29350  CA-- 1114 212.91GiB 4MiB nid01994
29351  CA-- 1114 212.91GiB 4MiB nid00785
29352  CA-- 1114 212.91GiB 4MiB nid00782
29353  CA-- 1114 212.91GiB 4MiB nid00142
29354  CA-- 1114 212.91GiB 4MiB nid02189
29355  CA-- 1114 212.91GiB 4MiB nid00781
29356  CA-- 1114 212.91GiB 4MiB nid00457
29357  CA-- 1114 212.91GiB 4MiB nid01098
29358  CA-- 1114 212.91GiB 4MiB nid00146
29359  CA-- 1114 212.91GiB 4MiB nid01865
29360  CA-- 1114 212.91GiB 4MiB nid01481
29361  CA-- 1114 212.91GiB 4MiB nid01737
29362  CA-- 1114 212.91GiB 4MiB nid00854
29363  CA-- 1114 212.91GiB 4MiB nid00653
29364  CA-- 1114 212.91GiB 4MiB nid02253
29365  CA-- 1114 212.91GiB 4MiB nid01225
29366  CA-- 1114 212.91GiB 4MiB nid00269
29367  CA-- 1114 212.91GiB 4MiB nid01237
29368  CA-- 1114 212.91GiB 4MiB nid01678
29369  CA-- 1114 212.91GiB 4MiB nid01802
29370  CA-- 1114 212.91GiB 4MiB nid01033
29371  CA-- 1114 212.91GiB 4MiB nid00853

scontrol show burst
Name=cray DefaultPool=wlm_pool Granularity=218016M TotalSpace=872936064M UsedSpace=21588704M
  StageInTimeout=86400 StageOutTimeout=86400 Flags=EnablePersistent,TeardownFailure
  GetSysState=/opt/cray/dw_wlm/default/bin/dw_wlm_cli
  Allocated Buffers:
    Name=presv1 CreateTime=2016-01-19T14:59:21 Size=2720M State=allocated UserID=dpaul(15448)
  Per User Buffer Use:
    UserID=dpaul(15448) Used=2720M
Comment 1 Tim Wickberg 2016-01-26 01:47:22 MST
Can you attach the output for "dw_wlm_cli --function pools" ?

Slurm parses that to construct the internal BB pool state, and I'm curious if there's a mismatch between that output and dwstat.
Comment 2 Moe Jette 2016-01-26 02:10:46 MST
Were the reservations with the size mismatch created before the update?
I'm wondering if it's the granularity conversion that messed things up, in Slurm and/or DataWarp.
Comment 3 Moe Jette 2016-01-26 02:26:34 MST
I just reviewed the code in Slurm. We don't save burst buffer allocation sizes (except when emulating a Cray). The information all comes from the Cray APIs when Slurm starts up. That leads me to suspect that the Cray software didn't handle the granularity change well. The APIs report allocation sizes in terms of "quantity" (could of blocks, each having a size of "granularity", where the "granularity" is associated with the pool).
Comment 4 Moe Jette 2016-01-26 02:40:56 MST
Typo in previous message:

(In reply to Moe Jette from comment #3)
> I just reviewed the code in Slurm. We don't save burst buffer allocation
> sizes (except when emulating a Cray). The information all comes from the
> Cray APIs when Slurm starts up. That leads me to suspect that the Cray
> software didn't handle the granularity change well. The APIs report
> allocation sizes in terms of "quantity" (could of blocks, each having a size
                                           count
> of "granularity", where the "granularity" is associated with the pool).
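To make the "quantity" model above concrete, here is a minimal Python sketch (not Slurm or DataWarp code) that converts a block count into bytes. It assumes the wlm_pool granularity of 228606345216 bytes later reported by dw_wlm_cli, and the 25 fragments dwstat lists for presv1; the helper name `allocation_bytes` is hypothetical.

```python
# Hypothetical helper: DataWarp reports sizes as a count of blocks ("quantity"),
# each block being one pool "granularity" in bytes.
WLM_POOL_GRANULARITY = 228_606_345_216  # bytes, from "dw_wlm_cli --function pools"


def allocation_bytes(quantity: int, granularity: int) -> int:
    """Return an allocation's size in bytes from its block count."""
    return quantity * granularity


presv1 = allocation_bytes(25, WLM_POOL_GRANULARITY)  # 25 fragments in dwstat
print(presv1)                                  # size in bytes
print(f"{presv1 / 2**40:.1f} TiB")             # matches dwstat's 5.2TiB
print(f"{WLM_POOL_GRANULARITY / 2**30:.2f} GiB")  # matches dwstat's 212.91GiB
```

The byte count, 5715158630400, is the same capacity that show_instances reports for presv1 in comment 8.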
Comment 5 David Paul 2016-01-27 04:27:39 MST
RE: Can you attach the output for "dw_wlm_cli --function pools" ?

nid00837:~ # /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function pools
{"pools": [{"free": 381545, "granularity": 16777216, "id": "test_pool", "quantity": 381545, "units": "bytes"}, {"free": 3936, "granularity": 228606345216, "id": "wlm_pool", "quantity": 4004, "units": "bytes"}]}

RE: Were the reservations with the size mismatch created before the update?

Both. All reservations were created AFTER changing the granularity (which was changed prior to the software updates).  The "presv1" PR was created prior to the software updates (1/20).  The other PRs were created after the software updates.

Is it correct that Slurm deals with units of MBs (i.e. 1,048,576 bytes)?
Comment 6 Moe Jette 2016-01-27 04:47:39 MST
(In reply to David Paul from comment #5)
> RE: Can you attach the output for "dw_wlm_cli --function pools" ?
> 
> nid00837:~ # /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function pools
> {"pools": [{"free": 381545, "granularity": 16777216, "id": "test_pool",
> "quantity": 381545, "units": "bytes"}, {"free": 3936, "granularity":
> 228606345216, "id": "wlm_pool", "quantity": 4004, "units": "bytes"}]}

Doing the math, for "wlm_pool", that works out to:
Granularity=218016M TotalSpace=872936064M UsedSpace=14825088M
While Slurm reported (from your initial ticket):
Granularity=218016M TotalSpace=872936064M UsedSpace=21588704M
So that is likely correct.
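The arithmetic above can be reproduced with a short Python sketch that parses the pools JSON posted in comment 5. It assumes used space is (quantity - free) multiplied by the granularity, and that the granularity is an exact number of MiB (true for both pools here); `pool_summary` is a hypothetical helper, not part of Slurm or DataWarp.

```python
import json

MIB = 1024 * 1024

# Output of "dw_wlm_cli --function pools" as posted in comment 5.
POOLS_JSON = (
    '{"pools": [{"free": 381545, "granularity": 16777216, "id": "test_pool", '
    '"quantity": 381545, "units": "bytes"}, {"free": 3936, "granularity": '
    '228606345216, "id": "wlm_pool", "quantity": 4004, "units": "bytes"}]}'
)


def pool_summary(pools_json: str, pool_id: str) -> str:
    """Derive scontrol-style figures (in MiB) for one pool."""
    pools = json.loads(pools_json)["pools"]
    pool = next(p for p in pools if p["id"] == pool_id)
    gran_mib = pool["granularity"] // MIB  # exact for these pools
    total = pool["quantity"] * gran_mib
    used = (pool["quantity"] - pool["free"]) * gran_mib  # assumed definition
    return f"Granularity={gran_mib}M TotalSpace={total}M UsedSpace={used}M"


print(pool_summary(POOLS_JSON, "wlm_pool"))
```

For wlm_pool this prints the same Granularity, TotalSpace and UsedSpace values worked out by hand above.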

> RE: Were the reservations with the size mismatch created before the update?
> 
> Both. All reservations were created AFTER changing the granularity (which was
> changed prior to the software updates).  The "presv1" PR was created prior
> to the software updates (1/20).  The other PRs were created after the
> software updates.
> 
> Is it correct that Slurm deals with units of MBs (i.e. 1,048,576 bytes)?

Slurm works in units of bytes, but adds a suffix of "M", "G", "T", etc as appropriate.

Could you please attach the output of the following two commands? This is what Slurm works from to determine the current buffer state:
dw_wlm_cli -v --function show_sessions
dw_wlm_cli -v --function show_instances
Comment 7 Moe Jette 2016-01-27 05:01:02 MST
> > Is it correct that Slurm deals with units of MBs (i.e. 1,048,576 bytes)?
> 
> Slurm works in units of bytes, but adds a suffix of "M", "G", "T", etc as
> appropriate.

PS: Slurm does not display burst buffer size information with a decimal point. It only promotes the suffix (e.g. from "M" to "G") if the value can be evenly divided by 1024.
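The display rule described above can be sketched in Python as follows. This is an illustration of the stated rule, not Slurm's own formatting function; the name `format_bb_size` and the exact suffix list (including "K" and "P") are assumptions.

```python
# Assumed suffix ladder; the comment above only names "M", "G", "T", etc.
SUFFIXES = ["", "K", "M", "G", "T", "P"]


def format_bb_size(num_bytes: int) -> str:
    """Hypothetical sketch: start from bytes and promote the suffix only
    while the value divides evenly by 1024, so no decimal point is shown."""
    value, idx = num_bytes, 0
    while value > 0 and value % 1024 == 0 and idx < len(SUFFIXES) - 1:
        value //= 1024
        idx += 1
    return f"{value}{SUFFIXES[idx]}"


print(format_bb_size(228606345216))   # one wlm_pool block
print(format_bb_size(5715158630400))  # presv1, 25 blocks
```

Under this rule one 212.91GiB block is shown as 218016M, because 218016 is not a multiple of 1024, which is why the Slurm and dwstat figures look so different even when they agree.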
Comment 8 David Paul 2016-01-27 08:07:01 MST
I have removed and recreated the "presv1" PR.  The size (5TB) is now displayed correctly (Size=5450400M).

One that is still inconsistent is djbTest: dwstat=212.91GiB, Slurm Size=928M

nid00837:~ # dwstat most
     pool units quantity      free      gran
test_pool bytes  5.82TiB   5.82TiB     16MiB
 wlm_pool bytes 832.5TiB 818.15TiB 212.91GiB

sess state       token creator owner             created expiration nodes
1374 CA---     djbTest     CLI 61692 2016-01-23T17:03:19      never     0
inst state sess     bytes nodes             created expiration intact       label public confs
1193 CA--- 1374 212.91GiB     1 2016-01-23T17:03:19      never   true     djbTest   true     1

[dpaul@cori09]==> scontrol show burst
Name=cray DefaultPool=wlm_pool Granularity=218016M TotalSpace=872936064M UsedSpace=27249280M
  StageInTimeout=86400 StageOutTimeout=86400 Flags=EnablePersistent,TeardownFailure
  GetSysState=/opt/cray/dw_wlm/default/bin/dw_wlm_cli
  Allocated Buffers:
    Name=djbTest CreateTime=2016-01-23T17:03:19 Size=928M State=allocated UserID=djbard(61692)

Here are the outputs from the requested commands:

nid00837:~ # /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function show_sessions

{"sessions": [{"created": 1453369801, "creator": "CLI", "expiration": 0, "expired": false, "id": 1342, "links": {"client_nodes": []}, "owner": 69266, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "andreyBB"}, {"created": 1453597399, "creator": "CLI", "expiration": 0, "expired": false, "id": 1374, "links": {"client_nodes": []}, "owner": 61692, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "djbTest"}, {"created": 1453664981, "creator": "CLI", "expiration": 0, "expired": false, "id": 1396, "links": {"client_nodes": []}, "owner": 61845, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "pbbcombine7"}, {"created": 1453683395, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1397, "links": {"client_nodes": ["nid00043"]}, "owner": 61692, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "987616"}, {"created": 1453739246, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1410, "links": {"client_nodes": []}, "owner": 60891, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1003072"}, {"created": 1453739246, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1411, "links": {"client_nodes": []}, "owner": 60891, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1003077"}, {"created": 1453740856, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1416, "links": {"client_nodes": ["nid00804", "nid00805", "nid00806", "nid00807", "nid00828", "nid00829", "nid00830", "nid00831"]}, "owner": 60891, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1003080"}, 
{"created": 1453741406, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1417, "links": {"client_nodes": ["nid00416", "nid00417", "nid00418", "nid00494", "nid00884", "nid00885", "nid00886", "nid00887"]}, "owner": 60891, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1003081"}, {"created": 1453745502, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1427, "links": {"client_nodes": ["nid02231"]}, "owner": 61692, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1005760"}, {"created": 1453745949, "creator": "SLURM", "expiration": 0, "expired": false, "id": 1431, "links": {"client_nodes": ["nid01106"]}, "owner": 61692, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "1005806"}, {"created": 1453760704, "creator": "CLI", "expiration": 0, "expired": false, "id": 1457, "links": {"client_nodes": []}, "owner": 61692, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "djbTest2"}, {"created": 1453920069, "creator": "CLI", "expiration": 0, "expired": false, "id": 1511, "links": {"client_nodes": []}, "owner": 15448, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}, "token": "presv1"}]}


nid00837:~ # /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function show_instances

{"instances": [{"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453369801, "expiration": 0, "expired": false, "id": 1162, "intact": true, "label": "andreyBB", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1417], "session": 1342}, "public": true, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453597399, "expiration": 0, "expired": false, "id": 1193, "intact": true, "label": "djbTest", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1450], "session": 1374}, "public": true, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 1143031726080, "nodes": 5}, "created": 1453664981, "expiration": 0, "expired": false, "id": 1200, "intact": true, "label": "pbbcombine7", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1457], "session": 1396}, "public": true, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453683395, "expiration": 0, "expired": false, "id": 1201, "intact": true, "label": "I1397-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1458], "session": 1397}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453739246, "expiration": 0, "expired": false, "id": 1213, "intact": true, "label": "I1410-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1471], "session": 1410}, "public": false, "state": {"actualized": true, 
"fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453739246, "expiration": 0, "expired": false, "id": 1214, "intact": true, "label": "I1411-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1470], "session": 1411}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453740856, "expiration": 0, "expired": false, "id": 1219, "intact": true, "label": "I1416-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1476], "session": 1416}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453741406, "expiration": 0, "expired": false, "id": 1220, "intact": true, "label": "I1417-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1477], "session": 1417}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453745502, "expiration": 0, "expired": false, "id": 1230, "intact": true, "label": "I1427-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1487], "session": 1427}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453745949, "expiration": 0, "expired": false, "id": 1234, "intact": true, "label": "I1431-0", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1491], 
"session": 1431}, "public": false, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 228606345216, "nodes": 1}, "created": 1453760704, "expiration": 0, "expired": false, "id": 1257, "intact": true, "label": "djbTest2", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1514], "session": 1457}, "public": true, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}, {"capacity": {"bytes": 5715158630400, "nodes": 25}, "created": 1453920069, "expiration": 0, "expired": false, "id": 1297, "intact": true, "label": "presv1", "limits": {"write_window_length": 86400, "write_window_multiplier": 10}, "links": {"configurations": [1554], "session": 1511}, "public": true, "state": {"actualized": true, "fuse_blown": false, "goal": "create", "mixed": false, "transitioning": false}}]}
nid00837:~ #
Comment 9 Moe Jette 2016-01-27 09:22:52 MST
There is a variable without a sufficient number of bits, so some high-order bits are getting dropped. I need to review the code for more issues of this sort. I should be able to get you a patch within a couple of days.
Comment 10 Moe Jette 2016-01-27 09:44:26 MST
Fortunately, only 2 lines need to change, increasing a couple of variables from 32 to 64 bits. The patch is at the location below:

https://github.com/SchedMD/slurm/commit/214b3abe9a41895adabc8168f03d4619c92932fc.patch

Buffers allocated while the slurmctld daemon is running should be fine. When the daemon restarts, buffer sizes (expressed in bytes) that do not fit in 32 bits will get truncated.

This fix will be in version 15.08.9 when released, likely mid-February.
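The 32-bit truncation accounts for both mismatches reported in this ticket. The minimal Python sketch below (not the patched Slurm code, which is C) keeps only the low-order 32 bits of each instance's byte capacity from show_instances, as storing it in a `uint32_t` would; `truncated_to_32_bits` is a hypothetical helper name.

```python
MIB = 1024 * 1024
UINT32_MASK = 0xFFFF_FFFF


def truncated_to_32_bits(num_bytes: int) -> int:
    """Keep only the low-order 32 bits, as converting the value to a
    uint32_t in C would (unsigned conversion is modulo 2**32)."""
    return num_bytes & UINT32_MASK


# Byte capacities reported by "dw_wlm_cli --function show_instances".
instances = {"presv1": 5715158630400, "djbTest": 228606345216}

for label, size in instances.items():
    low = truncated_to_32_bits(size)
    print(f"{label}: {size // MIB}M in 64 bits, {low // MIB}M after truncation")
```

The truncated values come out to 2720M and 928M, the sizes scontrol showed for presv1 and djbTest, while the full 64-bit values match the 5450400M that was displayed correctly after presv1 was recreated.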
Comment 11 David Paul 2016-01-27 10:10:33 MST
Thanks for the quick turnaround, much appreciated!