| Summary: | Support for multiple burst buffer storage pools | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | David Paul <dpaul> |
| Component: | Burst Buffers | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | djbard, dmjacobsen, dpaul, tim |
| Version: | 15.08.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NERSC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | Cori | CLE Version: | |
| Version Fixed: | 16.05.0-pre1 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
David Paul
2015-11-24 06:54:51 MST
There is some code to handle this in place in the original Slurm code (from early 2015), but it has not been ported to the new Cray APIs or tested in a while. Let me do some testing and get back to you. I strongly suspect that some Slurm code updates will be required.

Thanks Moe. No need to spend cycles on this just yet. Is there a way to specify the default pool to be other than "wlm_pool"? If we really need to test this we could change the default to "test_pool".

(In reply to David Paul from comment #2)
> Is there a way to specify the default pool to be other than "wlm_pool"?

In your burst_buffer.conf, set:
DefaultPool=wlm_pool
More info here: http://slurm.schedmd.com/burst_buffer.conf.html
But that changes the default for all jobs.

Yes, we would have to do this in a dedicated test mode.

We don't specify DefaultPool=wlm_pool.
The man page states: "The default value is the first pool reported by the burst buffer infrastructure."
With the addition of the "test_pool", dwstat reports test_pool first.
To be safe, I'll add DefaultPool=wlm_pool to the burst_buffer.conf file.
+++
root@ctl1==> dwstat pools
pool       units  quantity   free       gran
test_pool  bytes  5.82TiB    5.82TiB    16MiB
wlm_pool   bytes  832.52TiB  832.13TiB  397.44GiB
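
The default-pool rule quoted from the man page can be illustrated with a short Python sketch. It only parses the `dwstat pools` text shown above and applies the rule "explicit DefaultPool wins, otherwise the first pool reported". The helper names `reported_pools` and `effective_default` are hypothetical, and the sketch assumes that the order Slurm receives from the burst buffer infrastructure matches the order printed by `dwstat`; it is not how Slurm itself queries the pools.

```python
from typing import List, Optional

# Sample text copied from the `dwstat pools` output quoted in this report.
DWSTAT_OUTPUT = """\
pool units quantity free gran
test_pool bytes 5.82TiB 5.82TiB 16MiB
wlm_pool bytes 832.52TiB 832.13TiB 397.44GiB
"""


def reported_pools(text: str) -> List[str]:
    """Return pool names in listed order, skipping the header row."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "pool":
            continue  # blank line or the column header
        names.append(fields[0])
    return names


def effective_default(text: str, configured: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: an explicit DefaultPool wins, else the first pool listed."""
    if configured:
        return configured
    pools = reported_pools(text)
    return pools[0] if pools else None


if __name__ == "__main__":
    print(effective_default(DWSTAT_OUTPUT))              # test_pool
    print(effective_default(DWSTAT_OUTPUT, "wlm_pool"))  # wlm_pool
```

Under that assumption, adding "test_pool" would silently change the default unless DefaultPool=wlm_pool is set explicitly, which is the reason for pinning it in burst_buffer.conf.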
(In reply to David Paul from comment #2)
> Thanks Moe. No need to spend cycles on this just yet.

I've done a quick review of the logic and there's definitely some development required for this. Based upon your comment, I'm changing this to severity 4.

(In reply to Moe Jette from comment #6)
> I've done a quick review of the logic and there's definitely some
> development required for this. Based upon your comment, I'm changing this to
> severity 4.

Just an FYI, we're discussing the addition of a "pool" option to the #DW and #BB directives creating buffers.

I'm starting work on this development, but targeting release in Slurm version 16.05 (May 2016).

I have finished adding support for multiple resource pools. The changes were quite extensive including changes to data structures and RPCs, so this needs to go into Slurm version 16.05 to be released in May 2016. That version of Slurm might be suitable for use on a testbed, but definitely not on a production system. In any case, the commit with the new logic is here:
https://github.com/SchedMD/slurm/commit/d9259af5dc747384f5bfd55fcdb3625ea602ef9f

Here is an example of what "scontrol show burst" looks like now. Note the line starting with "AltPoolName" (Alternate pool). Also the "Allocated Buffers" now include a "Pool=" field reported.

$ scontrol show burst
Name=cray DefaultPool=dwcache Granularity=16M TotalSpace=32G UsedSpace=16M
  AltPoolName[0]=test_pool Granularity=1 TotalSpace=2K UsedSpace=10
  Flags=DisablePersistent,EmulateCray
  StageInTimeout=86400 StageOutTimeout=86400 ValidateTimeout=5 OtherTimeout=300
  AllowUsers=jette,daemon
  GetSysState=/home/jette/Desktop/SLURM/install.linux/sbin/dw_wlm_cli
  Allocated Buffers:
    Name=test12 CreateTime=2016-01-05T10:53:17 Pool=dwcache Size=16M State=allocated UserID=jette(1000)
    Name=test16 CreateTime=2016-01-05T10:53:17 Pool=test_pool Size=10 State=allocated UserID=jette(1000)
  Per User Buffer Use:
    UserID=jette(1000) Used=16777226 |
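
As a rough cross-check of the new "Pool=" field, the two allocated-buffer lines from the sample output can be parsed and summed per pool in Python. This is a sketch only: the function names are hypothetical, and it assumes size suffixes are binary multiples (M = 1024^2), which is consistent with the reported Used=16777226 (16 x 1024^2 + 10).

```python
from collections import defaultdict
from typing import Dict, List

# The two "Allocated Buffers" entries copied from the sample output.
SAMPLE = """\
Name=test12 CreateTime=2016-01-05T10:53:17 Pool=dwcache Size=16M State=allocated UserID=jette(1000)
Name=test16 CreateTime=2016-01-05T10:53:17 Pool=test_pool Size=10 State=allocated UserID=jette(1000)
"""

# Assumption: suffixes are powers of 1024, matching Used=16777226 in the sample.
_SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '16M' or '10' to bytes."""
    if text and text[-1] in _SUFFIX:
        return int(text[:-1]) * _SUFFIX[text[-1]]
    return int(text)


def parse_buffers(text: str) -> List[Dict[str, str]]:
    """Turn each 'Key=Value ...' buffer line into a dictionary."""
    buffers = []
    for line in text.splitlines():
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "Name" in fields and "Pool" in fields:
            buffers.append(fields)
    return buffers


def bytes_per_pool(buffers: List[Dict[str, str]]) -> Dict[str, int]:
    """Sum allocated bytes for each pool name."""
    totals: Dict[str, int] = defaultdict(int)
    for buf in buffers:
        totals[buf["Pool"]] += parse_size(buf["Size"])
    return dict(totals)


if __name__ == "__main__":
    totals = bytes_per_pool(parse_buffers(SAMPLE))
    print(totals)                # {'dwcache': 16777216, 'test_pool': 10}
    print(sum(totals.values()))  # 16777226
```

The per-pool totals add up to the "Per User Buffer Use" figure in the sample, since both buffers belong to the same user.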