|
Lines 16-22
|
| 16 |
<a id="Overview"></a> |
16 |
<a id="Overview"></a> |
| 17 |
<h2>Overview</h2> |
17 |
<h2>Overview</h2> |
| 18 |
<p>The AcctGatherProfileType/hdf5 plugin allows Slurm to coordinate collecting |
18 |
<p>The AcctGatherProfileType/hdf5 plugin allows Slurm to coordinate collecting |
| 19 |
data on jobs it runs on a cluster that is more detailed than is practical to |
19 |
data on jobs it runs on a clustre that is more detailed than is practical to |
| 20 |
include in its database. The data comes from periodically sampling various |
20 |
include in its database. The data comes from periodically sampling various |
| 21 |
performance data either collected by Slurm, the operating system, or |
21 |
performance data either collected by Slurm, the operating system, or |
| 22 |
component software. The plugin will record the data from each source |
22 |
component software. The plugin will record the data from each source |
|
Lines 34-40
Data from other sources may be added in the future.</p>
|
| 34 |
each allocated node of a job and then merged into a HDF5 file. |
34 |
each allocated node of a job and then merged into a HDF5 file. |
| 35 |
Individual files on a shared file system was chosen because it is possible |
35 |
Individual files on a shared file system was chosen because it is possible |
| 36 |
that the data is voluminous so solutions that pass data to the Slurm control |
36 |
that the data is voluminous so solutions that pass data to the Slurm control |
| 37 |
daemon via RPC may not scale to very large clusters or jobs with |
37 |
daemon via RPC may not scale to very large clustres or jobs with |
| 38 |
many allocated nodes.</p> |
38 |
many allocated nodes.</p> |
| 39 |
|
39 |
|
| 40 |
<p>A separate <a href="acct_gather_profile_plugins.html"> |
40 |
<p>A separate <a href="acct_gather_profile_plugins.html"> |
|
Lines 74-80
configured in the
|
| 74 |
This enables the HDF5 plugin: |
74 |
This enables the HDF5 plugin: |
| 75 |
<p><b>AcctGatherProfileType</b> = acct_gather_profile/hdf5</p> |
75 |
<p><b>AcctGatherProfileType</b> = acct_gather_profile/hdf5</p> |
| 76 |
This sets the sampling frequency for data types: |
76 |
This sets the sampling frequency for data types: |
| 77 |
<p><b>JobAcctGatherFrequency</b> = <seconds></p> |
77 |
<p><b>AcctGatherNodeFreq</b> = <seconds></p> |
| 78 |
</div> |
78 |
</div> |
| 79 |
</div> |
79 |
</div> |
| 80 |
<div style="margin-left: 20px;"> |
80 |
<div style="margin-left: 20px;"> |
|
Lines 114-120
have a default polling frequency specified in slurm.conf in the
|
| 114 |
JobAcctGatherFrequency parameter. The polling frequency can be overridden |
114 |
JobAcctGatherFrequency parameter. The polling frequency can be overridden |
| 115 |
using the --acctg-freq |
115 |
using the --acctg-freq |
| 116 |
<a href="srun.html">srun</a> parameter. |
116 |
<a href="srun.html">srun</a> parameter. |
| 117 |
They are both of the form task=sec,energy=sec,luster=sec,network=sec.<p> |
117 |
They are both of the form task=sec,energy=sec,lustre=sec,network=sec.<p> |
| 118 |
|
118 |
|
| 119 |
<p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set |
119 |
<p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set |
| 120 |
in the acct_gather.conf file. This sets the rate at which the plugin samples |
120 |
in the acct_gather.conf file. This sets the rate at which the plugin samples |
|
Lines 262-271
correlate activity with other sources such as logs.</DD></DT>
|
| 262 |
<DD>CPU Frequency at time of sample in kilohertz.</DD></DT> |
262 |
<DD>CPU Frequency at time of sample in kilohertz.</DD></DT> |
| 263 |
</DL> |
263 |
</DL> |
| 264 |
|
264 |
|
| 265 |
<h3>Luster Data</h3> |
265 |
<h3>Lustre Data</h3> |
| 266 |
<p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre<p> |
266 |
<p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre<p> |
| 267 |
is required in slurm.conf to collect task data. |
267 |
is required in slurm.conf to collect task data. |
| 268 |
Appropriately set luster=freq in either JobAcctGatherFrequency in slurm.conf |
268 |
Appropriately set lustre=freq in either JobAcctGatherFrequency in slurm.conf |
| 269 |
or in --acctg-freq on the command line.</p> |
269 |
or in --acctg-freq on the command line.</p> |
| 270 |
|
270 |
|
| 271 |
<p>Each data sample in the Lustre Time Series contains the following data items. |
271 |
<p>Each data sample in the Lustre Time Series contains the following data items. |