View | Details | Raw Unified | Return to ticket 1062 | Differences between
and this patch

(-)a/doc/html/hdf5_profile_user_guide.shtml (-6 / +6 lines)
Lines 16-22
16
<a id="Overview"></a>
16
<a id="Overview"></a>
17
<h2>Overview</h2>
17
<h2>Overview</h2>
18
<p>The AcctGatherProfileType/hdf5 plugin allows Slurm to coordinate collecting
18
<p>The AcctGatherProfileType/hdf5 plugin allows Slurm to coordinate collecting
19
data on jobs it runs on a cluster that is more detailed than is practical to
19
data on jobs it runs on a cluster that is more detailed than is practical to
20
include in its database. The data comes from periodically sampling various
20
include in its database. The data comes from periodically sampling various
21
performance data either collected by Slurm, the operating system, or
21
performance data either collected by Slurm, the operating system, or
22
component software. The plugin will record the data from each source
22
component software. The plugin will record the data from each source
Lines 34-40 Data from other sources may be added in the future.</p>
34
each allocated node of a job and then merged into a HDF5 file.
34
each allocated node of a job and then merged into a HDF5 file.
35
Individual files on a shared file system was chosen because it is possible
35
Individual files on a shared file system was chosen because it is possible
36
that the data is voluminous so solutions that pass data to the Slurm control
36
that the data is voluminous so solutions that pass data to the Slurm control
37
daemon via RPC may not scale to very large clusters or jobs with
37
daemon via RPC may not scale to very large clusters or jobs with
38
many allocated nodes.</p>
38
many allocated nodes.</p>
39
39
40
<p>A separate <a href="acct_gather_profile_plugins.html">
40
<p>A separate <a href="acct_gather_profile_plugins.html">
Lines 74-80 configured in the
74
This enables the HDF5 plugin:
74
This enables the HDF5 plugin:
75
<p><b>AcctGatherProfileType</b> = acct_gather_profile/hdf5</p>
75
<p><b>AcctGatherProfileType</b> = acct_gather_profile/hdf5</p>
76
This sets the sampling frequency for data types:
76
This sets the sampling frequency for data types:
77
<p><b>JobAcctGatherFrequency</b> = &lt;seconds&gt;</p>
77
<p><b>AcctGatherNodeFreq</b> = &lt;seconds&gt;</p>
78
</div>
78
</div>
79
</div>
79
</div>
80
<div style="margin-left: 20px;">
80
<div style="margin-left: 20px;">
Lines 114-120 have a default polling frequency specified in slurm.conf in the
114
JobAcctGatherFrequency parameter. The polling frequency can be overridden
114
JobAcctGatherFrequency parameter. The polling frequency can be overridden
115
using the --acctg-freq
115
using the --acctg-freq
116
<a href="srun.html">srun</a> parameter.
116
<a href="srun.html">srun</a> parameter.
117
They are both of the form task=sec,energy=sec,luster=sec,network=sec.<p>
117
They are both of the form task=sec,energy=sec,lustre=sec,network=sec.<p>
118
118
119
<p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set
119
<p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set
120
in the acct_gather.conf file. This sets the rate at which the plugin samples
120
in the acct_gather.conf file. This sets the rate at which the plugin samples
Lines 262-271 correlate activity with other sources such as logs.</DD></DT>
262
<DD>CPU Frequency at time of sample in kilohertz.</DD></DT>
262
<DD>CPU Frequency at time of sample in kilohertz.</DD></DT>
263
</DL>
263
</DL>
264
264
265
<h3>Luster Data</h3>
265
<h3>Lustre Data</h3>
266
<p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre<p>
266
<p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre<p>
267
is required in slurm.conf to collect task data.
267
is required in slurm.conf to collect task data.
268
Appropriately set luster=freq in either JobAcctGatherFrequency in slurm.conf
268
Appropriately set lustre=freq in either JobAcctGatherFrequency in slurm.conf
269
or in --acctg-freq on the command line.</p>
269
or in --acctg-freq on the command line.</p>
270
270
271
<p>Each data sample in the Lustre Time Series contains the following data items.
271
<p>Each data sample in the Lustre Time Series contains the following data items.
(-)a/doc/man/man5/slurm.conf.5 (-1 / +1 lines)
Lines 214-220 counters of Infiniband devices through the OFED library.
214
\fBAcctGatherFilesystemType\fR
214
\fBAcctGatherFilesystemType\fR
215
Identifies the plugin to be used for filesystem traffic accounting.
215
Identifies the plugin to be used for filesystem traffic accounting.
216
The plugin is activated only when profiling on hdf5 files is activated and
216
The plugin is activated only when profiling on hdf5 files is activated and
217
the user asks for filesystem data collection for jobs through \-\-profile=Network
217
the user asks for filesystem data collection for jobs through \-\-profile=Lustre
218
(or =All). The collection of filesystem traffic data takes place on node level,
218
(or =All). The collection of filesystem traffic data takes place on node level,
219
hence only in case of exclusive job allocation the collected values will
219
hence only in case of exclusive job allocation the collected values will
220
reflect the jobs real traffic. All filesystem traffic data are logged on hdf5 files
220
reflect the jobs real traffic. All filesystem traffic data are logged on hdf5 files
