FTIO captures periodic I/O using frequency techniques. Many high-performance computing (HPC) applications perform their I/O in bursts that follow a periodic pattern. Predicting such patterns can greatly benefit I/O contention-avoidance strategies, such as burst buffer management. FTIO supports both offline detection and online prediction of periodic I/O phases. It uses the discrete Fourier transform (DFT), combined with outlier detection methods, to extract the dominant frequency in the signal. Additional metrics gauge the confidence in the output and indicate how far the signal is from being periodic. A complete description of the approach is provided here.
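To make the core idea concrete, here is a minimal numpy sketch of the pipeline described above: discretize the I/O phases into a bandwidth signal, take the DFT, flag z-score outliers in the power spectrum, and report the strongest one as the dominant frequency. It assumes the trace has already been reduced to hypothetical `(start, end, bandwidth)` tuples; the function names and the threshold of 3 are illustrative assumptions, not FTIO's API or its exact outlier criterion.

```python
import numpy as np


def discretize(phases, fs, t_end):
    """Sample a piecewise-constant bandwidth signal at fs Hz.

    phases: iterable of (start_s, end_s, bandwidth) tuples (hypothetical format).
    Overlapping phases are summed.
    """
    n = int(np.ceil(t_end * fs))
    t = np.arange(n) / fs
    signal = np.zeros(n)
    for start, end, bandwidth in phases:
        signal[(t >= start) & (t < end)] += bandwidth
    return signal


def dominant_frequency(signal, fs, z_threshold=3.0):
    """Return the strongest frequency whose power is a z-score outlier, else None."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / len(signal)  # power density spectrum
    # Drop the DC bin: it holds the mean bandwidth, not a period.
    ac_power, ac_freqs = power[1:], freqs[1:]
    if ac_power.sum() <= 1e-12 * power.sum() or ac_power.std() == 0:
        return None  # constant (or flat-spectrum) signal: nothing stands out
    z = (ac_power - ac_power.mean()) / ac_power.std()
    outliers = np.flatnonzero(z > z_threshold)
    if outliers.size == 0:
        return None
    return float(ac_freqs[outliers[np.argmax(ac_power[outliers])]])
```

For ten 2-second bursts repeated every 10 s over 100 s and sampled at 10 Hz, `dominant_frequency(discretize(phases, 10, 100), 10)` returns 0.1 Hz, i.e., a period of 10 s. Note that the DFT bin spacing is 1/(trace duration), so longer traces resolve periods more finely.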
This repository provides two main Python-based tools:
- ftio: uses frequency techniques and outlier detection methods to find the period of I/O phases.
- predictor: implements the online version of FTIO. It reinvokes FTIO whenever new traces are appended to the monitored file. See online prediction for more details. We recommend using TMIO to generate the file with the I/O traces.
Other tools:
- ioplot: generates interactive plots in HTML.
- ioparse: parses and merges several traces into an Extra-P-supported format, which allows one to examine the scaling behavior of the monitored metrics. Traces generated by FTIO (frequency models), TMIO (msgpack, json, and jsonl), and other tools (Darshan, Recorder, and TAU Metric Proxy) are supported.
Join the Slack channel or see the latest updates here: Latest News
FTIO is available on PyPI and can be easily installed via pip. For the most recent
stable GitHub version, FTIO can be installed either automatically
or manually. For the development version with the latest features, FTIO can be
installed in development mode.
As a prerequisite for the virtual environment, python3.11-venv is needed, which can be installed on Ubuntu, for
example, with:
```bash
apt install python3.11-venv
```
If you want to contribute to the code, we advise that you install FTIO as described under contributing.
FTIO is installed by default in a virtual environment. For the automated installation, simply execute the following commands:
```bash
# clone FTIO
git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO
# uses python3 by default
make install
# or use a specific Python version,
# which is often needed on a cluster
make install PYTHON=python3.12
# or additionally install all optional packages
make full PYTHON=python3.12
```
This generates a virtual environment in the current directory, sources .venv/bin/activate, and installs FTIO as a
module.
If you are working on an HPC cluster, you first need to load the Python module (e.g., module load python/3.12) and
possibly add ~/.local/bin to your PATH (e.g., export PATH=$PATH:~/.local/bin) if it is not there yet.
If you don't need a dedicated environment, just call:
```bash
make ftio PYTHON=python3
```
FTIO is available on PyPI and can be easily installed via pip:
```bash
pip install ftio-hpc
```
This installs the most recent stable version of FTIO (main branch).
Note
There are currently issues with pyDarshan on macOS and Windows, which can be solved as described here.
Create a virtual environment if needed and activate it:
```bash
git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO
python3 -m venv .venv
source .venv/bin/activate
```
Install all tools provided in this repository simply by using pip:
```bash
pip install .
# or with external dependencies for improved performance
pip install '.[external-libs]'
# or with external dependencies and style tools
pip install '.[external-libs,development-libs]'
# or with external dependencies, style tools, and plot libs (to call `ioplot` with dash support)
pip install '.[external-libs,development-libs,plot-libs]'
```
Note
To use ftio and the other tools, you need to activate the environment:
```bash
source path/to/venv/bin/activate
```
Note
There are currently issues with pyDarshan on macOS and Windows, which can be solved as described here.
By default, FTIO installs into an isolated virtual environment. The following steps guide you through retrieving and
configuring the latest development version with debug symbols and an editable install, using the make debug target:
```bash
# 1. Clone the FTIO repository
git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO
# 2. Switch to the development branch
git checkout development
# 3. Install in editable/debug mode (defaults to current python)
make debug
# To specify a different Python interpreter (e.g., on an HPC cluster):
make debug PYTHON=python3.12
```
This process establishes a development environment that:
- Instantiates a virtual environment (`.venv/`) in the project directory.
- Activates the environment by sourcing the `.venv/bin/activate` script (i.e., `source .venv/bin/activate`).
- Installs FTIO in “editable” mode, ensuring that any modifications to the source code are immediately reflected upon import.
For installation instructions, see installation.
To run ftio on a file, execute:
```bash
ftio filename.extension
```
There are three options to use ftio and predictor:
- Provide a supported file format to the tool. Supported extensions are `json`, `jsonLines`, `msgpack`, and `darshan`. For Recorder, provide the path to the folder instead of `filename.extension`. For more on the input format, see supported file formats. There is also an option to provide a custom format.
- Use the API. This is particularly suitable if you just want to experiment with the tool or start using it with as little effort as possible.
- Send TCP messages over ZeroMQ (ZMQ) to the tools as described here. There is also an API example with ZMQ and GekkoFS here. Usually, `predictor` is used with ZMQ, as it makes little sense to use `ftio` with this option.
In all cases, various options can be provided to ftio and predictor. To see all available command line arguments,
call:
```
ftio -h
usage: ftio [-h] [-m MODE] [-s SOURCE] [-r RENDER] [-f FREQ]
            [--memory_limit MEMORY_LIMIT] [-ts TS] [-te TE]
            [-tr TRANSFORMATION] [-e ENGINE] [-rp]
            [-o {z-score,dbscan,forest,lof,peak}] [-p {rpde,sf,corr,ind}]
            [-ce] [-le LEVEL] [--wavelet WAVELET] [-t TOL] [-d]
            [-re [RECONSTRUCTION ...]] [-np] [-n N_FREQ] [--fourier_fit] [-au]
            [-ml] [-w {frequency_hits,data,adwin,cusum,ph}] [-hi HITS] [-v]
            [--zmq] [--gui] [--zmq_source ZMQ_SOURCE]
            [--zmq_address ZMQ_ADDRESS] [--zmq_port ZMQ_PORT]
            [--zmq_port_reply ZMQ_PORT_REPLY]
            [--filter_type {lowpass,highpass,bandpass}]
            [--filter_cutoff FILTER_CUTOFF [FILTER_CUTOFF ...]]
            [--filter_order FILTER_ORDER] [--tfpf TFPF]
            [--stft_window STFT_WINDOW] [--debounce] [-bw]
            [--burst_energy_fraction BURST_ENERGY_FRACTION]
            [--phase-automaton] [--pa-method {cusum,ph,adwin,ksigma,none}]
            [--pa-period-ratio RATIO] [--pa-no-rank-trigger]
            [--pa-export PATH] [--sum] [--no_sum] [--avr] [--no_avr] [--ind]
            [--no_ind] [-cf CUSTOM_FILE] [-x DXT_MODE] [-l LIMIT]
            files [files ...]
```
There are several options available to enhance the frequency predictions from ftio. In the standard mode, the DFT is
used in combination with an outlier detection method. Additionally, autocorrelation can be used to further increase the
confidence in the results:
1. DFT + outlier detection (Z-score, DBSCAN, Isolation Forest, peak detection, or LOF)
2. Optionally: autocorrelation + peak detection (`-au` flag)
3. If step 2 is performed, the results from both predictions are merged automatically
See offline detection for more details.
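The optional autocorrelation step (`-au`) can be illustrated in a similar spirit. The sketch below, assuming a signal already sampled at `fs` Hz, normalizes the autocorrelation, keeps local maxima above a hypothetical height of 0.1, and converts the median peak spacing into a period in seconds. The function name, threshold, and median rule are assumptions; FTIO's actual peak detection and the way it merges this result with the DFT prediction are not reproduced here.

```python
import numpy as np


def autocorrelation_period(signal, fs, min_height=0.1):
    """Estimate the period (s) from peaks of the normalized autocorrelation, else None."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Linear autocorrelation for non-negative lags (O(N^2), fine for a sketch).
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] == 0:
        return None  # constant signal: no structure to correlate
    acf = acf / acf[0]
    # Local maxima above min_height, excluding lag 0.
    peaks = [
        k for k in range(1, len(acf) - 1)
        if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1] and acf[k] > min_height
    ]
    if not peaks:
        return None
    # Median spacing between successive peaks (starting at lag 0) is less
    # sensitive to an occasional spurious or missed peak than the mean.
    spacing = np.diff([0] + peaks)
    return float(np.median(spacing)) / fs
```

In this sketch, agreement between the autocorrelation period and 1/f from the DFT can be read as extra confidence that the signal is periodic.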
Several flags can be specified. The most relevant settings are:
| Flag | Description |
|---|---|
| files | file, file list (file 0 ... file n), folder, or folder list (folder 0 ... folder n) containing traces (positional argument) |
| -h, --help | show this help message and exit |
| -m MODE, --mode MODE | if the trace file contains several I/O modes, a specific mode can be selected. Supported modes are: write_async, read_async, write_sync, read_sync |
| -s SOURCE, --source SOURCE | the source of the files: tmio, or custom. See supported file formats |
| -r RENDER, --render RENDER | specifies how the plots are rendered. Either dynamic (default) or static |
| -f FREQ, --freq FREQ | specifies the sampling rate with which the continuous signal is discretized (default=10 Hz). This directly affects the highest captured frequency (the Nyquist frequency, i.e., half the sampling rate). The value is specified in Hz. If this value is set to -1, the auto mode is launched, which sets the sampling frequency automatically based on the smallest change in the bandwidth detected. Note that the lowest allowed frequency in the auto mode is determined by memory_limit |
| --memory_limit MEMORY_LIMIT | Memory limit in GB during discretization in case freq is passed with -1. Default is 0.5 GB. |
| -ts TS, --ts TS | modifies the start time of the examined time window |
| -te TE, --te TE | modifies the end time of the examined time window |
| -tr TRANSFORMATION, --transformation TRANSFORMATION | specifies the frequency technique to use. Supported modes are: dft (default), stft, astft, wave_disc, wave_cont. Experimental (require pip install "ftio[amd-libs]"): efd, vmd. See Frequency Methods. |
| -e ENGINE, --engine ENGINE | specifies the engine used to display the figures. Either plotly (default) or matplotlib can be used. Plotly is used to generate interactive plots as HTML files. Set this value to no if you do not want to generate plots |
| -rp, --runtime_plots | if set, shows the plot at runtime |
| -o OUTLIER, --outlier OUTLIER | outlier detection method: z-score (default), dbscan, forest, lof, or peak (find_peaks) |
| -p DETECTION, --periodicity_detection DETECTION | periodicity detection method after outlier detection: rpde (RPDE), sf (Spectral flatness), corr (Correlation), or ind (Correlation for individual periods). Default: none |
| -ce, --cepstrum | enable Cepstrum plotting for the DFT |
| -le LEVEL, --level LEVEL | specifies the decomposition level for the discrete wavelet transformation (default=3). If specified as auto, the maximum decomposition level is automatically calculated |
| --wavelet WAVELET | Wavelet to use. See pywt documentation for wavelet families: pywt.wavelist(kind="continuous") or pywt.wavelist(kind="discrete") (default "morl" for continuous and "db1" for discrete) |
| -t TOL, --tol TOL | tolerance value |
| -d, --dtw | performs dynamic time warping on the top 3 frequencies (highest contribution) calculated using the DFT if set (default=False) |
| -re, --reconstruction | plots the reconstruction of the top 10 signals on the figure |
| -np, --no-psd | if set, replace the power density spectrum (a*a/N) with the amplitude spectrum (a) |
| -n N_FREQ, --n_freq N_FREQ | number of frequencies to extract. By default FTIO finds the dominant frequency. With this flag, up to "n_freq" can be extracted |
| --fourier_fit | perform Fourier basis fitting on the signal using frequencies set via --n_freq |
| -au, --autocorrelation | if set, autocorrelation is calculated in addition to DFT. The results are merged to a single prediction at the end |
| -ml, --machine_learning | if set, machine learning is enabled (api call only) |
| -w STRATEGY, --window_adaptation STRATEGY | online window adaptation strategy: frequency_hits, data, adwin, cusum, ph. For 'adwin', 'cusum', and 'ph', '--gui' is supported |
| -hi HITS, --hits HITS | specifies the number of hits needed to adapt the time window. A hit occurs once a dominant frequency is found |
| -v, --verbose | sets verbose on or off (default=False) |
| --zmq | avoids opening the generated HTML file since zmq is used |
| --gui | enables forwarding prediction data to the FTIO GUI dashboard. Start the GUI first with 'ftio-gui' |
| --zmq_source SOURCE | the source of zmq: TMIO, direct, etc. |
| --zmq_address ADDRESS | zmq address for communication |
| --zmq_port PORT | zmq port for communication |
| --filter_type TYPE | Type of filter to apply: lowpass, highpass, bandpass |
| --filter_cutoff CUTOFF | Cutoff frequency for low/high-pass filters or low and high cutoff for bandpass |
| --filter_order ORDER | Order of Butterworth filter |
| --tfpf TFPF | Number of time-frequency peak filtering iterations |
| --stft_window WINDOW | Window length for STFT analysis in samples or time (e.g., '20s'). If 0, it is automatically calculated based on the dominant frequency |
| -bw, --burst_width | Estimate per-period burst width and duty cycle using the shortest contiguous time window containing --burst_energy_fraction of each period's total energy (O(N), negligible cost). Results are printed alongside confidence and stored in prediction.burst_widths, burst_width_median, burst_width_min, burst_width_max, and duty_cycle. Supported by all frequency methods (DFT, STFT, ASTFT, CWT, DWT). |
| --burst_energy_fraction FRACTION | Energy fraction for burst width estimation (default: 0.95). The burst window is the shortest contiguous interval containing this fraction of the period's total energy. Only used with -bw / --burst_width. |
| --sum / --no_sum | sum plot: True (default) or False |
| --avr / --no_avr | avr plot: True (default) or False |
| --ind / --no_ind | ind plot: True or False (default) |
| -cf FILE, --custom_file FILE | passes a [path/filename.py] file containing the translation and pattern for a custom file format |
| -x DXT_MODE, --dxt_mode DXT_MODE | select data to extract from Darshan traces (DXT_POSIX or DXT_MPIIO (default)) |
| -l LIMIT, --limit LIMIT | max ranks to consider when reading a folder |
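Two of the spectral options in the table above, `-np` (amplitude instead of power density spectrum) and `-p sf` (spectral flatness), can be sketched with numpy. The sketch uses the textbook definition of spectral flatness (geometric mean over arithmetic mean of the power spectrum); the function names and the `eps` guard are assumptions, and how FTIO thresholds or weights this value is not shown here.

```python
import numpy as np


def spectrum(signal, psd=True):
    """One-sided spectrum without the DC bin.

    psd=True returns the power density spectrum |a|^2 / N; psd=False returns the
    amplitude spectrum |a|, mirroring what the -np/--no-psd flag describes.
    """
    a = np.abs(np.fft.rfft(signal))[1:]
    return a * a / len(signal) if psd else a


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Close to 1 for noise-like (aperiodic) signals, close to 0 when a few
    frequencies dominate. eps keeps log() finite for empty bins.
    """
    p = np.asarray(power, dtype=float) + eps
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))
```

A periodic burst pattern concentrates its power in a few harmonics and yields a flatness near 0, whereas white noise yields a value around 0.56 on average, which is why flatness can serve as a gauge of how far a signal is from being periodic.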
predictor has the same syntax as ftio. All arguments that are available for ftio are also available for
predictor.
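The `-bw`/`--burst_width` option in the table above describes finding, per period, the shortest contiguous window that holds `--burst_energy_fraction` of the period's energy in O(N). A minimal pure-Python sketch of that idea is shown below, assuming the dominant period is already known, periods are aligned to the start of the signal, and energy means the sum of squared samples. The names `burst_stats` and `shortest_energy_window` are hypothetical; this is not FTIO's implementation and does not model the `prediction.*` fields.

```python
from statistics import median


def shortest_energy_window(segment, fraction=0.95):
    """Length (samples) of the shortest contiguous window holding `fraction` of the segment's energy.

    Energy is assumed to be the sum of squared samples. Because energies are
    non-negative, a two-pointer sweep finds the minimum window in O(N).
    """
    energy = [v * v for v in segment]
    total = sum(energy)
    if total == 0:
        return 0
    target = fraction * total * (1 - 1e-12)  # tiny slack for float rounding
    best, left, window = len(energy), 0, 0.0
    for right, e in enumerate(energy):
        window += e
        # Shrink from the left as long as the window still meets the target.
        while left <= right and window >= target:
            best = min(best, right - left + 1)
            window -= energy[left]
            left += 1
    return best


def burst_stats(signal, fs, period_s, fraction=0.95):
    """Per-period burst widths (s) and the median duty cycle for a known period."""
    n = int(round(period_s * fs))
    widths = [
        shortest_energy_window(signal[start:start + n], fraction) / fs
        for start in range(0, len(signal) - n + 1, n)
    ]
    duty_cycle = median(widths) / period_s if widths else None
    return widths, duty_cycle
```

For a 2-second burst every 10 s sampled at 10 Hz, the default fraction of 0.95 gives a width of 1.9 s per period (duty cycle 0.19), while a fraction of 1.0 gives the full 2 s (duty cycle 0.2).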
An 8.jsonl file is provided for testing
under examples.
On your system, navigate to the
folder examples/tmio/JSONL and call:
```bash
ftio 8.jsonl
```
Several examples are provided under examples. See also the examples provided here for the different file formats.
Alternatively, the artifact folder contains several instructions and example traces from the FTIO paper that can simply be downloaded as described here.
As ftio supports Darshan traces, you can also download traces
from https://hpcioanalysis.zdv.uni-mainz.de/ and execute FTIO on them as
described here.
For an online example with predictor, you can follow the instructions here
for HACC-IO.
Kindly see the instructions provided under docs/contributing.md.
Note
If you are a student from TU Darmstadt, kindly see these instructions.
We sincerely thank our contributors for their valuable contributions!
- Ahmad Tarraf: ahmad.tarraf@tu-darmstadt.de
Distributed under the BSD 3-Clause License. See LICENCE for more information.
FTIO is developed and maintained by the Parallel Programming group at TU Darmstadt. The main author of this project is Ahmad Tarraf.
For a full list of contributors, please see CONTRIBUTORS.md.
This work is a result of cooperation between the Technical University of Darmstadt and INRIA in the scope of the EuroHPC ADMIRE project.
```bibtex
@inproceedings{AT24_ftio,
  author={Tarraf, Ahmad and Bandet, Alexis and Boito, Francieli and Pallez, Guillaume and Wolf, Felix},
  booktitle={2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  title={Capturing periodic {I/O} using frequency techniques},
  month=may,
  year={2024},
  pages={465-478},
  publisher = {IEEE},
  doi={10.1109/IPDPS57955.2024.00048}
}
```
- A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, "Capturing Periodic I/O Using Frequency Techniques," in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, May 2024, pp. 465–478.
- A. Tarraf and F. Wolf, "Improving I/O Phase Predictions in Ftio Using Hybrid Wavelet-Fourier Analysis," Frontiers in High Performance Computing (Volume 3 - 2025), Feb. 2026 (https://doi.org/10.3389/fhpcp.2025.1638924).
- A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, "FTIO: Detecting I/O periodicity using frequency techniques," arXiv preprint arXiv:2306.08601 (2023).