AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

This repository is the official code for AdaHOP (Adaptive Hadamard Transform with Outlier-Pattern-Aware strategy). AdaHOP achieves BF16 training quality at MXFP4 precision while delivering up to 3.6× memory compression and 1.46× kernel acceleration over BF16 training.

This is not an officially supported AMD product.

Overview

Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns (Row-wise, Column-wise, and None) and show that each outlier pattern pair in a matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP, Adaptive Hadamard transform with Outlier-Pattern-aware strategy, which applies the Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers, and selectively applies Outlier Extraction (OE), which moves dominant outlier rows or columns into a high-precision path, when it does not. With fused, hardware-aware Triton kernels, AdaHOP enables training from scratch at MXFP4 precision with BF16-level quality, while achieving up to 3.6× memory compression and 1.46× end-to-end training speedup over BF16.
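
The alignment argument can be illustrated with a small NumPy sketch. This is not the repository's Triton kernel; the helper names (hadamard, inner_hadamard), the matrix sizes, and the injected outlier are assumptions made for illustration. It rotates both operands along the shared inner dimension, checks that the product is unchanged, and shows that an outlier confined to one inner-dimension channel is spread out while per-row energy stays where it was.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def inner_hadamard(A: np.ndarray, B: np.ndarray):
    """Rotate A (m x k) and B (k x n) along the inner dimension k."""
    H = hadamard(A.shape[1])
    return A @ H, H.T @ B  # (A H)(H^T B) = A B because H H^T = I

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 32))
B = rng.standard_normal((32, 8))
A[:, 3] += 100.0  # illustrative outlier: one inner-dimension channel of A is large

A_rot, B_rot = inner_hadamard(A, B)

# The rotation cancels inside the product.
product_preserved = np.allclose(A_rot @ B_rot, A @ B)

# Mixing along k spreads the single large channel over all 32 entries of each row.
peak_before = np.abs(A).max()
peak_after = np.abs(A_rot).max()

# Each row keeps its norm, so an outlier that occupies an entire row of A
# cannot be diluted by this rotation.
row_norms_preserved = np.allclose(
    np.linalg.norm(A_rot, axis=1), np.linalg.norm(A, axis=1)
)
```

The last check is the reason a uniform transform is limited: inner-dimension mixing redistributes magnitude within a row of the left operand, never across rows, so outliers laid out along the other direction need a different treatment.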

Installation

1. Clone the repository

git clone

2. Launch Docker container

bash launch_LPT_accel_docker.sh

3. Install build dependencies

pip install meson-python pybind11 meson ninja

4. Install torchtitan locally

cd low-precision-training/
pip install --no-build-isolation --no-deps -e .

5. Download the Hugging Face tokenizer for each model

python scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.2-1B --assets tokenizer --hf_token=[YOUR_HF_TOKEN]

6. Run training

HIP_VISIBLE_DEVICES=0,1,2,3 NGPU=4 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_3b_mxfp4_adahop_lv2.toml" ./run_train_accel.sh

7. Run Eval (LM-eval)

HIP_VISIBLE_DEVICES=0 python -m torchtitan.eval --config torchtitan/models/llama3/eval_configs/llama3_3b_eval.toml --checkpoint_path [CHECKPOINT_PATH]

Configurations

See torchtitan/models/llama3/train_configs for examples.

Model Converters

Use the quantize.linear.mx converter for MXFP4 training.

[model]
converters = ["quantize.linear.mx"]

MXFP4 Options

filter_fqns: Skip quantization for layers whose name matches an entry in filter_fqns.
recipe_name:
- mxfp4_1d1d: Use 1x32 blocks for both activations and weights (only 1d1d is supported).
enable_mxfp4_fa: Whether to use MXFP4 attention (work in progress).
enable_mxfp4_gmm: Whether to use MXFP4 GroupedMM.
enable_mxfp4_linear: Whether to use MXFP4 Linear.
use_sr_grad: Whether to use stochastic rounding for gradients.
use_hadamard: Whether to apply a Hadamard transform to the inputs of the dW kernels.

[quantize.linear.mx]
filter_fqns = ["output"]
recipe_name = "mxfp4_1d1d"
enable_mxfp4_fa = false
enable_mxfp4_gmm = true
enable_mxfp4_linear = true
use_sr_grad = false
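
For readers unfamiliar with the 1x32 block layout named by mxfp4_1d1d, the following is a rough NumPy simulation of MXFP4 quantize-dequantize, assuming the usual MX convention of FP4 (E2M1) elements sharing one power-of-two scale per 32-value block. The function name fake_quant_mxfp4 is hypothetical, ties are broken toward the smaller magnitude for simplicity, and the limited range of the shared scale is ignored; the repository's kernels may differ in all of these details.

```python
import numpy as np

# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # 1x32 blocks along the last axis

def fake_quant_mxfp4(x: np.ndarray) -> np.ndarray:
    """Quantize to simulated MXFP4 and dequantize back to float64."""
    x = np.asarray(x, dtype=np.float64)
    if x.shape[-1] % BLOCK:
        raise ValueError("last dimension must be a multiple of 32")
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(amax > 0, amax, 1.0)  # all-zero blocks: any scale works
    # frexp returns safe = m * 2**e with m in [0.5, 1), so floor(log2(safe)) = e - 1.
    _, e = np.frexp(safe)
    # Shared scale 2**(floor(log2(amax)) - 2); 2 is the largest E2M1 exponent.
    scale = np.ldexp(1.0, e - 3)
    scaled = np.clip(np.abs(blocks) / scale, 0.0, 6.0)
    # Round each magnitude to the nearest grid point.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(blocks) * FP4_GRID[idx]
    return (q * scale).reshape(x.shape)

# One outlier sets the scale for its whole block.
x = np.full(32, 0.1)
x[0] = 100.0
y = fake_quant_mxfp4(x)  # scale is 16, so the 0.1 entries round to zero
```

In this example the single large entry pushes the block scale to 16, which leaves no resolution for the remaining 31 values. That is the failure mode the Hadamard and outlier-extraction options are meant to address.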

Hadamard Transform Options

Enable Hadamard transform for the MXFP4 Linear layer:

[quantize.linear.mx]
use_hadamard = true
use_randomized_hadamard = false  # Set to true for Randomized Hadamard Transform (RHT)
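
As a rough illustration of what use_randomized_hadamard toggles, a common formulation of the Randomized Hadamard Transform multiplies by a random diagonal sign matrix before the Hadamard matrix. The sketch below assumes that formulation; the names fwht and randomized_hadamard_pair are hypothetical, and whether the repository's kernels sample or share signs this way is not stated in this README.

```python
import numpy as np

def fwht(x) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis."""
    y = np.array(x, dtype=np.float64)
    lead, n = y.shape[:-1], y.shape[-1]
    if n < 1 or n & (n - 1):
        raise ValueError("last dimension must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine halves of each length-2h segment.
        y = y.reshape(*lead, n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = np.stack((a + b, a - b), axis=-2).reshape(*lead, n)
        h *= 2
    return y / np.sqrt(n)

def randomized_hadamard_pair(A, B, rng):
    """Apply the same random signs D and Hadamard H to both operands."""
    signs = rng.choice([-1.0, 1.0], size=A.shape[1])
    A_rot = fwht(A * signs)                 # A D H
    B_rot = fwht((B * signs[:, None]).T).T  # H D B
    return A_rot, B_rot

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 16))
B = rng.standard_normal((16, 4))
A_rot, B_rot = randomized_hadamard_pair(A, B, rng)

# D D = I and H H = I, so the product is unchanged.
product_preserved = np.allclose(A_rot @ B_rot, A @ B)
```

The random signs do not change the matmul result; they only decorrelate the rotation from any fixed structure in the data.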

AdaHOP (Adaptive Hadamard Transform based on Outlier Patterns) Options

Enable AdaHOP calibration:

[quantize.linear.mx.calibration]
use_HT_calibration = true
calibration_steps = 30
visualize_outlier_patterns = true
visualization_save_folder = "calibration_visualizations"
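
This README does not describe how calibration decides which outlier pattern a tensor has. Purely to show the kind of decision involved, here is a hypothetical heuristic that labels a 2-D tensor as "row", "col", or "none" by comparing the largest per-row and per-column mean magnitude against the median. The function name, the statistic, and the ratio threshold are all assumptions and should not be read as the method used by AdaHOP.

```python
import numpy as np

def classify_outlier_pattern(t, ratio: float = 8.0) -> str:
    """Hypothetical heuristic; returns "row", "col", or "none"."""
    mag = np.abs(np.asarray(t, dtype=np.float64))
    row_mean = mag.mean(axis=1)  # one value per row
    col_mean = mag.mean(axis=0)  # one value per column
    # How much the most extreme row/column stands out from a typical one.
    row_score = row_mean.max() / (np.median(row_mean) + 1e-12)
    col_score = col_mean.max() / (np.median(col_mean) + 1e-12)
    if max(row_score, col_score) < ratio:
        return "none"
    return "row" if row_score >= col_score else "col"

flat = np.ones((64, 64))
row_outlier = flat.copy()
row_outlier[5, :] = 100.0   # one dominant row
col_outlier = flat.copy()
col_outlier[:, 7] = 100.0   # one dominant column

labels = [classify_outlier_pattern(m) for m in (flat, row_outlier, col_outlier)]
```

In a real calibration run such a statistic would presumably be accumulated over the configured calibration_steps rather than computed on a single tensor.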

AdaHOP layer-specific transform configuration based on detected outlier patterns:

[quantize.linear.mx.calibration.layer_transform_config]
"row-row" = "hadamard"
"row-none" = "inner_outlier_extract_left"
"row-col" = "inner_outlier_extract_right"
"col-row" = "hadamard"
"col-none" = "hadamard"
"col-col" = "full_precision"
"none-row" = "hadamard"
"none-none" = "hadamard"
"none-col" = "inner_outlier_extract_right"

Pattern pairs use the format "pattern1-pattern2", where each pattern is "row", "col", or "none". The transform mode ("hadamard", "inner_outlier_extract_left", "inner_outlier_extract_right", "full_precision", or "none") applies to all transforms (forward_y, backward_gw, backward_gx).
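
The table above can be mirrored as a plain Python dictionary to sanity-check a custom configuration before training. This is a sketch rather than the repository's loader: the helper names missing_pairs and select_transform are hypothetical, and calling the two patterns "first" and "second" avoids assuming which matmul operand each one refers to.

```python
from itertools import product

PATTERNS = ("row", "col", "none")
MODES = {
    "hadamard",
    "inner_outlier_extract_left",
    "inner_outlier_extract_right",
    "full_precision",
    "none",
}

# Same entries as the layer_transform_config example.
LAYER_TRANSFORM_CONFIG = {
    "row-row": "hadamard",
    "row-none": "inner_outlier_extract_left",
    "row-col": "inner_outlier_extract_right",
    "col-row": "hadamard",
    "col-none": "hadamard",
    "col-col": "full_precision",
    "none-row": "hadamard",
    "none-none": "hadamard",
    "none-col": "inner_outlier_extract_right",
}

def missing_pairs(table: dict) -> list:
    """Return pattern pairs absent from table; raise on unknown keys or modes."""
    expected = {f"{a}-{b}" for a, b in product(PATTERNS, repeat=2)}
    unknown = set(table) - expected
    if unknown:
        raise ValueError(f"unknown pattern pairs: {sorted(unknown)}")
    bad = {mode for mode in table.values() if mode not in MODES}
    if bad:
        raise ValueError(f"unknown transform modes: {sorted(bad)}")
    return sorted(expected - set(table))

def select_transform(table: dict, first: str, second: str) -> str:
    """Look up the transform mode for a detected pattern pair."""
    return table[f"{first}-{second}"]

mode = select_transform(LAYER_TRANSFORM_CONFIG, "row", "col")
```

A check like this catches a misspelled key such as "column-row" or a mode name that the converter would not recognize.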

Citation

Full details of AdaHOP are given in the paper 'AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation'. If you find our code or AdaHOP useful for your research, please consider citing:

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

@article{kim2026adahop,
  title={AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation},
  author={Kim, Seonggon and Khodamoradi, Alireza and Denolf, Kristof and Park, Eunhyeok},
  journal={arXiv preprint arXiv:2604.02525},
  year={2026}
}

License

Source code is made available under a BSD 3 license. However, you may have other legal obligations that govern your use of other content linked in this repository, such as the license or terms of service for third-party data and models.
