AMDResearch/AdaHOP

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

This repository contains the official code for AdaHOP (Adaptive Hadamard transform with Outlier-Pattern-aware strategy). AdaHOP achieves BF16 training quality at MXFP4 precision while delivering up to 3.6× memory compression and 1.46× kernel acceleration over BF16 training.

This is not an officially supported AMD product.

Figure: AdaHOP pipeline.

Overview

Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns (Row-wise, Column-wise, and None) and show that each outlier pattern pair in a matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP (Adaptive Hadamard transform with Outlier-Pattern-aware strategy), which applies the Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers, and selectively applies Outlier Extraction (OE), which moves dominant outlier rows or columns into a high-precision path, when it does not. With fused, hardware-aware Triton kernels, AdaHOP enables training from scratch at MXFP4 precision with BF16-level quality, while achieving up to 3.6× memory compression and a 1.46× end-to-end training speedup over BF16.
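
The two building blocks named above can be illustrated with a small pure-Python sketch. It is not the repository's Triton implementation: the helper names (`hadamard`, `inner_hadamard`, `split_inner_outliers`) and the toy matrices are assumptions made for illustration. It only shows that rotating both operands along the shared inner dimension leaves the product unchanged, and that a matrix product can be split exactly into an "outlier" part and a "rest" part along that same dimension.

```python
import math

def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1.0]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in r] for r in H]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inner_hadamard(A, B):
    """Rotate A (M x K) and B (K x N) along the shared inner dimension K."""
    H = hadamard(len(B))  # symmetric and orthogonal, so H @ H == I
    return matmul(A, H), matmul(H, B)

def split_inner_outliers(A, B, k=1):
    """Hypothetical helper: move the k inner-dimension slices with the
    largest column norm in A into a separate (high-precision) path."""
    K = len(B)
    norms = [sum(row[j] ** 2 for row in A) for j in range(K)]
    picked = sorted(range(K), key=lambda j: norms[j], reverse=True)[:k]
    rest = [j for j in range(K) if j not in picked]
    take = lambda idx: ([[row[j] for j in idx] for row in A], [B[j] for j in idx])
    return take(picked), take(rest)

# Toy operands: A has one dominant value along the inner dimension.
A = [[0.1, 0.2, 8.0, -0.1], [-0.2, 0.1, 0.3, 0.2]]
B = [[1.0, 0.5], [-0.5, 1.0], [0.25, -0.25], [2.0, 1.0]]
reference = matmul(A, B)

# Inner Hadamard Transform: same product, smaller peak magnitude in A.
A_rot, B_rot = inner_hadamard(A, B)
rotated = matmul(A_rot, B_rot)

# Outlier extraction: the two partial products sum to the original product.
(A_out, B_out), (A_rest, B_rest) = split_inner_outliers(A, B, k=1)
recombined = add(matmul(A_out, B_out), matmul(A_rest, B_rest))
```

In this toy case the rotation spreads the single large value over the whole row, so the peak magnitude drops. Whether that helps for a given operand pair depends on the outlier pattern, which is why the method chooses between the two paths.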

Installation

1. Clone the repository

git clone 

2. Launch Docker container

bash launch_LPT_accel_docker.sh

3. Install build dependencies

pip install meson-python pybind11 meson ninja

4. Install torchtitan locally

cd low-precision-training/
pip install --no-build-isolation --no-deps -e .

5. Download the Hugging Face tokenizer for each model

python scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.2-1B --assets tokenizer --hf_token=[YOUR_HF_TOKEN]

6. Run training

HIP_VISIBLE_DEVICES=0,1,2,3 NGPU=4 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_3b_mxfp4_adahop_lv2.toml" ./run_train_accel.sh

7. Run Eval (LM-eval)

HIP_VISIBLE_DEVICES=0 python -m torchtitan.eval --config torchtitan/models/llama3/eval_configs/llama3_3b_eval.toml --checkpoint_path [CHECKPOINT_PATH]

Configurations

See torchtitan/models/llama3/train_configs for examples.

Model Converters

Use the quantize.linear.mx converter for MXFP4 training.

[model]
converters = ["quantize.linear.mx"]

MXFP4 Options

  • filter_fqns: Skip quantization for layers whose names match filter_fqns.
  • recipe_name:
    • mxfp4_1d1d: Use 1x32 blocks for both activations and weights (only 1d1d is supported).
  • enable_mxfp4_fa: Enable MXFP4 attention (work in progress).
  • enable_mxfp4_gmm: Enable MXFP4 GroupedMM.
  • enable_mxfp4_linear: Enable MXFP4 Linear.
  • use_sr_grad: Use stochastic rounding for gradients.
  • use_hadamard: Apply the Hadamard transform to the inputs of the dW kernels.

[quantize.linear.mx]
filter_fqns = ["output"]
recipe_name = "mxfp4_1d1d"
enable_mxfp4_fa = false
enable_mxfp4_gmm = true
enable_mxfp4_linear = true
use_sr_grad = false
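
To make the 1x32 block format and the stochastic-rounding option more concrete, here is a minimal pure-Python sketch of MXFP4-style fake quantization. It assumes the usual MX layout (32 elements sharing one power-of-two scale, FP4 E2M1 element values) and is not the repository's kernel. The function names and the `rng` parameter are hypothetical, and ties in nearest rounding are resolved toward the smaller magnitude for simplicity.

```python
import bisect
import math
import random

# Non-negative magnitudes representable in FP4 E2M1.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32

def round_to_grid(mag, rng=None):
    """Round a non-negative magnitude onto the FP4 grid.
    rng=None -> round to nearest; otherwise stochastic rounding."""
    mag = min(mag, FP4_E2M1_GRID[-1])  # saturate at the largest value
    hi_idx = bisect.bisect_left(FP4_E2M1_GRID, mag)
    hi = FP4_E2M1_GRID[hi_idx]
    if hi == mag:
        return hi
    lo = FP4_E2M1_GRID[hi_idx - 1]
    if rng is None:
        return lo if (mag - lo) <= (hi - mag) else hi
    # Round up with probability proportional to the distance from lo,
    # so the result is unbiased in expectation.
    return hi if rng.random() < (mag - lo) / (hi - lo) else lo

def fake_quantize_block(block, rng=None):
    """Quantize and dequantize one block with a shared power-of-two scale."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # E2M1 has a maximum exponent of 2, so align the block maximum to it.
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    return [math.copysign(round_to_grid(abs(x) / scale, rng), x) * scale
            for x in block]

def fake_quantize(values, rng=None):
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quantize_block(values[i:i + BLOCK_SIZE], rng))
    return out

# One outlier forces a large shared scale (8.0 here), so with nearest
# rounding the 1.0 entries in the same block are rounded to 0.0.
block = [48.0] + [1.0] * 31
nearest = fake_quantize(block)
stochastic = fake_quantize(block, rng=random.Random(0))
```

The example block shows why outliers matter at this precision: a single large value determines the scale for all 32 elements, and the small values lose their resolution. With stochastic rounding, each small value becomes either 0.0 or 4.0, and its expected value stays at 1.0.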

Hadamard Transform Options

Enable Hadamard transform for the MXFP4 Linear layer:

[quantize.linear.mx]
use_hadamard = true
use_randomized_hadamard = false  # Set to true for Randomized Hadamard Transform (RHT)
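
As a rough illustration of the difference between the two settings, a Randomized Hadamard Transform is commonly formulated as a random sign flip followed by the Hadamard transform. The sketch below follows that common formulation. Whether the repository's kernels are implemented exactly this way is an assumption, and the function names are hypothetical.

```python
import math
import random

def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1.0]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in r] for r in H]

def randomized_hadamard(vec, signs):
    """Apply H @ diag(signs) to a vector (signs are +1.0 or -1.0)."""
    H = hadamard(len(vec))
    flipped = [s * v for s, v in zip(signs, vec)]
    return [sum(h * f for h, f in zip(row, flipped)) for row in H]

rng = random.Random(0)
n = 8
# Both operands must share the same sign vector for the product to be preserved.
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

a = [0.1, -0.2, 8.0, 0.3, 0.0, -0.1, 0.2, 0.1]
b = [1.0, 2.0, -1.0, 0.5, 0.25, -0.5, 1.5, 0.0]

dot_before = sum(x * y for x, y in zip(a, b))
a_rot = randomized_hadamard(a, signs)
b_rot = randomized_hadamard(b, signs)
dot_after = sum(x * y for x, y in zip(a_rot, b_rot))
```

Because `H @ diag(signs)` is orthogonal, the inner product is unchanged up to floating-point error, while the single large entry of `a` is spread across all eight positions.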

AdaHOP (Adaptive Hadamard Transform based on Outlier Patterns) Options

Enable AdaHOP calibration:

[quantize.linear.mx.calibration]
use_HT_calibration = true
calibration_steps = 30
visualize_outlier_patterns = true
visualization_save_folder = "calibration_visualizations"
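
The calibration step assigns each tensor one of the three outlier patterns. The sketch below is a purely hypothetical heuristic for such a classification, not the repository's calibration code: the function name, the use of mean absolute values, and the `ratio_threshold` of 10 are all assumptions chosen for illustration.

```python
from statistics import median

def detect_outlier_pattern(matrix, ratio_threshold=10.0, eps=1e-12):
    """Hypothetical heuristic: label a 2-D tensor as "row", "col", or "none".

    A pattern is reported when the largest per-row (or per-column) mean
    magnitude exceeds the median one by more than ratio_threshold.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_energy = [sum(abs(x) for x in row) / n_cols for row in matrix]
    col_energy = [sum(abs(x) for x in col) / n_rows for col in zip(*matrix)]
    row_ratio = max(row_energy) / (median(row_energy) + eps)
    col_ratio = max(col_energy) / (median(col_energy) + eps)
    if max(row_ratio, col_ratio) < ratio_threshold:
        return "none"
    return "row" if row_ratio >= col_ratio else "col"

# Toy tensors: one with a dominant row, its transpose, and a flat one.
row_heavy = [[0.1] * 4, [50.0] * 4, [0.1] * 4, [0.1] * 4]
col_heavy = [list(col) for col in zip(*row_heavy)]
flat = [[0.1] * 4 for _ in range(4)]

patterns = [detect_outlier_pattern(m) for m in (row_heavy, col_heavy, flat)]
```

A real calibration pass would presumably aggregate such statistics over the configured calibration_steps before fixing a pattern per layer. That aggregation is omitted here.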

AdaHOP layer-specific transform configuration based on detected outlier patterns:

[quantize.linear.mx.calibration.layer_transform_config]
"row-row" = "hadamard"
"row-none" = "inner_outlier_extract_left"
"row-col" = "inner_outlier_extract_right"
"col-row" = "hadamard"
"col-none" = "hadamard"
"col-col" = "full_precision"
"none-row" = "hadamard"
"none-none" = "hadamard"
"none-col" = "inner_outlier_extract_right"

Pattern pairs use the format "pattern1-pattern2", where each pattern is "row", "col", or "none". The selected transform mode ("hadamard", "inner_outlier_extract_left", "inner_outlier_extract_right", "full_precision", or "none") applies to all three transforms (forward_y, backward_gw, backward_gx).
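
The mapping above amounts to a lookup from a detected pattern pair to a transform mode. A minimal sketch of that lookup follows. The dictionary mirrors the example configuration, while `select_transform` and the reading of "pattern1" and "pattern2" as the left and right operands are assumptions for illustration.

```python
# Mirrors the example [quantize.linear.mx.calibration.layer_transform_config] table.
LAYER_TRANSFORM_CONFIG = {
    "row-row": "hadamard",
    "row-none": "inner_outlier_extract_left",
    "row-col": "inner_outlier_extract_right",
    "col-row": "hadamard",
    "col-none": "hadamard",
    "col-col": "full_precision",
    "none-row": "hadamard",
    "none-none": "hadamard",
    "none-col": "inner_outlier_extract_right",
}

VALID_PATTERNS = ("row", "col", "none")

def select_transform(pattern1, pattern2, config=LAYER_TRANSFORM_CONFIG):
    """Hypothetical helper: return the transform mode for a pattern pair."""
    for pattern in (pattern1, pattern2):
        if pattern not in VALID_PATTERNS:
            raise ValueError(f"unknown outlier pattern: {pattern!r}")
    return config[f"{pattern1}-{pattern2}"]

mode_col_col = select_transform("col", "col")
mode_row_none = select_transform("row", "none")
```

With the example table, `mode_col_col` is "full_precision" and `mode_row_none` is "inner_outlier_extract_left".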

Citation

Full details of AdaHOP are provided in the paper 'AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation'. If you find our code or AdaHOP useful for your research, please consider citing:

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

@article{kim2026adahop,
  title={AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation},
  author={Kim, Seonggon and Khodamoradi, Alireza and Denolf, Kristof and Park, Eunhyeok},
  journal={arXiv preprint arXiv:2604.02525},
  year={2026}
}

License

Source code is made available under a BSD 3 license; however, you may have other legal obligations that govern your use of other content linked in this repository, such as the license or terms of service for third-party data and models.
