This repository is the official code of AdaHOP: Adaptive Hadamard Transform with Outlier-Pattern-Aware strategy. AdaHOP achieves BF16 training quality at MXFP4 precision while delivering up to 3.6× memory compression and 1.46× kernel acceleration over BF16 training.
This is not an officially supported AMD product.
Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns (Row-wise, Column-wise, and None) and show that each pair of outlier patterns in a matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP (Adaptive Hadamard transform with Outlier-Pattern-aware strategy), which applies the Inner Hadamard Transform (IHT) when inner-dimension mixing suppresses the operands' outliers, and otherwise selectively applies Outlier Extraction (OE), which moves the dominant outlier rows or columns into a high-precision path. With fused, hardware-aware Triton kernels, AdaHOP enables training from scratch at MXFP4 precision with BF16-level quality, while achieving up to 3.6× memory compression and a 1.46× end-to-end training speedup over BF16.
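The property that makes an inner-dimension transform usable is that an orthogonal matrix inserted between the two operands cancels out: (X H)(Hᵀ W) = X W. The NumPy sketch below is an illustration only, not the repository's fused Triton kernels; the helper names `hadamard_matrix` and `peak_to_rms` are hypothetical. It builds an orthonormal Hadamard matrix by the Sylvester construction, also forms a randomized variant with random sign flips, checks that both preserve the product, and injects a synthetic outlier channel into `X` along the inner dimension (an assumption chosen for illustration) to show that mixing spreads the outlier across each row and lowers the peak-to-RMS ratio that drives quantization error.

```python
import numpy as np


def hadamard_matrix(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def peak_to_rms(a: np.ndarray) -> float:
    """Mean over rows of max|a| / RMS(a); large values mean a few entries dominate each row."""
    rms = np.sqrt(np.mean(a**2, axis=1))
    return float(np.mean(np.abs(a).max(axis=1) / rms))


rng = np.random.default_rng(0)
M, K, N = 8, 64, 16            # X is (M, K), W is (K, N); K is the inner dimension
X = rng.standard_normal((M, K))
X[:, 3] = 100.0                # synthetic outlier channel along the inner dimension (illustrative)
W = rng.standard_normal((K, N))

H = hadamard_matrix(K)
signs = rng.choice([-1.0, 1.0], size=K)
H_rand = np.diag(signs) @ H    # randomized variant: random sign flips, still orthogonal

Y_ref = X @ W
Y_iht = (X @ H) @ (H.T @ W)            # cancels because H @ H.T = I
Y_rht = (X @ H_rand) @ (H_rand.T @ W)  # same cancellation for the randomized variant

before = peak_to_rms(X)
after = peak_to_rms(X @ H)
print(f"peak/RMS before: {before:.2f}, after Hadamard: {after:.2f}")
```

Because the transform only mixes along K, it helps when the outliers are concentrated in a few inner-dimension channels; the paper's point is that other outlier-pattern pairs need a different treatment.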
git clone
bash launch_LPT_accel_docker.sh
pip install meson-python pybind11 meson ninja
cd low-precision-training/
pip install --no-build-isolation --no-deps -e .
python scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.2-1B --assets tokenizer --hf_token=[YOUR_HF_TOKEN]
HIP_VISIBLE_DEVICES=0,1,2,3 NGPU=4 CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_3b_mxfp4_adahop_lv2.toml" ./run_train_accel.sh
HIP_VISIBLE_DEVICES=0 python -m torchtitan.eval --config torchtitan/models/llama3/eval_configs/llama3_3b_eval.toml --checkpoint_path [CHECKPOINT_PATH]

See torchtitan/models/llama3/train_configs for examples.
Use the quantize.linear.mx converter for MXFP4 training:
[model]
converters = ["quantize.linear.mx"]

filter_fqns: Skip quantization for layers whose names match filter_fqns.
recipe_name: mxfp4_1d1d uses 1x32 blocks for activations and 1x32 blocks for weights (only 1d1d is supported).
enable_mxfp4_fa: Whether to use MXFP4 attention (work in progress).
enable_mxfp4_gmm: Whether to use MXFP4 GroupedMM.
enable_mxfp4_linear: Whether to use MXFP4 Linear.
use_sr_grad: Whether to use stochastic rounding for gradients.
use_hadamard: Whether to apply the Hadamard transform to the inputs of the dW kernels.
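To make the mxfp4_1d1d recipe and use_sr_grad more concrete, the NumPy sketch below fake-quantizes a tensor in 1x32 blocks following the OCP Microscaling (MX) format as a reference: each 32-element block shares one power-of-two scale, and elements are FP4 E2M1 values. It also offers an optional stochastic-rounding mode. This is a reference model for intuition only; the repository's Triton kernels may choose scales, round ties, and handle edge cases differently, and the function name `fake_quant_mxfp4` is hypothetical.

```python
import numpy as np

# Non-negative magnitudes representable by FP4 E2M1.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32       # 1x32 blocks along the last dimension
EMAX_ELEM = 2    # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)


def fake_quant_mxfp4(x, stochastic=False, rng=None):
    """Quantize-then-dequantize x in 1x32 blocks (hypothetical reference model).

    The last dimension of x must be a multiple of 32. E8M0 exponent range
    limits and NaN/Inf handling are ignored for brevity.
    """
    blocks = x.reshape(*x.shape[:-1], -1, BLOCK)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # Shared power-of-two scale per block: 2**(floor(log2(amax)) - EMAX_ELEM).
    safe_amax = np.where(amax > 0, amax, 1.0)
    scale = 2.0 ** (np.floor(np.log2(safe_amax)) - EMAX_ELEM)
    scaled = blocks / scale
    mag = np.minimum(np.abs(scaled), FP4_GRID[-1])  # saturate at 6.0

    # Neighbouring grid points with lo <= mag <= hi.
    lo_idx = np.searchsorted(FP4_GRID, mag, side="right") - 1
    hi_idx = np.minimum(lo_idx + 1, len(FP4_GRID) - 1)
    lo, hi = FP4_GRID[lo_idx], FP4_GRID[hi_idx]
    gap = hi - lo
    frac = np.divide(mag - lo, gap, out=np.zeros_like(mag), where=gap > 0)

    if stochastic:
        # Round up with probability frac, so E[q] = mag (unbiased).
        rng = np.random.default_rng() if rng is None else rng
        q = np.where(rng.random(mag.shape) < frac, hi, lo)
    else:
        # Round to nearest; exact ties go to the smaller magnitude in this sketch.
        q = np.where(frac > 0.5, hi, lo)

    return (np.sign(scaled) * q * scale).reshape(x.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
x_q = fake_quant_mxfp4(x)
rel_err = np.linalg.norm(x - x_q) / np.linalg.norm(x)
print(f"relative quantization error: {rel_err:.3f}")
```

Stochastic rounding trades a slightly larger per-element error for an unbiased result in expectation, which is the usual motivation for applying it to gradients.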
[quantize.linear.mx]
filter_fqns = ["output"]
recipe_name = "mxfp4_1d1d"
enable_mxfp4_fa = false
enable_mxfp4_gmm = true
enable_mxfp4_linear = true
use_sr_grad = false

Enable the Hadamard transform for the MXFP4 Linear layer:
[quantize.linear.mx]
use_hadamard = true
use_randomized_hadamard = false # Set to true for Randomized Hadamard Transform (RHT)

Enable AdaHOP calibration:
[quantize.linear.mx.calibration]
use_HT_calibration = true
calibration_steps = 30
visualize_outlier_patterns = true
visualization_save_folder = "calibration_visualizations"

AdaHOP layer-specific transform configuration based on detected outlier patterns:
[quantize.linear.mx.calibration.layer_transform_config]
"row-row" = "hadamard"
"row-none" = "inner_outlier_extract_left"
"row-col" = "inner_outlier_extract_right"
"col-row" = "hadamard"
"col-none" = "hadamard"
"col-col" = "full_precision"
"none-row" = "hadamard"
"none-none" = "hadamard"
"none-col" = "inner_outlier_extract_right"Pattern pairs format: "pattern1-pattern2" where patterns can be "row", "col", or "none". The transform mode ("hadamard", "inner_outlier_extract_left", inner_outlier_extract_right, full_precision or "none") applies to all transforms (forward_y, backward_gw, backward_gx).
Full details of AdaHOP are provided in the paper 'AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation'. If you find our code or AdaHOP useful for your research, please consider citing:
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
@article{kim2026adahop,
title={AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation},
author={Kim, Seonggon and Khodamoradi, Alireza and Denolf, Kristof and Park, Eunhyeok},
journal={arXiv preprint arXiv:2604.02525},
year={2026}
}
Source code is made available under a BSD 3-Clause license; however, you may have other legal obligations that govern your use of other content linked in this repository, such as the license or terms of service for third-party data and models.
