Tractor-Burden

Tractor-Burden is an ancestry-aware rare variant burden testing framework for admixed populations. Building on the same intuition as Tractor, it incorporates local ancestry information into rare variant association testing: ancestry-specific rare variant dosages are aggregated within user-defined genomic regions, and a separate burden test is performed for each ancestry.

Workflow Overview

For each ancestry, Tractor-Burden:

1. Loads ancestry-specific dosage and hapcount files
2. Filters variants based on annotation class and MAF thresholds
3. Aggregates rare variant dosages within genomic regions
4. Fits association models:
   - Logistic regression for binary traits
   - Linear regression for quantitative traits
5. Returns ancestry-specific burden statistics

Required Inputs

1. Annotation File

Tab-delimited annotation file containing variant information. We recommend annotating variants with VEP prior to running Tractor-Burden and filtering to high- and moderate-impact coding consequences (e.g., missense_variant, frameshift_variant, stop_gained, splice_acceptor_variant, splice_donor_variant, and related SnpEff/VEP consequence terms) using the --keep-annotations argument.

Required Columns

Column	Description
CHROM	Chromosome
POS	Position
REF	Reference allele
ALT	Alternate allele

Optional Columns

Column	Description
Gene_Name	Gene assignment used for gene-based aggregation (column name passed with `--gene-col`)
Annotation	Functional annotation used for filtering (column name passed with `--ann-col`)

Example

CHROM   POS      REF ALT Gene_Name Annotation
1       12345    A   G   Gene_A      missense_variant
1       12500    T   C   Gene_A      frameshift_variant
1       13000    G   A   Gene_B      splice_acceptor_variant
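
As a rough illustration of how a file of this shape could be filtered and grouped, the sketch below reads tab-delimited text with Python's csv module, keeps rows whose annotation is in a chosen set, and groups variant keys by gene. The function name `group_variants_by_gene` and the `CHROM:POS:REF:ALT` key format are assumptions made for this example; they are not taken from the Tractor-Burden code.

```python
import csv
import io
from collections import defaultdict


def group_variants_by_gene(tsv_text, keep_annotations,
                           gene_col="Gene_Name", ann_col="Annotation"):
    """Return {gene: [variant_key, ...]} for rows with a kept annotation."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    required = {"CHROM", "POS", "REF", "ALT"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"annotation file is missing columns: {sorted(missing)}")
    groups = defaultdict(list)
    for row in reader:
        if row[ann_col] not in keep_annotations:
            continue  # annotation class not requested
        # Hypothetical variant key; any unique identifier would do.
        key = f"{row['CHROM']}:{row['POS']}:{row['REF']}:{row['ALT']}"
        groups[row[gene_col]].append(key)
    return dict(groups)


# The three example rows above, written with real tab separators.
example = (
    "CHROM\tPOS\tREF\tALT\tGene_Name\tAnnotation\n"
    "1\t12345\tA\tG\tGene_A\tmissense_variant\n"
    "1\t12500\tT\tC\tGene_A\tframeshift_variant\n"
    "1\t13000\tG\tA\tGene_B\tsplice_acceptor_variant\n"
)
groups = group_variants_by_gene(example, {"missense_variant", "frameshift_variant"})
```

Here Gene_B drops out because its only variant has an annotation that was not requested.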

2. Ancestry-Specific Dosage and Hapcount Files

Tractor-Burden requires ancestry-specific dosage and hapcount files generated by Tractor's extract_tracts.py.

Example:

chr1.anc0.dosage.txt
chr1.anc1.dosage.txt
chr1.anc0.hapcount.txt
chr1.anc1.hapcount.txt

Dosage files contain ancestry-specific minor allele dosages for each sample and variant. Hapcount files contain the number of ancestry-specific haplotypes carried by each sample at each variant.
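
The relationship between the two file types can be illustrated with a small calculation. Assuming the dosages and hapcounts for one variant have already been read into per-sample lists (an in-memory layout chosen for this example, not the on-disk format), the allele frequency on a given ancestry background is the summed dosage divided by the summed haplotype count:

```python
def ancestry_allele_frequency(dosages, hapcounts):
    """Allele frequency of one variant on one ancestry background.

    dosages[i]   : allele copies sample i carries on this ancestry (0, 1 or 2)
    hapcounts[i] : haplotypes of this ancestry sample i carries at the site
    Returns None when no sample carries this ancestry at the site.
    """
    if len(dosages) != len(hapcounts):
        raise ValueError("dosages and hapcounts must cover the same samples")
    for dose, haps in zip(dosages, hapcounts):
        if dose > haps:
            raise ValueError("a dosage cannot exceed the haplotype count")
    total_haps = sum(hapcounts)
    if total_haps == 0:
        return None
    return sum(dosages) / total_haps


# Four samples at one variant: 3 allele copies on 5 haplotypes of this ancestry.
dosages = [0, 1, 0, 2]
hapcounts = [2, 1, 0, 2]
frequency = ancestry_allele_frequency(dosages, hapcounts)
```

The third sample carries no haplotype of this ancestry at the site, so it contributes nothing to either the numerator or the denominator.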

Usage

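Ancestry names, dosage files, and hapcount files must be supplied in the same order, so that the first ancestry name corresponds to the first dosage file and the first hapcount file, and so on: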

--ancestry-names EUR AFR \
--dosage-files chr1.ancEUR.dosage.gz chr1.ancAFR.dosage.gz \
--hapcount-files chr1.ancEUR.hapcount.gz chr1.ancAFR.hapcount.gz
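
A quick sanity check on the three lists can catch ordering or count mistakes before a long run. The helper below is a hypothetical pre-flight check written for this README, not a function provided by Tractor-Burden; it only verifies that the lists have equal length and pairs them by position.

```python
def pair_ancestry_files(ancestry_names, dosage_files, hapcount_files):
    """Pair each ancestry name with the files at the same list position."""
    lengths = {len(ancestry_names), len(dosage_files), len(hapcount_files)}
    if len(lengths) != 1:
        raise ValueError("expected one dosage file and one hapcount file per ancestry")
    if len(set(ancestry_names)) != len(ancestry_names):
        raise ValueError("ancestry names must be unique")
    return {
        name: {"dosage": dosage, "hapcount": hapcount}
        for name, dosage, hapcount in zip(ancestry_names, dosage_files, hapcount_files)
    }


paired = pair_ancestry_files(
    ["EUR", "AFR"],
    ["chr1.ancEUR.dosage.gz", "chr1.ancAFR.dosage.gz"],
    ["chr1.ancEUR.hapcount.gz", "chr1.ancAFR.hapcount.gz"],
)
```

Printing `paired` before launching the analysis makes it easy to confirm by eye that each ancestry label points at the intended files.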

3. Phenotype File

Tractor-Burden requires a single tab-delimited phenotype file containing sample IDs, the phenotype column, and any covariates to be included in the association model.

Required Columns

IID
y

- IID must match the sample IDs in the dosage and hapcount files.
- y is the phenotype column.
  - Binary traits should be coded as 0/1.
  - Quantitative traits can take continuous values.

Tractor-Burden automatically detects whether the phenotype is binary or quantitative based on the values in y.
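
The exact detection rule is not described here. One plausible rule, shown purely as an assumption, is to treat the trait as binary when every value in y is 0 or 1; the actual logic in Tractor-Burden may differ.

```python
def looks_binary(values):
    """Guess the trait type: True if all values are 0 or 1 (assumed rule)."""
    distinct = {float(value) for value in values}
    return len(distinct) > 0 and distinct <= {0.0, 1.0}


case_control = looks_binary([0, 1, 1, 0])          # coded 0/1
quantitative = looks_binary([-0.25, -1.57, 0.26])  # continuous values
```

Under a rule like this, a case/control phenotype coded as 1/2 would be treated as quantitative, which is one reason to follow the 0/1 coding stated above.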

Example

IID      LDL   age  sex    y      global_ancestry_EUR  global_ancestry_AFR
1456427  105   86   0   -0.25          0.0846               0.9154
1604504   59   21   0   -1.57          0.0907               0.9093
2958502  125   87   0    0.26          0.1306               0.8694

Covariates

Any additional column in the phenotype file can be included as a covariate by listing its name after the --covariates argument.

For example:

--covariates global_ancestry_AFR age sex

Common covariates include:

age
sex
global ancestry proportions
principal components (PCs)
study-specific covariates
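
To make the column requirements concrete, the sketch below reads a phenotype table shaped like the example above and pulls out y together with the requested covariates for each sample. The function `load_phenotypes` is a hypothetical helper for illustration; it assumes every selected column is numeric and has no missing values.

```python
import csv
import io


def load_phenotypes(tsv_text, covariates):
    """Return {IID: {"y": float, "covariates": [float, ...]}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    needed = ["IID", "y", *covariates]
    missing = [name for name in needed if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"phenotype file is missing columns: {missing}")
    table = {}
    for row in reader:
        table[row["IID"]] = {
            "y": float(row["y"]),
            # Covariates keep the order given on the command line.
            "covariates": [float(row[name]) for name in covariates],
        }
    return table


example = (
    "IID\tLDL\tage\tsex\ty\tglobal_ancestry_EUR\tglobal_ancestry_AFR\n"
    "1456427\t105\t86\t0\t-0.25\t0.0846\t0.9154\n"
    "1604504\t59\t21\t0\t-1.57\t0.0907\t0.9093\n"
    "2958502\t125\t87\t0\t0.26\t0.1306\t0.8694\n"
)
table = load_phenotypes(example, ["global_ancestry_AFR", "age", "sex"])
```

In a two-way admixed sample the two global ancestry proportions sum to one, so only one of them is listed as a covariate here; including both would make them collinear with the model intercept.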

Aggregation Methods

Option 1: Gene-Based Aggregation

Group variants by a gene column in the annotation file (Gene_Name in the annotation example), named with the --gene-col argument.

Example:

--gene-col Gene_Name

Variants assigned to the same gene will be collapsed into a single burden score.
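
A minimal sketch of this collapsing step, assuming the burden score is a plain unweighted sum of ancestry-specific dosages per sample (the weighting actually used by Tractor-Burden is not stated here). The data layout and the name `burden_scores` are hypothetical.

```python
def burden_scores(dosage_by_variant, variants_in_gene, n_samples):
    """Sum per-sample dosages over the variants assigned to one gene."""
    scores = [0] * n_samples
    for variant in variants_in_gene:
        row = dosage_by_variant.get(variant)
        if row is None:
            continue  # variant is annotated but absent from the dosage data
        for i, dose in enumerate(row):
            scores[i] += dose
    return scores


# Dosages on one ancestry background for three samples.
dosage_by_variant = {
    "1:12345:A:G": [0, 1, 0],
    "1:12500:T:C": [1, 1, 0],
    "1:13000:G:A": [0, 0, 1],
}
gene_a = burden_scores(dosage_by_variant, ["1:12345:A:G", "1:12500:T:C"], n_samples=3)
```

Running the same function once per ancestry yields one burden score per sample per ancestry, which is what lets the model estimate a separate burden effect for each ancestry.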

Option 2: RVTESTS-Style Set File

Use a custom region definition file. Each line gives a set name followed by one or more comma-separated regions in chr:start-end form.

Example 1:

LDLR chr19:11089000-11200000

Example 2, for exonic regions only:

LDLR chr19:11089462-11133820,chr19:11089462-11133820,chr19:11089462-11133820,chr19:11089462-11133820,chr19:11089462-11133820

Run with:

--set-file gene_boundaries.set

Option 3: Flexible Region-Based Aggregation

Although Tractor-Burden is commonly used for gene-based rare variant aggregation, the framework can aggregate variants across any user-defined genomic regions. This enables Tractor-Burden analyses beyond coding variation into noncoding and regulatory regions.

Example:

Region1 chr1:100000-150000
EnhancerA chr2:250000-275000
CustomWindow chr5:1000000-1100000
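
The set-file lines shown in Options 2 and 3 share one shape, so they can be parsed the same way. The parser below is a sketch written for this README rather than the parser used by Tractor-Burden. It makes two assumptions: interval bounds are treated as inclusive, and a leading "chr" is stripped so that set-file chromosomes can be compared with annotation files that use bare chromosome names.

```python
def normalize_chrom(chrom):
    """Drop a leading 'chr' so 'chr19' and '19' compare equal (assumed convention)."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def parse_set_line(line):
    """Parse 'NAME chr:start-end[,chr:start-end...]' into (name, intervals)."""
    name, regions = line.split()
    intervals = []
    for region in regions.split(","):
        chrom, span = region.split(":")
        start, end = span.split("-")
        intervals.append((normalize_chrom(chrom), int(start), int(end)))
    return name, intervals


def variant_in_region(chrom, pos, intervals):
    """True if the position falls inside any interval (bounds inclusive)."""
    chrom = normalize_chrom(chrom)
    return any(c == chrom and start <= pos <= end for c, start, end in intervals)


name, intervals = parse_set_line("EnhancerA chr2:250000-275000")
inside = variant_in_region("2", 260000, intervals)
outside = variant_in_region("2", 300000, intervals)
```

A set with several comma-separated intervals, such as an exon list, parses into several tuples under the same name, and a variant qualifies if it falls in any of them.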

Running Tractor-Burden

Tractor-Burden automatically detects whether the phenotype column y is binary or quantitative. Users do not need to specify the trait type.

Example Run

python tractor_burden.py \
  --annotation-file /path/to/annotations.tsv \
  --ann-col consolidated_annotation \
  --set-file /path/to/regions.refFlat.set \
  --ancestry-names EUR AFR \
  --dosage-files \
    /path/to/chr1.anc0.dosage.txt.gz \
    /path/to/chr1.anc1.dosage.txt.gz \
  --hapcount-files \
    /path/to/chr1.anc0.hapcount.txt.gz \
    /path/to/chr1.anc1.hapcount.txt.gz \
  --phenotype-file /path/to/phenotype.tsv \
  --out-tsv /path/to/tractor_burden_results.tsv \
  --keep-annotations \
    missense_variant \
    frameshift_variant \
    splice_acceptor_variant \
    splice_donor_variant \
    stop_gained \
    start_lost \
  --min-mac 1 \
  --maf-scope none \
  --covariates global_ancestry_AFR age sex

Optional Filtering Parameters

We recommend performing standard rare variant QC before running Tractor-Burden to reduce runtime and memory usage.

Minimum Minor Allele Count

--min-mac 1

Minimum minor allele count (MAC) a variant must have to be included in burden testing.

Default: 1

Minor Allele Frequency Filtering

--maf-scope none

Controls how allele frequency filtering is applied.

Option	Description
`none`	No MAF filtering (only `--min-mac` is applied)
`ancestry`	Apply MAF filtering separately within each ancestry
`total`	Apply MAF filtering using the aggregate allele frequency across all ancestries

When using ancestry or total, frequency thresholds can be specified with --maf-lo (lower bound) and --maf-hi (upper bound).

Defaults:

--maf-scope none
--maf-lo 0
--maf-hi 0.01
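
The difference between the three scopes is easiest to see on a single variant. The sketch below is an interpretation of the options described above, not the tool's implementation. It assumes that thresholds are inclusive, that --min-mac is checked against the allele count within each ancestry, and that frequencies are allele counts divided by haplotype counts; the function name and input layout are hypothetical.

```python
def frequency_filter(counts, maf_scope="none", maf_lo=0.0, maf_hi=0.01, min_mac=1):
    """Decide, per ancestry, whether one variant is kept.

    counts maps ancestry -> (allele_count, haplotype_count) for the variant.
    """
    if maf_scope not in {"none", "ancestry", "total"}:
        raise ValueError(f"unknown maf_scope: {maf_scope}")
    total_alleles = sum(alleles for alleles, _ in counts.values())
    total_haps = sum(haps for _, haps in counts.values())
    keep = {}
    for ancestry, (alleles, haps) in counts.items():
        ok = alleles >= min_mac
        if ok and maf_scope != "none":
            if maf_scope == "ancestry":
                numerator, denominator = alleles, haps
            else:  # "total": pool all ancestries
                numerator, denominator = total_alleles, total_haps
            ok = denominator > 0 and maf_lo <= numerator / denominator <= maf_hi
        keep[ancestry] = ok
    return keep


# Rare on the AFR background (3/400), common on the EUR background (30/600).
counts = {"AFR": (3, 400), "EUR": (30, 600)}
by_none = frequency_filter(counts, maf_scope="none")
by_ancestry = frequency_filter(counts, maf_scope="ancestry")
by_total = frequency_filter(counts, maf_scope="total")
```

With these counts the variant is kept for both ancestries under none, kept only for AFR under ancestry, and dropped for both under total, because the pooled frequency (33/1000) exceeds 0.01.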

Output

Tractor-Burden produces ancestry-specific burden association statistics for each tested region. Example:

chrom  gene      term         estimate      pval        neglog10p  mac  n_carriers  n_variants  m_genes_tested
19     LDLR      burden_AFR   0.411314      3.51e-07    6.45       134  125         59          1256
19     LDLR      burden_EUR  -0.033425      0.8660      0.062      25   25          24          1256

Output Columns

Column	Description
`chrom`	Chromosome containing the tested region
`gene`	Name of the tested region or set identifier
`term`	Ancestry-specific burden test (`burden_AFR`, `burden_EUR`, etc.)
`estimate`	Regression coefficient for the ancestry-specific burden term
`pval`	Association p-value
`neglog10p`	−log10(p-value)
`mac`	Minor allele count aggregated across all qualifying variants in that ancestry
`n_carriers`	Number of individuals carrying at least one qualifying variant in that ancestry
`n_variants`	Number of variants included in the burden test after annotation and filtering
`m_genes_tested`	Total number of tested regions in the analysis (per chromosome)
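
The three count columns can be reproduced from the dosages of the qualifying variants in a region. The following is a sketch under the definitions in the table above; `summarize_region` and its input layout (one list of per-sample dosages per variant, for a single ancestry) are assumptions made for illustration.

```python
def summarize_region(dosage_rows):
    """Compute mac, n_carriers and n_variants for one region and one ancestry."""
    n_variants = len(dosage_rows)
    n_samples = len(dosage_rows[0]) if dosage_rows else 0
    mac = sum(sum(row) for row in dosage_rows)
    n_carriers = sum(
        1 for i in range(n_samples) if any(row[i] > 0 for row in dosage_rows)
    )
    return {"mac": mac, "n_carriers": n_carriers, "n_variants": n_variants}


# Two qualifying variants across four samples.
summary = summarize_region([
    [0, 1, 0, 2],
    [0, 1, 0, 0],
])
```

Because one individual can carry more than one qualifying allele, mac can exceed n_carriers, as in the burden_AFR row of the example output (134 versus 125).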

The estimate corresponds to the effect of a one-unit increase in ancestry-specific burden count. For binary traits, this is the logistic regression coefficient (log-odds scale). For quantitative traits, it represents the linear regression effect estimate.
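
For a binary trait, exponentiating the estimate gives an odds ratio per additional burden allele. As a worked example, the snippet below applies this to the burden_AFR values from the example output, treating that row as if it came from a logistic model (an assumption, since the trait type of the example is not stated), and recomputes neglog10p from pval.

```python
import math


def describe_estimate(estimate, pval):
    """Convert a log-odds estimate to an odds ratio and recompute -log10(p)."""
    return {
        "odds_ratio": math.exp(estimate),
        "neglog10p": -math.log10(pval),
    }


# estimate and pval taken from the burden_AFR row of the example output.
afr = describe_estimate(0.411314, 3.51e-07)
```

The resulting odds ratio is roughly 1.5, and the recomputed neglog10p rounds to the 6.45 reported in the table. For a quantitative trait the estimate is already on the scale of y and needs no transformation.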

Note: The gene column is a generic region identifier and may correspond to genes, pathways, regulatory elements, sliding windows, or any user-defined set supplied through the region definition file.

Additional Resources

Tutorial

A step-by-step walkthrough using publicly available two-way admixed African American (ASW) samples from the 1000 Genomes Project is available here:

Tractor-Burden Tutorial

For detailed explanations of phasing, local ancestry painting, and extracting tracts, refer to:

Tractor Tutorial

Citation

If you use this pipeline in your research, please cite:

XYZ (bioRxiv)

Please direct questions to: pragati.kore@bcm.edu or elizabeth.atkinson@bcm.edu.
