TAFFISH tool app for RepeatModeler, a de novo transposable element family identification and modeling package.
This app packages RepeatModeler 2.0.9 from the official Dfam
TETools runtime. TETools is the
upstream-supported container route for RepeatModeler and includes
RepeatModeler, RepeatMasker, RMBlast, FamDB/Dfam data, RepeatScout, RECON,
LTR_retriever, GenomeTools, MAFFT, CD-HIT, UCSC twoBit helpers, and related
TE utilities.
- name: repeatmodeler
- command: taf-repeatmodeler
- kind: tool
- version: 2.0.9-r1
- image: ghcr.io/taffish/repeatmodeler:2.0.9-r1
- TAFFISH app license: Apache-2.0
- upstream: RepeatModeler 2.0.9
- runtime base: dfam/tetools:2.0@sha256:6081b4d3883eff478873cb94cd24addf540275365d4acc4446bb647a341e95e2
- native platform: linux/amd64
taf install repeatmodeler

Show TAFFISH app help:
taf-repeatmodeler --help
taf-repeatmodeler --version
taf-repeatmodeler --compile

Show upstream RepeatModeler help and version:
taf-repeatmodeler -- -help
taf-repeatmodeler -- -version
taf-repeatmodeler RepeatModeler -help
taf-repeatmodeler BuildDatabase -help
taf-repeatmodeler RepeatClassifier -help

Build a RepeatModeler database from an assembled genome FASTA:
taf-repeatmodeler BuildDatabase -name genome genome.fa

Run RepeatModeler on that database:
taf-repeatmodeler -database genome -threads 16

Run the LTR structural discovery extension:
taf-repeatmodeler -database genome -threads 16 -LTRStruct

Use the produced repeat library with RepeatMasker:
taf-repeatmodeler RepeatMasker -pa 8 -engine rmblast -lib genome-families.fa genome.fa

taf-repeatmodeler is a normal command-mode TAFFISH tool app. Option-leading
arguments are passed to the default upstream command, RepeatModeler:
taf-repeatmodeler -database genome -threads 16
taf-repeatmodeler -- -help

If the first argument is not an option, TAFFISH treats it as an executable in the same container. Use this for bundled helper commands:
taf-repeatmodeler BuildDatabase -name genome genome.fa
taf-repeatmodeler RepeatClassifier -consensi genome-families.fa
taf-repeatmodeler RepeatMasker -lib genome-families.fa genome.fa
taf-repeatmodeler rmblastn -version
taf-repeatmodeler famdb.py --help
taf-repeatmodeler gt -version
taf-repeatmodeler mafft --version

Do not write taf-repeatmodeler BuildDatabase ... expecting BuildDatabase to be
a RepeatModeler subcommand. BuildDatabase, RepeatClassifier, and RepeatMasker
are separate executables in the same runtime, and TAFFISH runs them directly.
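
As a mental model, the dispatch rule above can be sketched as a small shell function. This is only an illustration inferred from the examples in this README, not TAFFISH's actual implementation; `dispatch_target` is a hypothetical name, and the treatment of `--help`, `--version`, and `--compile` as wrapper-level options is an assumption based on the help examples shown earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of TAFFISH): predict which program a
# taf-repeatmodeler invocation reaches, given the same arguments.
dispatch_target() {
  [ "$#" -gt 0 ] || { echo "usage: dispatch_target ARG..." >&2; return 2; }
  case "$1" in
    # Assumed to be handled by the TAFFISH wrapper itself.
    --help|--version|--compile) echo "TAFFISH app" ;;
    # Any other option-leading argument, including "--", goes to the
    # default upstream command.
    -*) echo "RepeatModeler" ;;
    # A non-option first argument is treated as an executable name.
    *) echo "$1" ;;
  esac
}
```

For example, `dispatch_target -database genome` prints `RepeatModeler`, while `dispatch_target BuildDatabase -name genome genome.fa` prints `BuildDatabase`.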
The Dockerfile starts from the official Dfam TETools image pinned by digest and adds only TAFFISH metadata, a working directory, a few stable helper symlinks for command mode, and build-time self-checks.
Runtime components checked by Dockerfile and smoke include:
- RepeatModeler 2.0.9
- RepeatMasker 4.2.4
- RMBlast 2.17.1+
- TRF 4.09
- GenomeTools 1.6.4
- MAFFT 7.471
- RepeatScout 1.0.7
- FamDB 3.0.0 and Dfam 4.0 component data bundled by TETools
- RECON, LTR_retriever 2.9.0, CD-HIT 4.8.1, NINJA, RepeatAfterMe, and UCSC twoBit helper utilities
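
To spot-check one of these versions by hand, a plain substring test on the tool's version output is usually enough. The sketch below assumes nothing about the exact output format of each tool; `has_version` is a hypothetical helper, and a substring match is deliberately loose (for example, `4.09` would also match `14.09`).

```shell
#!/bin/sh
# Hypothetical helper: succeed when the text on stdin contains the
# expected version string $1 as a literal (non-regex) substring.
has_version() {
  grep -F -q -- "$1"
}

# Example use (commented out so the block is safe to source anywhere).
# Some tools may print their version on stderr, hence the 2>&1:
#   taf-repeatmodeler -- -version 2>&1 | has_version 2.0.9 && echo "RepeatModeler ok"
#   taf-repeatmodeler mafft --version 2>&1 | has_version 7.471 && echo "MAFFT ok"
```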
BLAST_USAGE_REPORT=false is set so RMBlast does not perform NCBI usage
reporting during TAFFISH runs. PAGER=cat is set so help output remains
non-interactive in terminals, CI, and flows.
RepeatModeler is designed for assembled genome sequences, not raw sequencing reads. The normal workflow is:
- Create a database with BuildDatabase -name <db> genome.fa.
- Run RepeatModeler -database <db> -threads <N>.
- Use the resulting repeat-family FASTA as a custom library, commonly with RepeatMasker -lib <db>-families.fa.
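
The three steps above can be chained in a small wrapper script. The following is a minimal sketch, not an official TAFFISH flow: `run`, `repeat_workflow`, and the `DRY_RUN` variable are hypothetical names, and the RepeatMasker options are copied from the usage example earlier in this README. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# repeat_workflow GENOME_FASTA DB_NAME [THREADS]
repeat_workflow() {
  genome=$1            # assembled genome FASTA
  db=$2                # database name chosen by the user
  threads=${3:-16}     # thread count passed to RepeatModeler

  # Step 1: build the database with the bundled BuildDatabase executable.
  run taf-repeatmodeler BuildDatabase -name "$db" "$genome" || return 1
  # Step 2: run RepeatModeler (the default upstream command).
  run taf-repeatmodeler -database "$db" -threads "$threads" || return 1
  # Step 3: screen the genome with the resulting family library.
  run taf-repeatmodeler RepeatMasker -pa 8 -engine rmblast \
    -lib "${db}-families.fa" "$genome"
}
```

Calling `repeat_workflow genome.fa genome 16` prints the three commands; running it with `DRY_RUN=0` executes them in order and stops at the first failure.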
For real assemblies, run from a fast local working directory with enough disk
space for temporary files. RepeatModeler creates an RM_<pid>.<date>/
directory and keeps it for audit and recovery.
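
Before a long run it can help to check free space in the working directory, and afterwards to locate the run directory that RepeatModeler kept. The helpers below are a sketch with hypothetical names (`free_kib`, `latest_run_dir`); how much space a given assembly needs is not stated here, so any threshold is the user's own estimate.

```shell
#!/bin/sh
# Available space, in KiB, on the filesystem that holds directory $1.
# Uses the POSIX output format of df (-P) with 1024-byte blocks (-k).
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Most recently modified RM_* run directory under $1; prints nothing
# when no such directory exists.
latest_run_dir() {
  ls -1dt "$1"/RM_*/ 2>/dev/null | head -n 1
}
```

For example, `free_kib .` prints the space available to the current directory, and `latest_run_dir .` prints the newest `RM_<pid>.<date>/` directory after a run.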
Successful RepeatModeler runs commonly produce:
<database>-families.fa     Consensus repeat family FASTA
<database>-families.stk    Dfam-compatible Stockholm seed alignments
<database>-rmod.log        Summarized RepeatModeler log
RM_<pid>.<date>/           Detailed run directory with rounds and intermediates
The FASTA library can be passed to RepeatMasker with -lib for genome
screening and masking.
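
A quick post-run check can confirm that the files listed above exist and are non-empty before the library is handed to RepeatMasker. This is a sketch using hypothetical helper names (`check_outputs`, `count_families`); it only inspects file presence and FASTA header counts, not the biological content.

```shell
#!/bin/sh
# check_outputs DB_NAME [DIR]
# Report whether each expected output file exists and is non-empty.
# Returns non-zero if any file is missing or empty.
check_outputs() {
  db=$1
  dir=${2:-.}
  status=0
  for f in "${db}-families.fa" "${db}-families.stk" "${db}-rmod.log"; do
    if [ -s "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "missing  $f"
      status=1
    fi
  done
  return "$status"
}

# Number of consensus sequences (FASTA header lines) in a library file.
count_families() {
  grep -c '^>' "$1"
}
```

For a database named `genome`, `check_outputs genome && count_families genome-families.fa` reports the three files and then the number of families in the library.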
This image follows the official TETools runtime and therefore includes the Dfam/FamDB component files bundled there:
/opt/FamDB-Dfam-4.0/Libraries/famdb/dfam40.0.h5
/opt/FamDB-Dfam-4.0/Libraries/famdb/dfam40.curated.consensus.0.h5
This is different from the lighter taf-repeatmasker app, where full
species/clade FamDB runs require user-provided Dfam component files. Here the
goal is a complete official RepeatModeler/TETools runtime, so the bundled
TETools FamDB/Dfam data are retained.
The app does not include RepBase, project-specific genomes, or user reference datasets. Those resources must be supplied by the user and used under their own license terms.
This app is native linux/amd64 only because the official TETools image is a
single amd64 image and includes x64 Linux binaries such as RMBlast and UCSC
helper tools. src/main.taf asks Docker and Podman to run with:
--platform linux/amd64
On arm64 hosts, such as Apple Silicon Macs, Docker and Podman can run it through amd64 emulation. That is not native arm64 support.
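
A wrapper script can warn users up front when the host is arm64 and the run will therefore be emulated. The check below is a sketch; `is_arm64_host` is a hypothetical name, and it only recognizes the two machine names commonly reported by `uname -m` on arm64 systems (`arm64` and `aarch64`).

```shell
#!/bin/sh
# Succeed when the given machine name (default: output of `uname -m`)
# is an arm64 variant, meaning the amd64 image would run under emulation.
is_arm64_host() {
  case "${1:-$(uname -m)}" in
    arm64|aarch64) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (commented out so sourcing the block has no side effects):
#   if is_arm64_host; then
#     echo "note: arm64 host, the amd64 image will run under emulation" >&2
#   fi
```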
This app does not:
- provide a full repeat annotation flow;
- bundle RepBase or commercial/restricted repeat libraries;
- download external databases during normal execution or smoke tests;
- validate biological correctness on large genomes in smoke;
- claim native linux/arm64 support.
The smoke tests validate the packaged command surface, version binding,
RepeatModeler configuration paths, key helper availability, and a tiny offline
BuildDatabase run.
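
A comparable offline check can be reproduced locally with a small synthetic FASTA. The sketch below is not the app's actual smoke test, whose input is not shown here; `make_tiny_fasta`, the record name, and the sequence are all made up for illustration and carry no biological meaning.

```shell
#!/bin/sh
# Write a small synthetic FASTA to $1: one record consisting of ten
# copies of an arbitrary 20-base unit, one copy per line.
make_tiny_fasta() {
  {
    echo ">smoke_seq1"
    i=0
    while [ "$i" -lt 10 ]; do
      echo "ACGTTGCAAGCTTAGGCTAA"
      i=$((i + 1))
    done
  } > "$1"
}
```

After `make_tiny_fasta tiny.fa`, running `taf-repeatmodeler BuildDatabase -name smoke tiny.fa` exercises the BuildDatabase helper without any network access.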
The TAFFISH app packaging files are licensed under Apache-2.0. The packaged upstream RepeatModeler software is licensed under OSL-2.1. The official TETools runtime bundles additional tools and Dfam/FamDB data under their own notices; those upstream and data licenses are not changed by this TAFFISH wrapper.
Please cite RepeatModeler and its component tools as requested by upstream for your analysis. Important upstream components include RepeatModeler, Dfam TETools, RepeatMasker, RMBlast/BLAST+, RepeatScout, RECON, TRF, LTR_retriever, GenomeTools, MAFFT, CD-HIT, and Dfam/FamDB data.
Upstream resources: