Quickstart

Transposable Element Repeat Result classifIER

Terrier is a neural network model for classifying transposable element sequences.

It is based on ‘corgi’, which was trained to do hierarchical taxonomic classification of DNA sequences.

Terrier was trained on the Repbase library of repetitive DNA elements to perform hierarchical classification according to the RepeatMasker schema.

An online version of Terrier (using CPUs only) is available at https://portal.cpg.unimelb.edu.au/tools/terrier.

Latest Results

Terrier v0.4 has been released. It was trained on RepBase31.04 (released 04-23-2026), which has substantially more sequences than RepBase29.10, the version used to train the Terrier release described in the Briefings in Bioinformatics publication (2025).

The change in training data has altered the performance of the model.

See the Latest Results page for the performance of both versions of Terrier on the test data.

Installation

Install using pip:

pip install bio-terrier

Warning

Do not run just pip install terrier, because that installs a different package.

Or install the latest version from GitHub:

pip install git+https://github.com/rbturnbull/terrier.git

Google Colab Version

Follow the Colab badge link to launch a Google Colab notebook where you can run the model on your own data.

Usage

To run inference on a FASTA file, run this command:

terrier --input INPUT.fa --output-fasta OUTPUT.fa

This adds the classification after the sequence ID of each record in the OUTPUT.fa FASTA file.

To save the probabilities for all classes, run:

terrier --input INPUT.fa --output-csv OUTPUT.csv

Each row corresponds to a sequence in INPUT.fa, and each column gives the probability of one classification.
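As an illustration of working with such a table, the sketch below picks the highest-probability class for each row using only the Python standard library. It rests on assumptions that are not confirmed here: that the first column holds a sequence identifier and that every other numeric column is a class probability. The function name top_classes and the class labels in the usage note are hypothetical, so check the header of your own OUTPUT.csv before relying on it.

```python
import csv
import io


def top_classes(csv_text, id_column=0):
    """Return (row_id, best_class, best_probability) for each data row.

    Assumption: column `id_column` identifies the sequence and the other
    numeric columns are per-class probabilities. Adjust to the real header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    results = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        best_label, best_prob = None, float("-inf")
        for index, (label, cell) in enumerate(zip(header, row)):
            if index == id_column:
                continue
            try:
                prob = float(cell)
            except ValueError:
                continue  # skip any non-numeric column
            if prob > best_prob:
                best_label, best_prob = label, prob
        results.append((row[id_column], best_label, best_prob))
    return results
```

With a real file this could be called as top_classes(open("OUTPUT.csv").read()). The identifier column position is a parameter because the actual layout may differ from the assumption above.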

You can also use a URL as the input:

terrier --input https://example.com/INPUT.fasta.gz --output-fasta OUTPUT.fa

To output a visualization of the prediction probabilities, run:

terrier --input INPUT.fa --image-dir OUTPUT-IMAGES/

The output options above can be combined in a single command. For more options, run:

terrier --help
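For scripted pipelines, the output flags shown above can be assembled into one command from Python. The following is a minimal sketch, not part of Terrier itself: the helper names build_terrier_command and run_terrier are hypothetical, and only the flags that appear above (--input, --output-fasta, --output-csv, --image-dir) are used.

```python
import subprocess


def build_terrier_command(input_path, output_fasta=None, output_csv=None, image_dir=None):
    """Build an argument list for the terrier CLI from the requested outputs."""
    command = ["terrier", "--input", str(input_path)]
    if output_fasta is not None:
        command += ["--output-fasta", str(output_fasta)]
    if output_csv is not None:
        command += ["--output-csv", str(output_csv)]
    if image_dir is not None:
        command += ["--image-dir", str(image_dir)]
    return command


def run_terrier(*args, **kwargs):
    """Run the command and raise CalledProcessError if terrier exits with an error."""
    command = build_terrier_command(*args, **kwargs)
    return subprocess.run(command, check=True)
```

For example, run_terrier("INPUT.fa", output_fasta="OUTPUT.fa", output_csv="OUTPUT.csv") would request both the annotated FASTA file and the probability table in one invocation. Passing the arguments as a list avoids shell quoting problems with unusual file names.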

To see the options to train the model, run:

terrier-tools --help

Programmatic Usage

You can also use the model programmatically:

from terrier import Terrier

terrier = Terrier()
terrier(file="INPUT.fa", output_fasta="OUTPUT.fa")
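When several repeat libraries need classifying, the same call can be wrapped in a loop. The sketch below is one possible way to do that and makes some assumptions: the helper classify_directory is hypothetical, the "*.fa" file pattern is only an example, and the classifier is passed in as a callable that accepts the file and output_fasta keyword arguments used above.

```python
from pathlib import Path


def classify_directory(classifier, input_dir, output_dir, pattern="*.fa"):
    """Call `classifier` on every FASTA file in `input_dir`.

    `classifier` is any callable accepting `file` and `output_fasta`
    keyword arguments, for example a Terrier() instance.
    Returns the list of output paths in sorted order.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for fasta in sorted(Path(input_dir).glob(pattern)):
        destination = output_dir / fasta.name
        classifier(file=str(fasta), output_fasta=str(destination))
        outputs.append(destination)
    return outputs
```

With Terrier installed, this could be used as classify_directory(Terrier(), "libraries/", "classified/"), where both directory names are placeholders. Creating the Terrier object once and reusing it is intended to avoid repeating any setup work for each file.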

Potential Use Case

A potential workflow is to run RepeatModeler first to generate a repeat library, then use Terrier to attempt to classify the repeats that remain unknown. If you only want highly confident classifications from Terrier, set the threshold to 0.9 or higher. If you want more coverage, set the threshold lower or keep it at the default value of 0.7. The modified repeat library can then be used with RepeatMasker to mask the repeats in your genome assembly.
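The trade-off between confidence and coverage can be illustrated with a small sketch. This is not Terrier's own thresholding logic, which works over a hierarchy of classes and may behave differently. It is a flat simplification under stated assumptions: the function name apply_threshold, the "Unknown" fallback label, and the class labels in the usage note are all hypothetical.

```python
def apply_threshold(probabilities, threshold=0.7, fallback="Unknown"):
    """Return (label, probability) for the most likely class.

    The label is replaced by `fallback` when the best probability is
    below `threshold`. `probabilities` maps class label to probability.
    """
    if not probabilities:
        return fallback, 0.0
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    if prob >= threshold:
        return label, prob
    return fallback, prob
```

For a sequence with probabilities {"LINE": 0.8, "SINE": 0.2}, the default threshold of 0.7 keeps the LINE label, while a threshold of 0.9 leaves the sequence as Unknown. A higher threshold therefore gives fewer but more confident classifications.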