Alignment Module

Rule mafft

Aligns the protein (amino acid) ortholog sequences with MAFFT.
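
As a rough illustration of this step, the sketch below builds and runs a MAFFT command from Python. The function names, the use of `--auto`, and the thread count are assumptions made for the example; the options the rule actually passes to MAFFT are not stated here.

```python
import subprocess
from pathlib import Path


def mafft_command(protein_fasta: Path, threads: int = 1) -> list:
    """Build a MAFFT command line (hypothetical helper, not part of the workflow)."""
    # --auto lets MAFFT pick an alignment strategy based on the input size.
    return ["mafft", "--auto", "--thread", str(threads), str(protein_fasta)]


def run_mafft(protein_fasta: Path, aligned_fasta: Path, threads: int = 1) -> None:
    """Run MAFFT and save the alignment, which MAFFT prints on stdout."""
    with aligned_fasta.open("w") as out:
        subprocess.run(mafft_command(protein_fasta, threads), stdout=out, check=True)
```

Calling `run_mafft(Path("og1.faa"), Path("og1.aln.faa"))` requires `mafft` on the PATH, for example from the conda environment listed below.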

Conda
name: mafft
channels:
  - bioconda
  - conda-forge
dependencies:
  - mafft=7.520

Rule get_cds_seq

This rule creates an unaligned multi-FASTA file of the corresponding nucleotide sequences.

It locates the original CDSs so that the aligned amino acid sequences can be back-translated to nucleotides.
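
The lookup can be sketched as follows. This is a minimal standard-library version that assumes protein and CDS records share identical sequence IDs; the rule itself may match records differently, and `read_fasta` and `collect_cds` are hypothetical names used only for this example.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def collect_cds(protein_ids, cds_by_id):
    """Return (id, cds) pairs in protein order; assumes shared IDs."""
    missing = [pid for pid in protein_ids if pid not in cds_by_id]
    if missing:
        raise KeyError("no CDS found for: " + ", ".join(missing))
    return [(pid, cds_by_id[pid]) for pid in protein_ids]
```

Keeping the CDS records in the same order as the protein alignment is a convenience for later inspection, not a requirement stated by the workflow.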

Conda
channels:
  - conda-forge
dependencies:
  - biopython
  - typer

Rule taxon_only

Trim sequence IDs to taxon.

As a final step, the sequence IDs are trimmed down to just the taxon identifier, which gives clean output for the next stages.
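
A minimal sketch of the header rewrite is shown below. It assumes IDs of the form `taxon|gene`, with the taxon before the first `|`; the real ID layout and delimiter are not given here, so both the delimiter and the function name are illustrative.

```python
def taxon_only(fasta_lines, delimiter="|"):
    """Yield FASTA lines whose headers are reduced to the taxon part of the ID."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Take the ID (first token of the header), then keep only the
            # text before the first delimiter.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            yield ">" + seq_id.split(delimiter, 1)[0]
        else:
            yield line
```

For example, a header `>Homo_sapiens|ENSG01 some description` becomes `>Homo_sapiens`, while sequence lines pass through unchanged.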

Conda
channels:
  - conda-forge
dependencies:
  - coreutils
  - rich==12.4.1
  - python=3.9
  - typer

Rule thread_dna

Back-translates the protein alignment to codons based on the CDS sequences, yielding a corresponding alignment of nucleotide sequences.

https://jlsteenwyk.com/PhyKIT/usage/index.html#protein-to-nucleotide-alignment

The --stop argument keeps stop codons, which are otherwise removed.
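
The idea behind threading can be sketched in a few lines: each gap in the aligned protein becomes three gap characters, and each residue consumes the next codon of the CDS. This is an illustration of the general technique, not PhyKIT's implementation; it does not model the --stop option, and any codons left over at the end of the CDS (such as a terminal stop codon) are ignored.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map one unaligned CDS onto its aligned protein sequence."""
    # Split the CDS into complete codons; a trailing partial codon is dropped.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(codons) < residues:
        raise ValueError("CDS is too short for the aligned protein")
    next_codon = iter(codons)
    # One codon per residue, three gap characters per protein gap.
    return "".join("---" if aa == "-" else next(next_codon) for aa in aligned_protein)
```

With the aligned protein `M-K` and the CDS `ATGAAATAA`, the result is `ATG---AAA`: the gap is widened to a full codon and the unused final codon is left out.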

Conda
channels:
  - conda-forge
dependencies:
  - coreutils
  - python=3.9
  - pip
  - pip:
    - phykit==1.17.0

Rule trim_alignments

Trim multiple-sequence alignments using ClipKIT.
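
To give a feel for what alignment trimming does, the sketch below drops columns that consist mostly of gaps. It is a deliberately simplified gap-fraction filter and not ClipKIT's algorithm; the function name and the 0.9 threshold are assumptions for the example, and the trimming mode used by this rule is not stated here.

```python
def trim_gappy_columns(alignment: dict, max_gap_fraction: float = 0.9) -> dict:
    """Remove columns whose gap fraction is at or above max_gap_fraction."""
    rows = list(alignment.values())
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if its share of gap characters is below the threshold.
    keep = [
        col for col in range(length)
        if sum(row[col] == "-" for row in rows) / len(rows) < max_gap_fraction
    ]
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}
```

Every sequence loses the same columns, so the trimmed output is still a valid alignment of equal-length rows.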

Conda
channels:
  - conda-forge
dependencies:
  - coreutils
  - python=3.9
  - pip
  - pip:
    - clipkit==2.2.4

Checkpoint list_alignments

Lists the paths to the alignment files in a single text file for use with PhyKIT.
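
A listing of this kind can be produced with a short script such as the one below. The `*.fasta` pattern, the one-path-per-line layout, and the function name are assumptions for the example; the checkpoint may collect its files differently.

```python
from pathlib import Path


def list_alignments(alignment_dir: Path, listing: Path, pattern: str = "*.fasta") -> int:
    """Write one alignment path per line to `listing` and return the count."""
    # Sort the paths so the listing is reproducible between runs.
    paths = sorted(str(p) for p in alignment_dir.glob(pattern))
    listing.write_text("".join(p + "\n" for p in paths))
    return len(paths)
```

Returning the count makes it easy to fail early when no alignments were found.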


Rule missing_taxa

Conda
channels:
  - conda-forge
dependencies:
  - coreutils
  - rich==12.4.1
  - python=3.9
  - typer