Alignment Module
- Rule mafft
Aligns the protein (amino acid) ortholog with MAFFT.
- Conda
  name: mafft
  channels:
    - bioconda
    - conda-forge
  dependencies:
    - mafft=7.520
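As a rough illustration of how a rule like this might invoke MAFFT, the sketch below builds the command line and redirects MAFFT's standard output to the alignment file. The function names and file paths are hypothetical, and the `--auto` and `--thread` options are assumptions; the options the rule actually passes are not shown in this documentation.

```python
import subprocess
from pathlib import Path
from typing import List


def mafft_command(protein_fasta: Path, threads: int = 1) -> List[str]:
    """Build a MAFFT command line (assumed options: --auto, --thread)."""
    return ["mafft", "--auto", "--thread", str(threads), str(protein_fasta)]


def run_mafft(protein_fasta: Path, aligned_fasta: Path, threads: int = 1) -> None:
    """Run MAFFT and write the alignment it prints on stdout to a file."""
    with aligned_fasta.open("w") as handle:
        subprocess.run(
            mafft_command(protein_fasta, threads),
            stdout=handle,
            check=True,  # raise CalledProcessError if MAFFT fails
        )
```

Calling `run_mafft(Path("og1.faa"), Path("og1.aln.faa"))` requires the `mafft` executable on the `PATH`, for example from the Conda environment above.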
- Rule get_cds_seq
Creates an unaligned multi-FASTA file of the corresponding nucleotide sequences.
Locates the original CDSs so that the aligned (amino acid) sequences can be translated back.
- Conda
  channels:
    - conda-forge
  dependencies:
    - biopython
    - typer
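The lookup described for get_cds_seq can be sketched as follows. This is a minimal standard-library illustration, not the rule's actual script (which, judging by its environment, uses Biopython and typer). It assumes that protein and CDS records share the same sequence ID; the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            record_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def collect_cds(protein_ids: Iterable[str], cds_by_id: Dict[str, str]) -> str:
    """Return unaligned multi-FASTA text with one CDS per protein ID, in order."""
    records = []
    for pid in protein_ids:
        if pid not in cds_by_id:
            raise KeyError(f"no CDS found for {pid}")
        records.append(f">{pid}\n{cds_by_id[pid]}\n")
    return "".join(records)
```

Keeping the CDS records in the same order as the protein records makes it easy to pair them up again later.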
- Rule taxon_only
Trims sequence IDs to the taxon.
At the end, the sequence IDs are trimmed down to just the taxon identifier, which produces clean output for the next stages.
- Conda
  channels:
    - conda-forge
  dependencies:
    - coreutils
    - rich==12.4.1
    - python=3.9
    - typer
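A minimal sketch of the ID trimming, assuming the sequence IDs have the form `taxon|gene`. The `|` separator and the function name are assumptions made for illustration; the actual ID layout used by the workflow is not stated here.

```python
def trim_to_taxon(fasta_text: str, sep: str = "|") -> str:
    """Rewrite FASTA headers so that each ID holds only the taxon part."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Drop the description, then keep the text before the first separator.
            taxon = tokens[0].split(sep, 1)[0] if tokens else ""
            out.append(">" + taxon)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

With these assumptions, a header such as `>Homo_sapiens|ENSG1 some description` becomes `>Homo_sapiens`, while sequence lines pass through unchanged.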
- Rule thread_dna
Back-translates the protein alignment to codons based on the CDS sequences, yielding a corresponding alignment of nucleotide sequences.
https://jlsteenwyk.com/PhyKIT/usage/index.html#protein-to-nucleotide-alignment
The --stop argument keeps stop codons, which are otherwise removed.
- Conda
  channels:
    - conda-forge
  dependencies:
    - coreutils
    - python=3.9
    - pip
    - pip:
      - phykit==1.17.0
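The idea behind threading DNA onto a protein alignment can be illustrated for a single sequence as follows. This is a simplified sketch, not PhyKIT's implementation: each aligned residue consumes one codon from the CDS, and each gap column becomes three gap characters. The function name is hypothetical, and stop-codon handling is deliberately left out.

```python
def thread_dna(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned CDS onto one aligned protein sequence, codon by codon."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = []
    pos = 0  # index of the next unused codon
    for residue in aligned_protein:
        if residue == gap:
            # A gap in the protein alignment spans a whole codon.
            out.append(gap * 3)
        else:
            if pos >= len(codons):
                raise ValueError("CDS is shorter than the protein sequence")
            out.append(codons[pos])
            pos += 1
    # Any codons left over (for example a terminal stop codon) are
    # ignored in this sketch.
    return "".join(out)
```

For example, threading the CDS `ATGAAA` onto the aligned protein `M-K` gives `ATG---AAA`, so the nucleotide alignment is exactly three times as long as the protein alignment.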
- Rule trim_alignments
Trim multiple-sequence alignments using ClipKIT.
- Conda
  channels:
    - conda-forge
  dependencies:
    - coreutils
    - python=3.9
    - pip
    - pip:
      - clipkit==2.2.4
- Checkpoint list_alignments
Lists the paths to the alignment files in a single text file for use with PhyKIT.
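Writing such a list file could look like the sketch below. The directory layout, the `*.fa` pattern, and the function name are assumptions for illustration only; the checkpoint's real inputs and outputs are defined in the workflow source.

```python
from pathlib import Path


def list_alignments(alignment_dir: Path, output_file: Path, pattern: str = "*.fa") -> int:
    """Write one alignment path per line and return the number of paths written."""
    # Sort the paths so that the list file is reproducible between runs.
    paths = sorted(p for p in alignment_dir.glob(pattern) if p.is_file())
    output_file.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

A plain text file with one path per line is a convenient input for tools that process many alignments in a single call.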
- Rule missing_taxa
- Conda
  channels:
    - conda-forge
  dependencies:
    - coreutils
    - rich==12.4.1
    - python=3.9
    - typer