Citation and Attribution
Orthoflow was created by Robert Turnbull, Jacob Steenwyk, Simon Mutch, Vinícius Salazar, Pelle Scholten, Joanne L. Birch and Heroen Verbruggen.
The preprint for Orthoflow is here:
Robert Turnbull, Jacob L. Steenwyk, Simon J. Mutch, Pelle Scholten, Vinícius W. Salazar, Joanne L. Birch, and Heroen Verbruggen. Orthoflow: Phylogenomic analysis and diagnostics with one command, 04 December 2023, PREPRINT available at Research Square [https://doi.org/10.21203/rs.3.rs-3699210/]
In BibTeX format:
@article{orthoflow,
author = {Robert Turnbull and Jacob L. Steenwyk and Simon J. Mutch and Pelle Scholten and Vinícius W. Salazar and Joanne L. Birch and Heroen Verbruggen},
title = {{Orthoflow: Phylogenomic analysis and diagnostics with one command}},
year = {2023},
doi = {10.21203/rs.3.rs-3699210/v2},
abstract = {Species trees, which depict the evolutionary relationships among organisms, underlie many evolutionary studies. Phylogenomics, the use of genome-scale datasets for phylogenetic inference, is the current gold standard for species tree inference. The development, maintenance, and execution of phylogenomic workflows is challenging, requiring programming, data management skills, and familiarity with changing best practices. We introduce Orthoflow, a software wherein a single command automatically conducts end-to-end phylogenomic analysis—orthology inference and identification of phylogenomic markers, quality control, data matrix construction, diagnostics, and tree inference using supermatrix and supertree methods from multiple input data formats. To demonstrate the utility of Orthoflow, we successfully recapitulate the evolutionary relationships among 24 yeast species. Orthoflow increases the accessibility of researchers to conduct rigorous phylogenomic analysis flexibly. Orthoflow is freely available from PyPI (https://pypi.org/project/orthoflow/), Bioconda (https://anaconda.org/bioconda/orthoflow) and GitHub (https://github.com/rbturnbull/orthoflow) under the Apache License 2.0.},
journal = {Research Square}
}
More details to come.
Bibliography
The following articles and software packages were used in this workflow:
Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, and Michiel J. L. de Hoon. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–1423, 2009. doi:10.1093/bioinformatics/btp163.
Deren A. R. Eaton. Toytree: A minimalist tree visualization and manipulation library for Python. Methods in Ecology and Evolution, 11:187–191, 2020. doi:10.1111/2041-210X.13313.
David M. Emms and Steven Kelly. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology, 20(1):238, 2019. doi:10.1186/s13059-019-1832-y.
Diep Thi Hoang, Olga Chernomor, Arndt von Haeseler, Bui Quang Minh, and Le Sy Vinh. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Molecular Biology and Evolution, 35(2):518–522, 2017. doi:10.1093/molbev/msx281.
Subha Kalyaanamoorthy, Bui Quang Minh, Thomas K. F. Wong, Arndt von Haeseler, and Lars S. Jermiin. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods, 14(6):587–589, 2017. doi:10.1038/nmeth.4285.
Bui Quang Minh, Heiko A. Schmidt, Olga Chernomor, Dominik Schrempf, Michael D. Woodhams, Arndt von Haeseler, and Robert Lanfear. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37(5):1530–1534, 2020. doi:10.1093/molbev/msaa015.
Jacob L. Steenwyk, Thomas J. Buida III, Abigail L. LaBella, Yuanning Li, Xing-Xing Shen, and Antonis Rokas. PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data. Bioinformatics, 37(16):2325–2331, 2021. doi:10.1093/bioinformatics/btab096.
Jacob L. Steenwyk, Thomas J. Buida, Carla Gonçalves, Dayna C. Goltz, Grace Morales, Matthew E. Mead, Abigail L. LaBella, Christina M. Chavez, Jonathan E. Schmitz, Maria Hadjifrangiskou, Yuanning Li, and Antonis Rokas. BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data. bioRxiv, 2021. doi:10.1101/2021.10.02.462868.
Chao Zhang, Maryam Rabiee, Erfan Sayyari, and Siavash Mirarab. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics, 19(6):153, 2018. doi:10.1186/s12859-018-2129-y.