ete-build
ete-build provides a unified interface to wrap the execution of phylogenetic workflows, comprising the reconstruction of gene trees and supermatrix-based species trees.
Highlighted features:
A single command can be used to configure and launch complex phylogenetic pipelines, covering sequence alignment reconstruction, trimming, model testing, tree inference and image rendering.
The supermatrix-based reconstruction mode makes it easy to build and concatenate multiple sequence alignments, simplifying the reconstruction of species trees based on multiple genes.
Advanced options allow automatic switching from amino-acid to nucleotide alignments based on sequence identity, resuming the execution of workflows, and even testing multiple strategies in parallel.
ete-build comes with a number of predefined workflows and application bindings, which can also be extended or reconfigured.
To infer a gene-based phylogeny, simply choose a workflow name (-w) and provide a fasta file with the target sequences (-a for amino-acids and -n for nucleotides).
ete3 build -w standard_raxml -a NUP62.aa.fa -o output_tree
After a few seconds, you should get a tree image like this:
The execution of a workflow will look like the following video:
Reconstructing a species tree based on several concatenated alignments requires the selection of two workflows: i) a gene-tree workflow used to align the sequences of each gene family (-w), and ii) a workflow to concatenate and build a tree based on the supermatrix alignment (-m).
Sequences from all genes/proteins must be passed in a single fasta file, and a COGs (Clusters of Orthologous Groups) file is also required. The COGs file must be a text file containing the same sequence IDs as the input file. Each TAB-delimited line is treated as one COG. For instance, the following example defines 3 COGs containing 3, 2 and 4 sequences, respectively:
sp1_seqA	sp2_seqA	sp3_seqA
sp1_seqB	sp2_seqB
sp1_seqC	sp3_seqC	sp4_seqC	sp5_seqC
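The COGs layout described above can be checked before launching a run. The following is a minimal sketch and is not part of ete-build: `read_cogs`, `fasta_ids` and `missing_ids` are hypothetical helper names, and the FASTA parsing assumes that the sequence ID is the first whitespace-delimited token of each header line.

```python
from typing import Iterable, List, Set


def read_cogs(lines: Iterable[str]) -> List[List[str]]:
    """Parse COG definitions: one COG per line, sequence IDs separated by TABs."""
    cogs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        members = [field.strip() for field in line.split("\t")]
        cogs.append([seq_id for seq_id in members if seq_id])
    return cogs


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs from FASTA headers.

    Assumption: the ID is the first token after '>' on each header line.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_ids(cogs: List[List[str]], known_ids: Set[str]) -> List[str]:
    """Return, sorted, the COG sequence IDs that are absent from the FASTA file."""
    referenced = {seq_id for cog in cogs for seq_id in cog}
    return sorted(referenced - known_ids)


if __name__ == "__main__":
    example = [
        "sp1_seqA\tsp2_seqA\tsp3_seqA\n",
        "sp1_seqB\tsp2_seqB\n",
        "sp1_seqC\tsp3_seqC\tsp4_seqC\tsp5_seqC\n",
    ]
    cogs = read_cogs(example)
    print([len(cog) for cog in cogs])  # [3, 2, 4]
```

With real files, the same functions can be fed open file handles, for example `read_cogs(open("cogs.txt"))` and `fasta_ids(open("proteome_seqs.fa"))`; an empty result from `missing_ids` means every ID in the COGs file is present in the fasta file.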
By default, the expected format for the sequence identifiers is SpeciesCode_SequenceName, but you can change this behavior with --spname-delimiter.
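To illustrate the naming convention, the sketch below splits identifiers into species code and sequence name and counts in how many COGs each species is represented. It is an illustration only: `split_seq_id` and `species_coverage` are hypothetical helpers, and splitting at the first occurrence of the delimiter is an assumption, not a statement about how ete-build parses names internally.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_seq_id(seq_id: str, delimiter: str = "_") -> Tuple[str, str]:
    """Split 'SpeciesCode<delimiter>SequenceName' at the first delimiter (assumed)."""
    species, sep, name = seq_id.partition(delimiter)
    if not sep or not species or not name:
        raise ValueError(f"{seq_id!r} does not match Species{delimiter}Name")
    return species, name


def species_coverage(cogs: List[List[str]], delimiter: str = "_") -> Dict[str, int]:
    """Count the number of COGs in which each species code appears."""
    counts = Counter()
    for cog in cogs:
        # A species is counted once per COG, even if it has several members.
        counts.update({split_seq_id(seq_id, delimiter)[0] for seq_id in cog})
    return dict(counts)


if __name__ == "__main__":
    cogs = [
        ["sp1_seqA", "sp2_seqA", "sp3_seqA"],
        ["sp1_seqB", "sp2_seqB"],
        ["sp1_seqC", "sp3_seqC", "sp4_seqC", "sp5_seqC"],
    ]
    print(split_seq_id("sp1_seqA"))
    print(sorted(species_coverage(cogs).items()))
```

Species that appear in only a few COGs contribute mostly missing data to a concatenated alignment, so a quick count like this can help when deciding which COGs to include.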
A simple run, using the above example data, would look like this:
ete3 build -a proteome_seqs.fa --cogs cogs.txt -o sptree1_results -m sptree_fasttree_100 -w standard_fasttree
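When runs are launched from a script, the command line can be assembled programmatically. The sketch below uses only the options shown on this page (-w, -a, -n, -o, -m and --cogs); `build_command` is a hypothetical helper, and the rule that --cogs and -m must be given together is an assumption drawn from the description above.

```python
import shlex
from typing import List, Optional


def build_command(
    workflow: str,
    output_dir: str,
    aa_fasta: Optional[str] = None,
    nt_fasta: Optional[str] = None,
    cogs: Optional[str] = None,
    supermatrix_workflow: Optional[str] = None,
) -> List[str]:
    """Assemble an 'ete3 build' argument list from the options shown on this page."""
    if aa_fasta is None and nt_fasta is None:
        raise ValueError("provide an amino-acid (-a) or nucleotide (-n) fasta file")
    # Assumption: species-tree mode needs both a COGs file and a -m workflow.
    if (cogs is None) != (supermatrix_workflow is None):
        raise ValueError("--cogs and -m must be given together")

    cmd = ["ete3", "build", "-w", workflow]
    if aa_fasta is not None:
        cmd += ["-a", aa_fasta]
    if nt_fasta is not None:
        cmd += ["-n", nt_fasta]
    if cogs is not None:
        cmd += ["--cogs", cogs, "-m", supermatrix_workflow]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "standard_fasttree",
        "sptree1_results",
        aa_fasta="proteome_seqs.fa",
        cogs="cogs.txt",
        supermatrix_workflow="sptree_fasttree_100",
    )
    print(shlex.join(cmd))
    # To actually run it (requires ete3 and its external tools to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.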
There are many more options and workflows that you can use. Check out our cookbook for example recipes!
While ete-build covers many common programs in phylogenetics, it certainly does not cover all of them. The current selection of tools reflects direct feedback from the community of ETE users.
However, ETE does not make any commitment in favor of any specific tool. On the contrary, ete-build is written in a modular way that permits adding new tools and attaching them to any workflow. From a development point of view, the team is open to contributions of either new applications or workflow configurations.
The following files are used in the examples on the left
for gene tree workflows | for species tree workflows |
This page shows the basic usage of ete-build. More options and examples are available as a collection of notebooks.
Predefined workflow names:
Gene-Tree (-w):
Supported software: