`ete-build`

ete-build provides a unified interface to wrap the execution of phylogenetic workflows, comprising the reconstruction of gene trees and supermatrix-based species trees.

Highlighted features:

A single command can be used to configure and launch complex phylogenetic pipelines, covering sequence alignment reconstruction, trimming, model testing, tree inference and image rendering.
The supermatrix-based reconstruction mode makes it easy to build and concatenate multiple sequence alignments, simplifying the reconstruction of species trees based on multiple genes.
Advanced options allow switching automatically from amino-acid to nucleotide alignments based on sequence identity, resuming the execution of workflows, or even testing multiple strategies in parallel.
ete-build comes with a number of predefined workflows and application bindings, which can also be extended or reconfigured.

Getting started

1. Reconstructing gene trees

To infer a gene-based phylogeny, simply choose a workflow name (-w) and provide a fasta file with the target sequences (-a for amino-acids and -n for nucleotides).

ete3 build -w standard_raxml -a NUP62.aa.fa -o output_tree

After a few seconds, the workflow finishes and you should get a rendered image of the resulting tree.

2. Reconstructing species trees

Reconstructing a species tree based on several concatenated alignments requires the selection of two workflows: i) a gene-tree workflow used to align the sequences of each gene family (-w), and ii) a workflow to concatenate and build a tree based on the supermatrix alignment (-m).

Sequences from all genes/proteins must be passed in a single FASTA file, and a COGs (Clusters of Orthologous Groups) file is also required. The COGs file must be a text file containing the same sequence IDs as the input file. Each TAB-delimited line defines one COG. For instance, the following example defines 3 COGs containing 3, 2 and 4 sequences respectively:

sp1_seqA   sp2_seqA    sp3_seqA
sp1_seqB   sp2_seqB    
sp1_seqC   sp3_seqC    sp4_seqC    sp5_seqC

By default, the expected format for the sequence identifiers is SpeciesCode_SequenceName, but you can change this behavior with the --spname-delimiter option.
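Before launching a run, it can help to check that the COGs file is consistent with the FASTA file. The following is a minimal sketch, not part of ete-build itself: the helper names (parse_cogs, species_of, missing_ids) are hypothetical, and it assumes that the species code is the part of each identifier before the first delimiter.

```python
def parse_cogs(text):
    """Return one list of sequence IDs per non-empty line (one line = one COG)."""
    cogs = []
    for line in text.splitlines():
        # Fields are TAB-delimited; empty fields (e.g. a trailing TAB) are skipped.
        ids = [field.strip() for field in line.split("\t") if field.strip()]
        if ids:
            cogs.append(ids)
    return cogs


def species_of(seq_id, delimiter="_"):
    """Assumption: the species code is everything before the first delimiter."""
    return seq_id.split(delimiter, 1)[0]


def missing_ids(cogs, fasta_ids):
    """Return the IDs listed in the COGs file that are absent from the FASTA file."""
    known = set(fasta_ids)
    return sorted({seq_id for cog in cogs for seq_id in cog if seq_id not in known})


# The three COGs from the example above, written with explicit TABs.
example = (
    "sp1_seqA\tsp2_seqA\tsp3_seqA\n"
    "sp1_seqB\tsp2_seqB\t\n"
    "sp1_seqC\tsp3_seqC\tsp4_seqC\tsp5_seqC\n"
)

cogs = parse_cogs(example)
cog_sizes = [len(cog) for cog in cogs]
species_per_cog = [sorted({species_of(seq_id) for seq_id in cog}) for cog in cogs]
```

In practice, fasta_ids would be the identifiers read from the header lines of the input FASTA file; a non-empty result from missing_ids indicates a mismatch worth fixing before the run.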

A simple run, using the above example data, would look like this:

ete3 build -a proteome_seqs.fa --cogs cogs.txt -o sptree1_results -m sptree_fasttree_100 -w standard_fasttree
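When these commands are launched from a script, the argument list can be assembled programmatically. The sketch below is only an illustration: build_command is a hypothetical helper, it uses only the options mentioned on this page, and the rule that --cogs and -m must be given together is a precaution of this sketch rather than documented ete-build behavior.

```python
import shlex


def build_command(workflow, fasta, outdir, seqtype="aa",
                  cogs=None, supermatrix_workflow=None, spname_delimiter=None):
    """Assemble an `ete3 build` argument list (hypothetical helper)."""
    if seqtype not in ("aa", "nt"):
        raise ValueError("seqtype must be 'aa' (amino acids) or 'nt' (nucleotides)")
    # Species-tree mode needs both a COGs file and a supermatrix workflow.
    if (cogs is None) != (supermatrix_workflow is None):
        raise ValueError("provide both cogs and supermatrix_workflow, or neither")

    seq_flag = "-a" if seqtype == "aa" else "-n"
    cmd = ["ete3", "build", "-w", workflow, seq_flag, fasta, "-o", outdir]
    if cogs is not None:
        cmd += ["--cogs", cogs, "-m", supermatrix_workflow]
    if spname_delimiter is not None:
        cmd += ["--spname-delimiter", spname_delimiter]
    return cmd


# Gene-tree run from section 1.
gene_cmd = build_command("standard_raxml", "NUP62.aa.fa", "output_tree")

# Species-tree run from section 2.
sptree_cmd = build_command(
    "standard_fasttree", "proteome_seqs.fa", "sptree1_results",
    cogs="cogs.txt", supermatrix_workflow="sptree_fasttree_100",
)

# Print the commands in a copy-pasteable form.
print(shlex.join(gene_cmd))
print(shlex.join(sptree_cmd))
```

With ete3 installed and the input files in place, either list could be passed to subprocess.run(cmd, check=True) to launch the run.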

3. I want to learn more...

There are many more options and workflows that you can use. Check out our cookbook for example recipes!

About the set of supported tools and workflows

While ete-build covers many common programs in phylogenetics, it certainly does not cover all of them. The current selection of tools reflects direct feedback from the community of ETE users.

However, ETE is not committed to any specific tool. On the contrary, ete-build is written in a modular way that permits adding new tools and attaching them to any workflow. From a development point of view, the team is open to contributions of new applications or workflow configurations.

The following files are used in the examples above:

For gene tree workflows: p53.fasta, NUP62.aa.fa, NUP62.nt.fa
For species tree workflows: cogs.txt, proteome_seqs.fa

Extended documentation

This page shows the basic usage of ete-build. More options and examples are available as a collection of notebooks.

Predefined workflow names:

Gene-Tree (-w):

eggnog41
full_fast_modeltest
full_fast_modeltest_bootstrap
full_modeltest
full_modeltest_bootstrap
full_ultrafast_modeltest
full_ultrafast_modeltest_bootstrap
phylomedb4
soft_fast_modeltest
soft_fast_modeltest_bootstrap
soft_modeltest
soft_modeltest_bootstrap
soft_ultrafast_modeltest
soft_ultrafast_modeltest_bootstrap
standard_fasttree
standard_phyml
standard_phyml_bootstrap
standard_raxml
standard_raxml_bootstrap
standard_trimmed_fasttree
standard_trimmed_phyml
standard_trimmed_phyml_bootstrap
standard_trimmed_raxml
standard_trimmed_raxml_bootstrap

Supermatrix-Tree (-m):

sptree_fasttree_100
sptree_fasttree_85
sptree_fasttree_90
sptree_fasttree_95
sptree_fasttree_all
sptree_raxml_100
sptree_raxml_85
sptree_raxml_90
sptree_raxml_95
sptree_raxml_all

Supported software:

ClustalOmega
Muscle
Mafft
MCoffee (beta)
TCoffee (beta)
Dialign-tx
Trimal
Pmodeltest
Phyml
Raxml
Fasttree
Kalign
Prank
Probcons