ete-build

ete-build provides a unified interface to wrap the execution of phylogenetic workflows, comprising the reconstruction of gene trees and supermatrix-based species trees.

Highlighted features:

  • A single command can be used to configure and launch complex phylogenetic pipelines, covering sequence alignment reconstruction, trimming, model testing, tree inference and image rendering.

  • The supermatrix-based reconstruction mode permits to build and concatenate multiple sequence alignments with ease, simplifying the reconstruction of species trees based on multiple genes.

  • Advanced options allow to automatically switch from amino-acid to nucleotide alignments based on sequence identity, resuming the execution of workflows, or even testing multiple strategies in parallel.

  • ete-build comes with a number of predefined workflowsand application bindinds, which can also be extended or reconfigured

Getting started

1. Reconstructing gene trees

To infer a gene-based phylogeny, simply choose a workflow name (-w) and provide a fasta file with the target sequences (-a for amino-acids and -n for nucleotides).

ete3 build -w standard_raxml -a NUP62.aa.fa -o output_tree 

After a few seconds, you should get a tree image like this:

The execution of a workflow will look like the following video:

2. Reconstructing species trees

Reconstructing a species tree based on several concatenated alignments requires the selection of two workflows: i) a gene-tree workflow used to align the sequences of each gene family (-w), and ii) a workflow to concatenate and build a tree based on the supermatrix alignment (-m).

Sequences from all genes/proteins must be passed in a single fasta file, and a COGs (Cluster of Orthologous Groups) file will also be required. The COGs file must be a text file containing the same sequence IDs as in the input file. Each TAB delimited line will be considered a COG. For instance, the following example would define 3 COGs of size 3, 2 and 4 sequences respectively:

sp1_seqA   sp2_seqA    sp3_seqA
sp1_seqB   sp2_seqB    
sp1_seqC   sp3_seqC    sp4_seqC    sp5_seqC
        

By default, the expected format for the sequence identifiers is SpeciesCode_SequenceName, but you can change this behavior with --spname-delimiter

A simple run, using the above example data, would look like this:

ete3 build -a proteome_seqs.fa --cogs cogs.txt -o sptree1_results -m sptree_fasttree_100 -w standard_fasttree
          


3. I want to learn more...

There are many more options and worlflows that you can use. Check out our cookbook for example recipes!




About the set of supported tools and workflows

While ete-build covers many common programs in phylogenetics, it does certainly not cover all. The current selection of tools responds to the direct feedback from the community of ETE users.

However, ETE does not make any commitment in favor of any specific tool. By contrast, ete-build is written in a modular way that permits adding new tools and attaching it to any workflow. From a development point of view, the team is open to any contribution for either new applications or workflow configurations.

The following files are used in the examples on the left

for gene tree workflows for species tree workflows

Extended documentation

This page shows the basic usage of ete-build. More options and examples are available as a collection of notebooks.


Predefined workflow names:

Gene-Tree (-w):
  • eggnog41
  • full_fast_modeltest
  • full_fast_modeltest_bootstrap
  • full_modeltest
  • full_modeltest_bootstrap
  • full_ultrafast_modeltest
  • full_ultrafast_modeltest_bootstrap
  • phylomedb4
  • soft_fast_modeltest
  • soft_fast_modeltest_bootstrap
  • soft_modeltest
  • soft_modeltest_bootstrap
  • soft_ultrafast_modeltest
  • soft_ultrafast_modeltest_bootstrap
  • standard_fasttree
  • standard_phyml
  • standard_phyml_bootstrap
  • standard_raxml
  • standard_raxml_bootstrap
  • standard_trimmed_fasttree
  • standard_trimmed_phyml
  • standard_trimmed_phyml_bootstrap
  • standard_trimmed_raxml
  • standard_trimmed_raxml_bootstrap
Supermatrix-Tree (-m):
  • sptree_fasttree_100
  • sptree_fasttree_85
  • sptree_fasttree_90
  • sptree_fasttree_95
  • sptree_fasttree_all
  • sptree_raxml_100
  • sptree_raxml_85
  • sptree_raxml_90
  • sptree_raxml_95
  • sptree_raxml_all

Supported software:

  • ClustalOmega
  • Muscle
  • Mafft
  • MCoffee (beta)
  • TCoffee (beta)
  • DAlign-tx
  • Trimal
  • Pmodeltest
  • Phyml
  • Raxml
  • Fasttree
  • Upcomming:
  • Kalign
  • Prank
  • Probcons