ete-build
This recipe shows the basic concepts of using the ete-build tool. It also shows how to build a phylogenetic tree from scratch using predefined workflows.
Installing ete3 and ete3_external_apps
ete-build requires several external programs to compute trees and sequence alignments and to perform other tasks. The recommended way to install these external tools is the pre-built Conda package: ete3_external_apps is a meta-package that includes all the necessary tools pre-compiled for Linux and OS X. This has two main advantages:
It is a plug-and-play solution that requires no admin permissions and installs a full computing environment in a few minutes.
It ensures reproducibility: the set of tools is distributed under a common version number, so you can always go back to a previous version if necessary.
If a manual installation is necessary, you can try running ete3 upgrade-external-tools, which will attempt to compile all tools from scratch on your local system. This is provided as a helper functionality, but any problem derived from compiling external software is outside the scope of the ETE toolkit.
Check the installation instructions at http://etetoolkit.org/download/. Once everything is installed, you can verify that ete-build finds all the external tools:
%%bash
ete3 build check
A list of predefined workflows can be found in the online documentation, but you can always query the command line for up-to-date options:
%%bash
ete3 build workflows genetree
Although ete-build performs some basic checks on the input file, make sure that your data is correctly formatted as FASTA.
We recommend that FASTA headers avoid unusual symbols and that sequence names are not duplicated (ete will raise an error otherwise).
You can use either amino acid or nucleotide sequences.
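Before launching a build, a quick pre-flight check can catch the two problems mentioned above. The sketch below is a hypothetical helper, not part of ETE: it parses a FASTA file, reports duplicated sequence names and headers containing characters outside a conservative whitelist, and guesses whether the data looks like nucleotides (use -n) or amino acids (use -a). The whitelist and the 90% ACGTUN threshold are arbitrary assumptions.

```python
import re
from collections import Counter

# Conservative whitelist for header characters: an assumption, not an ETE rule.
SAFE_HEADER = re.compile(r"[A-Za-z0-9_.|\-]+")
NUCLEOTIDES = set("ACGTUN")


def read_fasta(lines):
    """Yield (header, sequence) pairs; lines before the first '>' are ignored."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(lines, nt_threshold=0.9):
    """Report duplicated names, suspicious headers and a guessed alphabet."""
    records = list(read_fasta(lines))
    counts = Counter(header for header, _ in records)
    duplicated = sorted(h for h, c in counts.items() if c > 1)
    suspicious = sorted({h for h, _ in records if not SAFE_HEADER.fullmatch(h)})
    # Ignore gap and stop symbols when guessing the alphabet.
    residues = "".join(seq.upper() for _, seq in records)
    residues = residues.replace("-", "").replace(".", "").replace("*", "")
    nt_fraction = (
        sum(r in NUCLEOTIDES for r in residues) / len(residues) if residues else 0.0
    )
    alphabet = "nucleotide (-n)" if nt_fraction >= nt_threshold else "amino acid (-a)"
    return {
        "n_seqs": len(records),
        "duplicated": duplicated,
        "suspicious": suspicious,
        "alphabet": alphabet,
    }


if __name__ == "__main__":
    example = [">seqA", "MKTAYIAKQR", ">seq B!", "MKTAYIAKQR", ">seqA", "MKTAYIAKQR"]
    print(check_fasta(example))
```

On the toy input this flags seqA as duplicated and "seq B!" as a suspicious header. To check a real file, pass the open file handle instead, e.g. with open("data/NUP62.aa.fa") as fh: print(check_fasta(fh)).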
For this example, we will use the amino acid sequences of NUP62 homologs:
%%bash
head -n 15 data/NUP62.aa.fa
Only three parameters are required:
-a to provide an amino acid sequence file (or -n for nucleotides)
-o to define the output directory (it should not exist; otherwise the --resume or --clearall flag will be necessary)
-w to set the name of the workflow to be executed. For this example we will use the standard_fasttree workflow.
%%bash
ete3 build -w standard_fasttree -a data/NUP62.aa.fa -o NUP62_tree/ --clearall
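If you prefer to drive ete-build from Python (for example, from this notebook), the sketch below assembles an equivalent command line from the three required parameters. build_ete_command is a hypothetical helper, not part of ETE. It uses only the -w, -a/-n, -o, --resume and --clearall options described above, and it enforces the rule that an existing output directory needs one of the last two flags.

```python
import os
import shlex


def build_ete_command(workflow, seq_file, outdir, nucleotide=False, on_existing=None):
    """Return the argument list for an `ete3 build` run (hypothetical helper).

    on_existing must be "resume" or "clearall" when outdir already exists,
    mirroring the rule described in this recipe.
    """
    if on_existing not in (None, "resume", "clearall"):
        raise ValueError("on_existing must be None, 'resume' or 'clearall'")
    if os.path.exists(outdir) and on_existing is None:
        raise FileExistsError(
            f"{outdir} already exists; use on_existing='resume' or 'clearall'"
        )
    seq_flag = "-n" if nucleotide else "-a"  # -n for nucleotides, -a for amino acids
    cmd = ["ete3", "build", "-w", workflow, seq_flag, seq_file, "-o", outdir]
    if on_existing is not None:
        cmd.append("--" + on_existing)
    return cmd


if __name__ == "__main__":
    cmd = build_ete_command(
        "standard_fasttree", "data/NUP62.aa.fa", "NUP62_tree/", on_existing="clearall"
    )
    print(shlex.join(cmd))
    # To launch it (requires ete3 and the external apps to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string lets subprocess.run execute it without shell quoting issues. shlex.join is used only to print a copy-pasteable version.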
After a few minutes, the process should finish and print a list of generated files and references to the software used. All results are stored in the provided output directory, which has the following structure:
%%bash
ls -ltr NUP62_tree/
ete_build.cfg is a copy of the configuration file used (including all workflow options, etc.).
db/ and tasks/ are temporary directories used to run the different processes. They are kept so that you can resume an analysis or debug any issue. Every job (e.g. FastTree or ClustalO) generates one or more directories in tasks/.
standard_fasttree translates into a workflow that uses Clustal Omega and FastTree and skips model testing and alignment trimming, so the result directory is clustalo_default-none-none-fasttree_full.
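The result directory name packs the workflow steps into one string. The hypothetical helper below decodes it under an assumption inferred from the clustalo_default-none-none-fasttree_full example: there are four hyphen-separated fields, ordered aligner, trimmer, model tester and tree builder, and "none" marks a skipped step. The order of the two middle fields is an assumption.

```python
import os
from typing import NamedTuple


class WorkflowDir(NamedTuple):
    aligner: str
    trimmer: str
    model_tester: str
    tree_builder: str


def parse_workflow_dir(path):
    """Split e.g. 'clustalo_default-none-none-fasttree_full' into its steps.

    Assumes four hyphen-separated fields ordered as aligner, trimmer,
    model tester and tree builder, with 'none' marking a skipped step.
    """
    name = os.path.basename(path.rstrip("/"))
    parts = name.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 hyphen-separated fields in {name!r}")
    return WorkflowDir(*parts)


def skipped_steps(wd):
    """Names of the steps marked as 'none'."""
    return [step for step, value in wd._asdict().items() if value == "none"]


if __name__ == "__main__":
    wd = parse_workflow_dir("NUP62_tree/clustalo_default-none-none-fasttree_full/")
    print(wd.aligner, wd.tree_builder)  # clustalo_default fasttree_full
    print(skipped_steps(wd))  # ['trimmer', 'model_tester']
```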
Note that the full path to the final tree and alignment is also printed when ete3 finishes.
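Instead of copying those paths by hand, you can locate the final files by their suffixes. The sketch below relies on the naming seen in this recipe: one result subdirectory per workflow, containing <input>.final_tree.nw and <input>.final_tree.used_alg.fa. find_final_outputs is a hypothetical helper, and other workflows may name their files differently.

```python
from pathlib import Path


def find_final_outputs(outdir):
    """Map each result subdirectory of an ete-build run to its final tree and alignment.

    Relies on the '*.final_tree.nw' and '*.final_tree.used_alg.fa' names seen
    in this recipe; subdirectories without a final tree are skipped.
    """
    results = {}
    for tree_file in sorted(Path(outdir).glob("*/*.final_tree.nw")):
        # NUP62.aa.fa.final_tree.nw -> NUP62.aa.fa.final_tree.used_alg.fa
        alg_file = tree_file.with_name(tree_file.name[: -len(".nw")] + ".used_alg.fa")
        results[tree_file.parent.name] = {
            "tree": tree_file,
            "alignment": alg_file if alg_file.exists() else None,
        }
    return results


if __name__ == "__main__":
    for workflow_dir, files in find_final_outputs("NUP62_tree").items():
        print(workflow_dir, files["tree"], files["alignment"])
```

If the output directory does not exist yet, the loop simply finds nothing.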
Since we did not use the --noimg flag, an image of the tree and alignment was generated automatically:
from IPython.display import Image
Image(filename='NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.png')
You can also take a quick look at the resulting tree from your terminal using ete3 view --text:
%%bash
ete3 view --text -t NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.nw
or open an interactive interface to browse it:
`ete3 view -t NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.nw`
or even generate an SVG figure from the command line:
%%bash
ete3 view --image tree.svg -t NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.nw
from IPython.display import SVG
SVG(filename='t0.tree.svg')
Remember that the Python API allows you to do much more, from rooting, traversal and node manipulation to custom visualization.
Load the tree using the ete3 Python module and work with it as a Python object:
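from ete3 import PhyloTree
# Load the final tree produced by ete-build
tree = PhyloTree("NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.nw")
# Attach the alignment used to build it (sequences are matched to leaves by name)
tree.link_to_alignment("NUP62_tree/clustalo_default-none-none-fasttree_full/NUP62.aa.fa.final_tree.used_alg.fa")
# "%%inline" renders the image directly in the notebook
tree.render("%%inline")
# Render only the subtree rooted at the common ancestor of two leaves
tree.get_common_ancestor("Phy004W8WJ_FALPE", "Phy004OQ34_STRCA").render("%%inline", layout="basic")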
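Conceptually, get_common_ancestor returns the lowest node whose subtree contains all the requested leaves. The stdlib-only sketch below illustrates that idea on a made-up toy tree stored as a child-to-parent dictionary. It is a conceptual illustration with hypothetical node names, not how ETE implements the method internally.

```python
def path_to_root(node, parent):
    """Return [node, parent, grandparent, ..., root] from a child->parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def common_ancestor(leaves, parent):
    """Deepest node shared by the root paths of all leaves (None if disjoint)."""
    if not leaves:
        raise ValueError("need at least one leaf")
    first, *rest = leaves
    path = path_to_root(first, parent)
    shared = set(path)
    for leaf in rest:
        shared &= set(path_to_root(leaf, parent))
    # Walking up from the first leaf, the first shared node is the lowest one.
    return next((node for node in path if node in shared), None)


if __name__ == "__main__":
    # Hypothetical toy tree ((A,B)n1,(C,D)n2)root stored as child -> parent.
    parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
    print(common_ancestor(["A", "B"], parent))  # n1
    print(common_ancestor(["A", "C"], parent))  # root
```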