ete-evol
ete-evol is a tool that automates CodeML and Slr analyses by using pre-configured evolutionary models and directly producing a graphical representation of the results.
Highlighted features:
Pre-configured models include site (Yang et al. 2000; Massingham and Goldman 2005), branch (Yang and Nielsen 2002), branch-site (Zhang et al. 2005) and clade (Yang and Nielsen 2002; Bielawski and Yang 2004) models.
Multiple evolutionary models can be tested and visualized with a single command.
Input trees can be marked interactively using the ETE GUI.
CodeML analyses can be fine-tuned (use --codeml_help for options).
Multithreading options allow running multiple tests in parallel.
To run an ete3 evol workflow, we need a FASTA alignment file, a tree file and a model name.
If branch or branch-site models are called, extra information (branch marks) must also be passed in order to apply specific evolutionary constraints on sections of the tree.
"Model names" correspond to several predefined evolutionary models based on previously published research (see the table in the next section, and the PAML and Slr documentation).
Any of these predefined models can easily be modified from the command line, for example with --codeml_param getSE,1.
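The key,value syntax of --codeml_param can be pictured as overriding entries in a table of CodeML defaults. The Python sketch below uses two names that are hypothetical and not part of ete3: `apply_codeml_params` and `CODEML_DEFAULTS`. It applies such overrides to a subset of the default values listed under --codeml_param further down this page. It only builds a settings dictionary and does not run CodeML.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]

# Hypothetical: a SUBSET of the CodeML defaults listed under --codeml_param.
# Keywords outside this subset are rejected by this sketch.
CODEML_DEFAULTS: Dict[str, Value] = {
    "getSE": 0,
    "omega": 0.7,
    "kappa": 2,
    "NSsites": 2,
    "verbose": 2,
    "cleandata": 0,
}


def _coerce(text: str) -> Value:
    """Convert a value string to int or float when possible, else keep it as text."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_codeml_params(defaults: Dict[str, Value], params: List[str]) -> Dict[str, Value]:
    """Return a copy of `defaults` updated with each 'key,value' token."""
    settings = dict(defaults)  # never mutate the defaults themselves
    for token in params:
        key, sep, value = token.partition(",")
        if not sep or not value:
            raise ValueError(f"expected 'key,value', got {token!r}")
        if key not in settings:
            raise KeyError(f"unknown CodeML keyword in this sketch: {key}")
        settings[key] = _coerce(value)
    return settings


settings = apply_codeml_params(CODEML_DEFAULTS, ["getSE,1", "omega,1"])
print(settings["getSE"], settings["omega"])  # 1 1
```

Keeping the defaults in one dictionary and copying it per run lets each model start from the same baseline.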
Once run, the models can be compared with likelihood-ratio tests (LRTs). By default, the most classical comparisons between the computed models are performed automatically. Alternatively, specific comparisons can be requested with the --tests option.
Several predefined evolutionary models can be called with the tool.
They can be passed in a single command and run in parallel:
ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta -o results1/ --models fb M2 SLR --cpu 3
Available predefined models are:
Model | Description | Type | Citation |
---|---|---|---|
M0 | negative-selection | null | Ziheng Yang (2000) |
M1 | relaxation | site | |
M2 | positive-selection | | |
M3 | discrete | | |
M4 | frequencies | | |
M5 | gamma | | |
M6 | 2 gamma | | |
M7 | relaxation | | |
M8 | positive-selection | | |
M8a | relaxation | | |
M9 | beta and gamma | | |
M10 | beta and gamma + 1 | | |
M11 | beta and normal > 1 | | |
M12 | 0 and 2 normal > 2 | | |
M13 | 3 normal > 0 | | |
SLR | positive/negative selection | | Tim Massingham (2005) |
bsA | positive-selection | branch-site | Jianzhi Zhang (2005) |
bsA1 | relaxation | | |
bsB | positive-selection | | Ziheng Yang (2002) |
bsC | different-ratios | | |
bsD | different-ratios | | Ziheng Yang (2002), Joseph P. Bielawski (2004) |
b_free | positive-selection | branch | Ziheng Yang (2002) |
b_neut | relaxation | | |
fb | free-ratios | | |
fb_anc | free-ratios | branch_ancestor | |
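As mentioned earlier, branch and branch-site models need extra marking information. A pre-flight check for this could look like the Python sketch below, but it rests on two assumptions.

- The set of models requiring marks is inferred from the branch and branch-site rows of the table above.
- The free-ratio models (fb, fb_anc) are assumed not to need marks.

The helper name is hypothetical and not part of ete3.

```python
from typing import Iterable, List

# Assumption: models that contrast marked (foreground) branches with the
# rest of the tree, inferred from the table above.
MODELS_NEEDING_MARKS = frozenset({"bsA", "bsA1", "bsB", "bsC", "bsD", "b_free", "b_neut"})


def models_missing_marks(models: Iterable[str], mark_sets: List[str]) -> List[str]:
    """Return the requested models that need --mark when no set of marks was given."""
    if mark_sets:
        return []
    return [model for model in models if model in MODELS_NEEDING_MARKS]


print(models_missing_marks(["fb", "M2", "SLR"], []))   # []
print(models_missing_marks(["b_neut", "b_free"], []))  # ['b_neut', 'b_free']
```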
After a short time (2 to 5 minutes), the CodeML/Slr input and output files should appear in the results1/ folder. If graphics capabilities are available on your system, this includes an image of your tree and the alignment used (Figure 2).
One of the main difficulties in automating CodeML for several trees is marking branches specifically for each model to run. ete3 evol allows passing lists of marks to be applied for each requested branch model.
The tree can be marked with the command line like this:
ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --models b_neut b_free --mark Papio_EDN,,,Orang_EDN -o results2/
The three commas mean that the tree will be marked from the common ancestor of the two surrounding species. To mark only the ancestral node, we would use two commas instead.
This comma code makes it easy to mark any part of the tree; a single comma starts a new mark. Finally, spaces between the arguments of --mark separate sets of marks, and each branch model is run on each set of marks. For example:
ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --models b_neut b_free --mark Human_EDN,,,Hylobates_EDN,Macaq_EDN,,,Papio_EDN Macaq_ECP,,Macaq2_ECP,Human_ECP,,Goril_ECP -o results3/
This will first mark the tree as in Figure 3, and then as in Figure 4.
When several sets of marks are passed, as in the example above, the given models ("b_neut" and "b_free" in this case) are run independently for each set of marks.
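The comma code works as follows:

- Within one --mark argument, a single comma separates marks.
- `A,,,B` marks the branches from the common ancestor of A and B.
- `A,,B` marks that ancestral node.
- Separate arguments form separate sets of marks, and each model runs once per set.

The Python sketch below parses this syntax. Both helpers are hypothetical and are not ete3's own parser. Assigning CodeML-style labels (#1, #2, ...) to successive marks is also an assumption, and single species names without commas are not handled.

```python
import re
from typing import List, Tuple

# (label, kind, first_species, second_species): kind is "clade" for A,,,B
# and "node" for A,,B.
Mark = Tuple[str, str, str, str]

# A single comma that is not part of ",," or ",,," separates two marks.
_SINGLE_COMMA = re.compile(r"(?<!,),(?!,)")


def parse_mark_set(spec: str) -> List[Mark]:
    """Parse one --mark argument such as 'Human_EDN,,,Hylobates_EDN,Macaq_EDN,,,Papio_EDN'."""
    marks: List[Mark] = []
    for index, chunk in enumerate(_SINGLE_COMMA.split(spec), start=1):
        if ",,," in chunk:
            kind, parts = "clade", chunk.split(",,,")
        elif ",," in chunk:
            kind, parts = "node", chunk.split(",,")
        else:
            raise ValueError(f"expected 'A,,,B' or 'A,,B', got {chunk!r}")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"malformed mark {chunk!r}")
        # Assumption: successive marks get CodeML-style labels #1, #2, ...
        marks.append((f"#{index}", kind, parts[0], parts[1]))
    return marks


def plan_runs(models: List[str], mark_sets: List[str]) -> List[Tuple[str, int, List[Mark]]]:
    """Each model is run once for every set of marks (one set per --mark argument)."""
    parsed = [parse_mark_set(spec) for spec in mark_sets]
    return [(model, set_no, marks)
            for set_no, marks in enumerate(parsed, start=1)
            for model in models]


runs = plan_runs(["b_neut", "b_free"],
                 ["Human_EDN,,,Hylobates_EDN,Macaq_EDN,,,Papio_EDN",
                  "Macaq_ECP,,Macaq2_ECP,Human_ECP,,Goril_ECP"])
for model, set_no, marks in runs:  # 4 runs: 2 models x 2 sets of marks
    print(model, set_no, [(label, kind) for label, kind, _, _ in marks])
```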
Extra features:
--leaves and --internals: mark, one by one, every leaf or every internal node of the tree, and run the branch models on each mark. Note that these options are dangerous, in the sense that they may be subject to the statistical problem of multiple testing.
--interactive: the GUI pops up (Figures 3 and 4) and internal nodes can be marked with a single left click.
By default, ete3 evol displays the results of site models as histograms (above and below the alignment). It shows the results of the free-branch model on the tree, adjusting its branch lengths and drawing the omega values as colored disks on the nodes (Figure 2).
This image is saved as a PDF in the output directory unless the --noimg option is used. The --view option opens this image in the ETE GUI, where you can interact with it. The GUI allows browsing all run models and comparing the different estimates of evolutionary rates. Finally, new models can also be run directly from the GUI (see the GUI section).
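The omega encoding described above (disk size and color on the nodes) can be approximated with a simple mapping. The Python sketch below makes several assumptions for illustration only:

- the function name;
- the log scale;
- the clipping value;
- the blue-to-red interpolation.

ETE's actual scaling is not documented on this page.

```python
import math
from typing import Tuple


def omega_to_disk(omega: float, min_radius: int = 2, max_radius: int = 20,
                  cap: float = 10.0) -> Tuple[int, str]:
    """Return (radius, '#rrggbb'): larger omega gives a bigger, redder disk."""
    w = min(max(omega, 0.0), cap)         # clip omega to [0, cap]
    t = math.log1p(w) / math.log1p(cap)   # 0.0 for omega = 0, 1.0 for omega >= cap
    radius = round(min_radius + t * (max_radius - min_radius))
    red, blue = round(255 * t), round(255 * (1 - t))
    return radius, f"#{red:02x}00{blue:02x}"


for w in (0.05, 1.0, 3.5):
    print(w, omega_to_disk(w))
```

A log scale keeps very large omega estimates from dwarfing the differences among values around 1.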
ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --models b_neut b_free --mark Papio_EDN,,,Orang_EDN -o results3/
To determine which of the computed models best fits the input data, ete3 evol tests pairs of computed models with an LRT.
By default, it performs the most classical comparisons, listed in the table below. The abbreviations are PS (positive selection), RX (relaxation) and CN (conserved, i.e. purifying selection).
Alternative | Null | Test |
---|---|---|
M2 | M1 | PS on sites (M2 prone to miss some sites) (Ziheng Yang (2000)) |
M3 | M0 | test of variability among sites |
M8 | M7 | PS on sites (Ziheng Yang (2000)) |
M8 | M8a | RX on sites |
bsA | bsA1 | PS on sites on specific branch (Jianzhi Zhang (2005)) |
bsA | M1 | RX on sites on specific branch (Jianzhi Zhang (2005)) |
bsC | M1 | different omegas on clades branches sites (Ziheng Yang (2002)) |
bsD | M3 | different omegas on clades branches sites (Ziheng Yang (2002), Joseph P. Bielawski (2004)) |
b_free | b_neut | foreground branch not neutral (w != 1) - PS if P<0.05 and wfrg>1 - CN if P<0.05 and wfrg<1 - RX if P>0.05 (w on frg cannot be distinguished from 1) (Ziheng Yang (2002)) |
b_free | M0 | different ratio on branches (Ziheng Yang (2002)) |
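Each pair in the table is tested the same way. The statistic is twice the log-likelihood difference, 2(lnL_alt - lnL_null). It is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. For example, df is 2 for M2 vs M1, and 1 for b_free vs b_neut, where only the foreground omega differs.

The Python sketch below computes such a p-value using only the standard library. The log-likelihood values in the usage line are made up. The exact null distribution that ete3 evol applies to each pair is not stated on this page.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Survival function P(X >= x) of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2k + 1: Q(k + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{k} y**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def lrt(lnl_null: float, lnl_alt: float, df: int) -> float:
    """p-value of the likelihood-ratio test: 2 * (lnL_alt - lnL_null) ~ chi2(df)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # negative differences count as 0
    return chi2_sf(stat, df)


# Hypothetical log-likelihoods for an M1 (null) vs M2 (alternative) comparison.
print(round(lrt(-1000.0, -996.0, df=2), 4))  # 0.0183
```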
Specific tests can also be calculated using the --tests
flag. For example:
ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --models M2 M1 b_free b_neut --leaves --tests b_free,b_neut -o results4/
This gives the following result. Only the tests between branch models are reported, not those between site models, even though M1 and M2 were also computed:
LRT
Null model | Alternative model | p-value |
---|---|---|
b_neut.2 | b_free.2 | 0.273900 |
b_neut.7 | b_free.7 | 0.487915 |
b_neut.9 | b_free.9 | 0.273900 |
b_neut.8 | b_free.8 | 0.047845* |
b_neut.3 | b_free.3 | 0.487915 |
b_neut.1 | b_free.1 | 0.487915 |
b_neut.5 | b_free.5 | 0.487915 |
b_neut.6 | b_free.6 | 0.273900 |
b_neut.4 | b_free.4 | 0.273900 |
b_neut.13 | b_free.13 | 0.088539 |
b_neut.12 | b_free.12 | 0.487915 |
b_neut.15 | b_free.15 | 1.000000 |
b_neut.14 | b_free.14 | 1.000000 |
b_neut.11 | b_free.11 | 0.109049 |
b_neut.10 | b_free.10 | 0.487915 |
Notes: see the --node_ids option.
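The 15 p-values in the output above illustrate why the multiple-testing warning for --leaves and --internals matters: they can be adjusted for the number of tests. The Python sketch below applies Bonferroni and Holm corrections, which are standard techniques; this page does not state that ete3 evol applies them itself. After correction, the one nominally significant comparison (b_neut.8 vs b_free.8, p = 0.047845) no longer falls below 0.05.

```python
from typing import Dict

# p-values of b_neut.N vs b_free.N from the --leaves example, keyed by N.
P_VALUES: Dict[int, float] = {
    2: 0.273900, 7: 0.487915, 9: 0.273900, 8: 0.047845,
    3: 0.487915, 1: 0.487915, 5: 0.487915, 6: 0.273900,
    4: 0.273900, 13: 0.088539, 12: 0.487915, 15: 1.000000,
    14: 1.000000, 11: 0.109049, 10: 0.487915,
}


def bonferroni(pvals: Dict[int, float]) -> Dict[int, float]:
    """Multiply every p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return {k: min(1.0, p * n) for k, p in pvals.items()}


def holm(pvals: Dict[int, float]) -> Dict[int, float]:
    """Holm step-down adjusted p-values (monotone, capped at 1)."""
    n = len(pvals)
    ordered = sorted(pvals.items(), key=lambda kv: kv[1])
    adjusted: Dict[int, float] = {}
    running_max = 0.0
    for rank, (key, p) in enumerate(ordered):  # rank 0 is the smallest p-value
        running_max = max(running_max, min(1.0, (n - rank) * p))
        adjusted[key] = running_max
    return adjusted


significant = [k for k, p in holm(P_VALUES).items() if p < 0.05]
print(significant)  # []
```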
As previously mentioned, the possibilities of the ETE GUI go beyond the visualization of results: it also allows running any default evolutionary model "on the fly" (see Evolutionary models).
See Figure 5 for an example of how to mark a branch and run a model using the GUI.
--clear_all: clear any data present in the output directory.
--prev_models: directory where pre-calculated models are stored, followed by a comma and the model name.
Example: --prev_models /path1/,M2 /path2/,M1 will load the models from path1 under the name 'M2', and from path2 under the name 'M1'.
--clean_layout: alternative visualization option, with omega values, synonymous rates and non-synonymous rates written on the branches.
--histface: type of histogram face to be used for site models (bar, stick or curve).
If preceded by '+-', error bars are also drawn.
--codeml_param: extra parameters to be interpreted by CodeML, modifying the default settings of the models.
Accepted keywords and their default values:
Keyword | Default value |
---|---|
aaDist | 0 |
alpha | 0.0 |
cleandata | 0 |
clock | 0 |
CodonFreq | 2 |
fix_alpha | 1 |
fix_blength | 0 |
fix_kappa | 0 |
fix_omega | 0 |
getSE | 0 |
icode | 0 |
kappa | 2 |
Malpha | 0 |
method | 0 |
Mgene | 0 |
model | 0 |
ncatG | 8 |
ndata | *10 |
noisy | 0 |
NSsites | 2 |
omega | 0.7 |
outfile | out |
RateAncestor | 0 |
runmode | 0 |
seqfile | algn |
seqtype | 1 |
Small_Diff | 1e-6 |
treefile | tree |
verbose | 2 |
Example: --codeml_param verbose,2 omega,1
--codeml_help: show a detailed description of the CodeML parameters for model configuration and exit.
--cpu: maximum number of CPU cores available in the execution host.
If higher than 1, tasks with multi-threading capabilities will be enabled (if 0, all available cores will be used).
--codeml_binary: path to the CodeML binary.
--slr_binary: path to the Slr binary.
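The value formats of --prev_models, --histface and --cpu described above can be parsed as in the Python sketch below. The helper names are hypothetical; they only mirror the documented formats and are not ete3's implementation. Splitting --prev_models tokens on the last comma is an assumption, chosen so that directory paths containing commas stay intact.

```python
import os
from typing import Dict, List, Tuple


def parse_prev_models(tokens: List[str]) -> Dict[str, str]:
    """'/path1/,M2' -> {'M2': '/path1/'}; the last comma separates path and model name."""
    loaded: Dict[str, str] = {}
    for token in tokens:
        path, sep, name = token.rpartition(",")
        if not sep or not path or not name:
            raise ValueError(f"expected 'directory,model-name', got {token!r}")
        loaded[name] = path
    return loaded


def parse_histface(value: str) -> Tuple[str, bool]:
    """'+-bar' -> ('bar', True): a '+-' prefix requests error bars."""
    error_bars = value.startswith("+-")
    kind = value[2:] if error_bars else value
    if kind not in {"bar", "stick", "curve"}:
        raise ValueError(f"unknown histogram face: {kind!r}")
    return kind, error_bars


def resolve_cpu(requested: int) -> int:
    """0 means 'use all available cores', as described for --cpu."""
    if requested < 0:
        raise ValueError("--cpu must be >= 0")
    return requested if requested > 0 else (os.cpu_count() or 1)


print(parse_prev_models(["/path1/,M2", "/path2/,M1"]))  # {'M2': '/path1/', 'M1': '/path2/'}
print(parse_histface("+-stick"))                        # ('stick', True)
```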
The following files are used in the examples: ECP_EDN_15.nw (tree) and ECP_EDN_15.fasta (alignment).
This dataset comes from a previously published work (Joseph P. Bielawski (2004)). There it was used as an example of differential selective pressure in the paralogous genes ECP and EDN in primates (Figure 1).
This page shows the basic usage of ete-evol. More options and examples are available as a collection of notebooks.
Fig 1. GUI in interactive marking "mode" with one branch marked.
Fig 2. Default view of the example with loaded models "M2", "SLR" and "fb". (Note that omega values are represented by node size and color: small blue disks for low omega values, up to large red disks for high omega values.)
Fig 3. View of marks on two groups of branches (image corresponding to the GUI in interactive marking "mode", available with the --interactive option).
Fig 4. View of marks on two ancestral nodes (image corresponding to the GUI in interactive marking "mode", available with the --interactive option).
Fig 5. Default view of the example with loaded models "M2", "SLR" and "fb". Numbered arrows mark the steps to follow to interactively run a branch-site model: 1- mark a branch of the phylogeny, 2- select the model to run, and 3- click on the "Run" button.