This cookbook aims to reproduce, using ete-evol, the results on selection at individual sites presented in Yang Z (MBE, 2005).
We use the large HIV env dataset, with data files adapted from the examples distributed with the PAML package.
The tree (`HIVenv.nw`) and the alignment (`HIVenv.fasta`) can be displayed together:
%%bash
ete3 view -t data/hiv-env/HIVenv.nw --alg data/hiv-env/HIVenv.fasta --alg_type compactseq \
-i data/hiv-env/HIVenv.png
from IPython.display import Image
Image(filename='data/hiv-env/HIVenv.png')
In particular, we aim to reproduce Table 4 of the paper:
from IPython.display import Image
Image(filename='data/lysozyme/Table4_YangZ_2005.png')
We therefore run the site models M0, M1, M2, M7 and M8.
All of them are fitted with a single `ete3 evol` command:
%%bash
ete3 evol -t data/hiv-env/HIVenv.nw --alg data/hiv-env/HIVenv.fasta --models M0 M1 M2 M7 M8 -o /tmp/ete3-HIV/ \
--clear_all --cpu 5 -i data/hiv-env/HIVenv.png --histface bar +-curve bar +-bar
According to the LRT table, in this example the models that allow a class of sites evolving with $\omega > 1$ (M2 and M8) are always more likely than their nested null models (M1 and M7, respectively).
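The LRT compares two nested models through the statistic $2\Delta\ell = 2(\ell_{\text{alt}} - \ell_{\text{null}})$, which is referred to a $\chi^2$ distribution (2 degrees of freedom is the usual choice for both M2 vs M1 and M8 vs M7). The following is a minimal sketch of that calculation, not the code ete-evol runs internally. The function name `lrt` is hypothetical, and the log-likelihood values in the usage line are made up for illustration, not taken from the HIV env results. To stay dependency-free, it uses the closed-form $\chi^2$ survival function, which holds only for even degrees of freedom.

```python
import math


def lrt(lnl_null: float, lnl_alt: float, df: int = 2) -> tuple[float, float]:
    """Hypothetical helper: return (2*delta lnL, p-value) for two nested models.

    The p-value uses the closed form of the chi-square survival function,
    which is valid only for positive, even degrees of freedom:
        P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    # A negative statistic (alternative fits worse) is truncated to 0, giving p = 1.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    half = stat / 2.0
    pval = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, pval


# Illustrative, made-up log-likelihoods (e.g. null = M1, alternative = M2).
stat, p = lrt(-100.0, -95.0)
print(stat, p)
```

With these invented values the statistic is 10.0 and the p-value is $e^{-5} \approx 0.0067$. For odd degrees of freedom, a general routine such as `scipy.stats.chi2.sf` would be the natural replacement.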
The summaries of models M2 and M8 list the sites (codon positions in the alignment) inferred to be under positive selection by each model, according to the Bayes empirical Bayes (BEB) estimates. These sites correspond to those reported in the paper.
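Under BEB, a site is reported when its posterior probability of belonging to the $\omega > 1$ class exceeds a cutoff. PAML marks sites above 95% with `*` and sites above 99% with `**`. The sketch below applies that thresholding to a list of per-site posterior probabilities. The function `flag_positive_sites` and the input probabilities are hypothetical. This is not the ete-evol API, and the sketch does not show how the probabilities are parsed from its output.

```python
def flag_positive_sites(post_probs, thresholds=(0.95, 0.99)):
    """Hypothetical helper: map 1-based codon positions to significance marks.

    post_probs[i] is assumed to be the posterior probability that codon i+1
    belongs to the omega > 1 site class (e.g. a BEB estimate).
    """
    low, high = thresholds
    flagged = {}
    for pos, p in enumerate(post_probs, start=1):
        if p > high:
            flagged[pos] = "**"  # posterior probability above 99%
        elif p > low:
            flagged[pos] = "*"   # posterior probability above 95%
    return flagged


# Made-up posterior probabilities for four codons.
print(flag_positive_sites([0.10, 0.97, 0.995, 0.50]))  # {2: '*', 3: '**'}
```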
from IPython.display import Image
Image(filename='data/hiv-env/HIVenv.png')