Reproduce codeml analysis from Yang Z (MBE, 2005)¶

This cookbook aims at reproducing the results about selection on sites presented in the work Yang Z (MBE, 2005) using ete-evol.

The data files adapted from the examples in the PAML package, large dataset will be used.

This files are:

data/lysozyme/HIVenv.fasta
data/lysozyme/HIVenv.nw

In [23]:

%%bash
ete3 view -t data/hiv-env/HIVenv.nw --alg data/hiv-env/HIVenv.fasta --alg_type compactseq \
-i data/hiv-env/HIVenv.png

In [25]:

from IPython.display import Image
Image(filename='data/hiv-env/HIVenv.png')

Out[25]:

In particular we aim at reproducing the Table4:

In [26]:

from IPython.display import Image
Image(filename='data/lysozyme/Table4_YangZ_2005.png')

Out[26]:

We are thus going to run 6 models.

First we run models M1 and M2:

In [29]:

%%bash
ete3 evol -t data/hiv-env/HIVenv.nw --alg data/hiv-env/HIVenv.fasta --models M0 M1 M2 M7 M8 -o /tmp/ete3-HIV/ \
--clear_all --cpu 5 -i data/hiv-env/HIVenv.png --histface bar +-curve bar +-bar

Using: /Users/fransua/.miniconda2/bin/ete3_apps/bin/Slr
Using: /Users/fransua/.miniconda2/bin/ete3_apps/bin/codeml

Running CodeML/Slr (5 CPUs)
  - processing model M0
  - processing model M1
  - processing model M2
  - processing model M7
  - processing model M8

LRT

               Null model |        Alternative model | p-value
     ------------------------------------------------------------
                       M0 |                       M7 | 0.000000**
                       M0 |                       M1 | 0.000000**
                       M0 |                       M8 | 0.000000**
                       M0 |                       M2 | 0.000000**
                       M7 |                       M8 | 0.000122**
                       M7 |                       M2 | 0.000129**
                       M1 |                       M8 | 0.000260**
                       M1 |                       M2 | 0.000276**

SUMMARY BY MODEL

 - Model M0
   * Average omega for all tree: 0.901

 - Model M7
   * Average omega for all tree: 0.548

 - Model M1
   * Average omega for all tree: 0.554
   * Sites significantly caracterized
      codon position |   category
     -----------------------------------------------------------
                 1   |   Relaxed (probability > 0.95)
                 4   |   Relaxed (probability > 0.95)
                 6   |   Relaxed (probability > 0.99)
                 9   |   Relaxed (probability > 0.95)
                13   |   Relaxed (probability > 0.95)
           15-  16   |   Relaxed (probability > 0.99)
           21-  22   |   Relaxed (probability > 0.99)
                24   |   Relaxed (probability > 0.99)
                26   |   Relaxed (probability > 0.99)
                28   |   Relaxed (probability > 0.95)
                29   |   Relaxed (probability > 0.99)
                31   |   Relaxed (probability > 0.95)
                36   |   Relaxed (probability > 0.99)
           39-  40   |   Relaxed (probability > 0.95)
                46   |   Relaxed (probability > 0.99)
                51   |   Relaxed (probability > 0.95)
                52   |   Relaxed (probability > 0.95)
           56-  57   |   Relaxed (probability > 0.95)
                59   |   Relaxed (probability > 0.99)
                66   |   Relaxed (probability > 0.99)
           68-  69   |   Relaxed (probability > 0.95)
                75   |   Relaxed (probability > 0.99)
                76   |   Relaxed (probability > 0.95)
                77   |   Relaxed (probability > 0.95)
                80   |   Relaxed (probability > 0.99)
           83-  84   |   Relaxed (probability > 0.95)
           85-  86   |   Relaxed (probability > 0.99)
                87   |   Relaxed (probability > 0.95)
                90   |   Relaxed (probability > 0.95)

 - Model M8
   * Average omega for all tree: 1.120
   * Sites significantly caracterized
      codon position |   category
     -----------------------------------------------------------
                26   |   Positively-selected (probability > 0.99)
                28   |   Positively-selected (probability > 0.95)
                51   |   Positively-selected (probability > 0.99)
                66   |   Positively-selected (probability > 0.99)
                87   |   Positively-selected (probability > 0.99)

 - Model M2
   * Average omega for all tree: 1.121
   * Sites significantly caracterized
      codon position |   category
     -----------------------------------------------------------
                28   |   Positively-selected (probability > 0.99)
                66   |   Positively-selected (probability > 0.95)
                87   |   Positively-selected (probability > 0.95)

According to the LRT table, models allowing class of sites evolving at $\omega$ values over 1 are always more likely in this example.

The summary of model M2 and M8 display the sites (codon position in the alignment) found to be under positive selection in each model under the BEB estimate. These sites corrspond to the results presented in the paper.

In [30]:

from IPython.display import Image
Image(filename='data/hiv-env/HIVenv.png')

Out[30]: