ETE Toolkit - GTPB course, Portugal 2010

Analysis and manipulation of phylogenomic data using ETE

Organized by the Gabaldón lab at CRG and the Gulbenkian Training Programme in Bioinformatics (GTPB)

Location: Oeiras, Portugal

Dates: 23rd June to 25th June, 2010

COURSE DESCRIPTION: Phylogenetic analyses are gradually reaching genomic scales. Nowadays, many resources and surveys encompass a large number of trees that often cannot be analyzed manually. Bioinformatics toolkits are intended to provide a flexible framework for dealing with specific data programmatically, thus facilitating the analysis of large collections of data. The Environment for Tree Exploration (ETE, http://ete.cgenomics.org) is a Python programming toolkit specifically focused on hierarchical trees. It can be used, for instance, to perform a number of operations on phylogenetic trees and to design automated pipelines. It also provides a highly customizable drawing engine, which can be used to create complex annotated tree images automatically or to explore single trees interactively. Moreover, the ETE toolkit is not limited to large-scale analyses: it can also be used to easily develop specific analysis methods for single trees.

The purpose of this course is to provide an introduction to the analysis of phylogenetic trees. It will cover a broad range of tasks that are usually required in any phylogenomic analysis: tree rooting, prediction of orthology and paralogy relationships, tree annotation, calculation of distances among sequences or species, tree pruning, tree comparison, and tree visualization. The use of large-scale phylogenomic resources, such as PhylomeDB or Ensembl Compara, will also be covered through examples and exercises. The course will be mostly practical and will focus on solving real-life examples.
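As a rough illustration of one of the tasks listed above (prediction of orthology and paralogy relationships), the following minimal sketch labels each internal node of a toy gene tree as a duplication or a speciation, using the species overlap idea: a node is treated as a duplication when its two child branches share at least one species. It is plain Python and deliberately does not use the ETE API. The tree, the nested-tuple representation, the "<species>_<gene>" leaf naming, and all function names are assumptions made up for this example.

```python
# Toy gene tree as nested tuples (hypothetical data, for illustration only).
# Leaves are strings of the form "<species>_<gene>"; this naming scheme is
# an assumption of the sketch, not something required by any toolkit.
TREE = ((("Hsa_g1", "Ptr_g1"), ("Hsa_g2", "Ptr_g2")), "Mmu_g1")


def species_of(leaf):
    """Return the species code encoded before the first underscore."""
    return leaf.split("_")[0]


def leaves(node):
    """Return all leaf names found under a node."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def classify_events(node, events=None):
    """Visit internal nodes in pre-order and label each one.

    'D' (duplication) if the species sets of the two children overlap,
    'S' (speciation) otherwise. Assumes a strictly bifurcating tree.
    """
    if events is None:
        events = []
    if isinstance(node, str):
        return events
    left, right = node
    left_species = {species_of(name) for name in leaves(left)}
    right_species = {species_of(name) for name in leaves(right)}
    kind = "D" if left_species & right_species else "S"
    events.append((kind, sorted(leaves(left)), sorted(leaves(right))))
    classify_events(left, events)
    classify_events(right, events)
    return events


events = classify_events(TREE)
for kind, left_leaves, right_leaves in events:
    print(kind, left_leaves, right_leaves)
```

Under this scheme, genes separated by a speciation node would be read as orthologs and genes separated by a duplication node as paralogs. A real analysis would read trees from Newick files and handle multifurcations, which this sketch omits.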

Course Pre-requisites:

Course attendees are expected to have basic programming skills (not necessarily in Python, although it is recommended*). All exercises will consist of developing Python scripts to perform different analyses on phylogenetic trees using the ETE toolkit in a GNU/Linux environment.

*Important Note: NO introduction to Python programming is scheduled in the course. However, Python is a very intuitive language that can be learned quickly if you have programmed in other languages. As a reference, Chapters 3-7 and 9 from this tutorial should be more than enough to follow the whole course.

INSTRUCTORS:

Jaime Huerta-Cepas is a postdoctoral researcher in the Comparative Genomics group at the Centre for Genomic Regulation, headed by Toni Gabaldón. He received his PhD, on human genome evolution [1] and large-scale phylogenetic analyses, from the "Universidad Autónoma de Madrid" in 2008. Jaime is the main developer of the PhylomeDB database [2] and the ETE toolkit [3]. His work focuses on applying large-scale phylogenetic analyses to address different biological problems, such as understanding gene duplication, the evolution of gene expression, functional genome annotation, orthology and paralogy prediction, and the reconstruction of the species Tree of Life.

Marina Marcet-Houben obtained her degree in Biochemistry at the Rovira i Virgili University (Tarragona, Spain), where she also presented her diploma in advanced studies in the evolutionary genomics group. She is currently a final-year PhD student in the Comparative Genomics lab at the Centre for Genomic Regulation in Barcelona. Her main research interests are the use of large-scale phylogenomic tools to study the evolution of fungi [4] and the robustness of species trees [5]. Marina is an active collaborator in the PhylomeDB project and the main developer of TreeKo, a tool for comparing phylogenetic tree topologies.

[1] J. Huerta-Cepas, H. Dopazo, J. Dopazo and T. Gabaldón. The Human Phylome. Genome Biology 8:r109, 2007.

[2] J. Huerta-Cepas, A. Bueno, J. Dopazo and T. Gabaldón. PhylomeDB: A database for complete collections of gene phylogenies. Nucleic Acids Res. 2008 Jan;36(Database issue):D491-6.

[3] Jaime Huerta-Cepas, Joaquín Dopazo and Toni Gabaldón. ETE: A python Environment for Tree Exploration. BMC Bioinformatics. 2010, 11:24.

[4] Marcet-Houben M, Gabaldón T. Acquisition of prokaryotic genes by fungal genomes. Trends Genet. 2010 Jan;26(1):5-8.

[5] Marcet-Houben M, Gabaldón T. The tree versus the forest: the fungal tree of life and the topological diversity within the yeast phylome. PLoS One. 2009;4(2):e4357. Epub 2009 Feb 3.

COURSE PROGRAM:

Wed, June 23rd	Day #1
09:30 - 11:00	1. Introduction to phylogenomics 1.1 General phylogenetic pipeline 1.2 Phylogenomics: scope and applications 1.3 Dealing with large collections of trees: Phylomes
11:00 - 11:30	Coffee Break
11:30 - 12:30	1.4 Public phylogenomic resources: PhylomeDB, Ensembl Compara, TreeFam 1.5 Orthology and Paralogy prediction based on phylogenetic analysis 2. Installing ETE 2.1 Using ETE libraries 2.2 Interactive ETE sessions using ipython
12:30 - 14:00	Lunch Break
14:00 - 16:00	3. Tree basics 3.1 Reading and writing trees 3.2 Rooting, pruning, splitting and concatenating trees 3.3 Browsing tree topology: performing per node operations 3.4 Searching nodes by their attributes 3.5 Basic tree visualization: the ETE GUI
16:00 - 16:30	Tea Break
16:30 - 18:00	4. Comparing tree topologies: 4.1 Linking to the Phylip package: consensus & treedist programs 4.2 Comparing tree topologies using TreeKo 5. Tree annotation 5.1 Adding extra information to the tree nodes 5.2 Using the extended newick format 5.3 Using the PhyloXML format
Thu, June 24th	Day #2
09:30 - 11:00	6. Building clustering trees and phylogenetic profiles 6.1 Linking trees to numeric profiles and matrices 6.2 Using profiles as node properties 6.3 Visualizing trees with profiles: tree heatmaps and profiling plots
11:00 - 11:30	Coffee Break
11:30 - 12:30	7. Dealing with phylogenetic trees 7.1 Associating nodes with multiple sequence alignments 7.2 Species aware trees 7.2.1 Species guided rooting 7.2.2 Checking for monophyletic clades
12:30 - 14:00	Lunch Break
14:00 - 16:00	7.3 Detecting orthology and paralogy relationships 7.3.1 Tree reconciliation 7.3.2 The species overlap algorithm 7.3.3 Working with speciation and duplication events
16:00 - 16:30	Tea Break
16:30 - 18:00	7.4 Dating duplication events 7.5 Working with Species trees: 7.5.1 Using NCBI taxonomy tree
Fri, June 25th	Day #3
09:30 - 11:00	8. Programmatic tree visualization: creating custom tree pictures 8.1 Understanding ETE's language: Styles, Faces and Layouts 8.2 Creating custom layout functions 8.3 Modifying colors and general tree aspect (styles)
11:00 - 11:30	Coffee Break
11:30 - 12:30	8.4 Adding graphical information to nodes (faces) 8.4.1 External image faces 8.4.2 Text based faces 8.4.3 Profile faces 8.5 Rendering trees as PNG or PDF images
12:30 - 14:00	Lunch Break
14:00 - 16:00	9. Exploiting public phylogenomic resources 9.1. The PhylomeDB API (working with complete phylomes) 9.2. Working with Ensembl Compara trees 9.2.1 Reading Ensembl annotated trees
16:00 - 16:30	Tea Break
16:30 - 18:00	10. Advanced topics: 10.1 Creating custom image faces 10.2 Interoperability with other toolkits 10.3 Extending ETE 10.4 Your specific requests