PhyloTree class

class PhyloNode(newick=None, alignment=None, alg_format='fasta', sp_naming_function=<function _parse_species>, format=0, **kargs)

Bases: ete3.coretype.tree.TreeNode

Extends the standard TreeNode instance. It adds specific attributes and methods to work with phylogenetic trees.

Parameters:
  • newick – Path to the file containing the tree or, alternatively, the text string containing the same information.
  • alignment – file containing a multiple sequence alignment.
  • alg_format – “fasta”, “phylip” or “iphylip” (interleaved)
  • format – Sub-newick format, one of:

    FORMAT  DESCRIPTION
    0       flexible with support values
    1       flexible with internal node names
    2       all branches + leaf names + internal supports
    3       all branches + all names
    4       leaf branches + leaf names
    5       internal and leaf branches + leaf names
    6       internal branches + leaf names
    7       leaf branches + all names
    8       all names
    9       leaf names
    100     topology only
  • sp_naming_function – Pointer to a Python parsing function that receives the node name as its first argument and returns the species name (see PhyloNode.set_species_naming_function()). By default, the first three letters of each node name are used as the species identifier.
Returns:

a tree node object which represents the base of the tree.
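The default naming rule described for sp_naming_function can be pictured with a small stand-alone sketch. The helper names below (first_three_letters, species_after_underscore) and the leaf names are hypothetical and not part of ETE; only the "first three letters" rule comes from the text above.

```python
def first_three_letters(node_name):
    """Mimic the documented default: the first three characters are the species code."""
    return node_name[:3]


def species_after_underscore(node_name):
    """Hypothetical alternative for names shaped like 'GENEID_SPECIES'."""
    return node_name.split("_")[1]


# Made-up leaf names, used only for illustration.
leaf_names = ["Hsa_001", "Ptr_002", "Mmu_003"]
print([first_three_letters(name) for name in leaf_names])
print(species_after_underscore("P53_Hsa"))
```

Any function of this shape (one node name in, one species name out) is the kind of callable that sp_naming_function is described as accepting.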

annotate_ncbi_taxa(taxid_attr='species', tax2name=None, tax2track=None, tax2rank=None, dbfile=None)

Adds NCBI taxonomy annotation to all descendant nodes. Leaf nodes are expected to contain a feature (selected with taxid_attr; 'species' in the signature above) encoding a valid taxid number.

All descendant nodes (including internal nodes) are annotated with the following new features:

Node.spname: scientific species name as encoded in the NCBI taxonomy database

Node.named_lineage: the NCBI lineage track using scientific names

Node.taxid: NCBI taxid number

Node.lineage: same as named_lineage but using taxid codes.

Note that for internal nodes, NCBI information will refer to the first common lineage of the grouped species.

Parameters:
  • taxid_attr – The name of the feature used to access the taxid number associated with each node ('species' in the signature above).
  • tax2name (None) – A dictionary whose keys are taxid numbers and whose values are their NCBI scientific names. Optional; supplying it avoids database queries when annotating many trees that contain the same set of taxids.
  • tax2track (None) – A dictionary whose keys are taxid numbers and whose values are their NCBI lineage tracks (taxids). Optional; supplying it avoids database queries when annotating many trees that contain the same set of taxids.
  • tax2rank (None) – A dictionary whose keys are taxid numbers and whose values are their NCBI rank names. Optional; supplying it avoids database queries when annotating many trees that contain the same set of taxids.
  • dbfile (None) – If provided, this file is used as a local copy of the NCBI taxonomy database.

Returns:

tax2name (a dictionary translating taxid numbers into scientific names), tax2lineage (a dictionary translating taxid numbers into their corresponding NCBI lineage tracks) and tax2rank (a dictionary translating taxid numbers into rank names).
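The note about internal nodes referring to the "first common lineage of the grouped species" can be read as taking the shared prefix of the species' lineage tracks. The sketch below illustrates that reading in plain Python; it does not query NCBI and is not ETE's implementation. The function name common_lineage is hypothetical, and the lineages are abbreviated for illustration rather than complete NCBI tracks.

```python
def common_lineage(lineages):
    """Return the longest shared prefix of several root-to-species lineages."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Abbreviated, illustrative lineages (not complete NCBI lineage tracks).
named_lineages = {
    "Homo sapiens": ["root", "Mammalia", "Primates", "Hominidae", "Homo sapiens"],
    "Pan troglodytes": ["root", "Mammalia", "Primates", "Hominidae", "Pan troglodytes"],
    "Mus musculus": ["root", "Mammalia", "Rodentia", "Mus musculus"],
}

apes = common_lineage([named_lineages["Homo sapiens"], named_lineages["Pan troglodytes"]])
all_three = common_lineage(list(named_lineages.values()))
print(apes[-1])
print(all_three[-1])
```

Under this reading, an internal node grouping the two apes would be annotated with the last shared entry of their lineages, and a node grouping all three species with a more ancient one.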

collapse_lineage_specific_expansions(species=None, return_copy=True)

Converts lineage specific expansion nodes into a single tip node (randomly chosen from tips within the expansion).

Parameters:
  • species (None) – If supplied, only expansions matching the species criteria will be pruned. When None, all expansions within the tree will be processed.

get_age(species2age)

Implements the phylostratigraphic method described in:

Huerta-Cepas, J., & Gabaldon, T. (2011). Assigning duplication events to relative temporal scales in genome-wide studies. Bioinformatics, 27(1), 38-45.
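As a rough illustration of the idea, a node can be dated by the oldest species it contains. The sketch below assumes that larger numbers in species2age mean "older" and that the node's age is the maximum over its species; both are assumptions made for this example, not statements about ETE's code. The function name node_age and the ages are hypothetical.

```python
def node_age(species_in_node, species2age):
    """Date a node by the oldest (largest) topological age among its species."""
    return max(species2age[sp] for sp in species_in_node)


# Hypothetical topological ages relative to a seed species (Hsa = youngest).
species2age = {"Hsa": 1, "Ptr": 2, "Mmu": 3, "Dme": 4}

print(node_age({"Hsa", "Ptr"}, species2age))
print(node_age({"Hsa", "Mmu"}, species2age))
```

With these toy values, a node grouping Hsa and Mmu sequences is assigned the age of the more distant species.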

get_age_balanced_outgroup(species2age)

New in version 2.2.

Returns the node that best balances the current tree structure according to the topological age of the different leaves and the sizes of the internal nodes.

Parameters:
  • species2age – A dictionary translating from leaf names into a topological age.

get_descendant_evol_events(sos_thr=0.0)

Returns a list of all duplication and speciation events detected after this node. Nodes are assumed to be duplications when a species overlap is found between their child lineages. The method is described in more detail in:

“The Human Phylome.” Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldon T. Genome Biol. 2007;8(6):R109.
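The species overlap rule described above can be sketched on a toy tree written as nested tuples. This is a simplified illustration, not ETE's implementation: the helper names are hypothetical, species codes are taken from the first three letters of each leaf name, and the overlap score (shared species divided by all species under the node, compared against sos_thr) is an assumption about how the threshold is applied.

```python
def leaf_species(tree):
    """Collect species codes under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree[:3]}
    found = set()
    for child in tree:
        found |= leaf_species(child)
    return found


def classify_events(tree, sos_thr=0.0, events=None):
    """Label every internal (binary) node as 'D' or 'S', in pre-order."""
    if events is None:
        events = []
    if isinstance(tree, str):
        return events
    left, right = tree
    sp_left, sp_right = leaf_species(left), leaf_species(right)
    # Assumed score: fraction of species shared by the two child lineages.
    overlap = len(sp_left & sp_right) / len(sp_left | sp_right)
    etype = "D" if overlap > sos_thr else "S"
    events.append((etype, sorted(sp_left), sorted(sp_right)))
    classify_events(left, sos_thr, events)
    classify_events(right, sos_thr, events)
    return events


# Made-up gene family tree: two copies of the Hsa/Ptr genes.
gene_tree = (("Hsa_1", "Ptr_1"), (("Hsa_2", "Ptr_2"), "Mmu_1"))
for etype, side_a, side_b in classify_events(gene_tree):
    print(etype, side_a, side_b)
```

Only the root is labelled as a duplication here, because it is the only node whose two child lineages share species.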

get_farthest_oldest_leaf(species2age, is_leaf_fn=None)

Returns the farthest oldest leaf relative to the current one. It requires a species2age dictionary with the age estimation for all species.

Parameters:
  • is_leaf_fn (None) – A pointer to a function that receives a node instance as its only argument and returns True or False. It can be used to dynamically collapse nodes, so they are seen as leaves.

get_farthest_oldest_node(species2age)

New in version 2.1.

Returns the farthest oldest node (leaf or internal). Unlike get_farthest_oldest_leaf(), this function collapses internal nodes that group sequences from the same species.

get_my_evol_events(sos_thr=0.0)

Returns a list of duplication and speciation events in which the current node has been involved. Scanned nodes are also labeled internally as dup=True|False. You can access these labels using the ‘node.dup’ syntax.

Method: the algorithm scans all nodes from the given leaf to the root. Nodes are assumed to be duplications when a species overlap is found between their child lineages. The method is described in more detail in:

“The Human Phylome.” Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldon T. Genome Biol. 2007;8(6):R109.
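The leaf-to-root scan can be sketched with a minimal node class that keeps parent pointers. Everything here is hypothetical illustration code (the Node class, events_on_path, and the leaf names); it only mirrors the description above, including the dup label stored on each scanned node, and uses a plain "any shared species" test rather than a threshold.

```python
class Node:
    """Tiny stand-in for a tree node, with parent pointers."""

    def __init__(self, name="", children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.dup = None  # set to True/False once the node has been scanned
        for child in self.children:
            child.parent = self

    def species(self):
        if not self.children:
            return {self.name[:3]}
        return set().union(*(child.species() for child in self.children))


def events_on_path(leaf):
    """Walk from a leaf to the root, labelling each ancestor as 'D' or 'S'."""
    events = []
    current = leaf
    while current.parent is not None:
        parent = current.parent
        inside = current.species()
        sisters = [child for child in parent.children if child is not current]
        outside = set().union(*(sister.species() for sister in sisters))
        parent.dup = bool(inside & outside)
        events.append("D" if parent.dup else "S")
        current = parent
    return events


seed = Node("Hsa_1")
root = Node(children=[
    Node(children=[seed, Node("Ptr_1")]),
    Node(children=[Node("Hsa_2"), Node("Mmu_1")]),
])
print(events_on_path(seed))
```

The first ancestor of the seed leaf separates Hsa from Ptr (a speciation), while the root has Hsa on both sides and is labelled as a duplication.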

get_speciation_trees(map_features=None, autodetect_duplications=True, newick_only=False, target_attr='species')

Calculates all possible species trees contained within a duplicated gene family tree, as described in TreeKO (see Marcet and Gabaldon, 2011).

Parameters:
  • autodetect_duplications (True) – If True, duplication nodes will be automatically detected using the Species Overlap algorithm (PhyloNode.get_descendant_evol_events()). If False, duplication nodes within the original tree are expected to contain the feature “evoltype=D”.
  • map_features (None) – A list of features that should be mapped from the original gene family tree to each species tree subtree.

Returns:

(number_of_sptrees, number_of_dups, species_tree_iterator)
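The decomposition can be pictured as keeping one child lineage at every duplication node and combining the results across speciation nodes. The sketch below follows that idea on nested tuples; it is a simplified illustration and not ETE's implementation. Function names are hypothetical, and duplications are detected with a plain species overlap test on the first three letters of each leaf name.

```python
from itertools import product


def species_set(tree):
    """Species codes under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree[:3]}
    return set().union(*(species_set(child) for child in tree))


def speciation_trees(tree):
    """Yield every duplication-free tree obtained by keeping one side of each duplication."""
    if isinstance(tree, str):
        yield tree
        return
    left, right = tree
    if species_set(left) & species_set(right):
        # Duplication node: each child lineage contributes its own trees.
        yield from speciation_trees(left)
        yield from speciation_trees(right)
    else:
        # Speciation node: combine every left option with every right option.
        left_options = list(speciation_trees(left))
        right_options = list(speciation_trees(right))
        for l_tree, r_tree in product(left_options, right_options):
            yield (l_tree, r_tree)


# Made-up gene family tree with a duplication at the root.
gene_tree = (("Hsa_1", "Mmu_1"), (("Hsa_2", "Ptr_2"), "Mmu_2"))
for sp_tree in speciation_trees(gene_tree):
    print(sp_tree)
```

For this toy tree the single duplication at the root yields two species trees, one per child lineage.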

get_species()

Returns the set of species covered by this node's partition.

iter_species()

Returns an iterator over the species grouped by this node.

ncbi_compare(autodetect_duplications=True, cached_content=None)

reconcile(species_tree)

Returns the reconciled topology with the provided species tree, and a list of evolutionary events inferred from such reconciliation.

set_species_naming_function(fn)

Sets the parsing function used to extract species name from a node’s name.

Parameters:
  • fn – Pointer to a Python parsing function that receives the node name as its first argument and returns the species name.

# Example of a parsing function to extract species names for
# all nodes in a given tree.
# `tree` is assumed to be an existing PhyloTree instance.
def parse_sp_name(node_name):
    return node_name.split("_")[1]

tree.set_species_naming_function(parse_sp_name)

species

split_by_dups(autodetect_duplications=True)

Returns the list of all subtrees resulting from splitting current tree by its duplication nodes.

Parameters:
  • autodetect_duplications (True) – If True, duplication nodes will be automatically detected using the Species Overlap algorithm (PhyloNode.get_descendant_evol_events()). If False, duplication nodes within the original tree are expected to contain the feature “evoltype=D”.

Returns:

species_trees

PhyloTree

alias of PhyloNode

class EvolEvent

Basic evolutionary event. It stores all the information about an event (node) that occurred in a phylogenetic tree.

etype : D (Duplication), S (Speciation), L (gene loss)

in_seqs : the list of sequences in one side of the event.

out_seqs : the list of sequences in the other side of the event

node : link to the event node in the tree
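The attributes listed above can be pictured as a simple record. The dataclass below is a hypothetical stand-in (EvolEventSketch is not an ETE class) that mirrors the four documented fields, with made-up sequence names, to show how a list of such events might be filtered by type.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EvolEventSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""
    etype: str                                          # "D", "S" or "L"
    in_seqs: List[str] = field(default_factory=list)    # one side of the event
    out_seqs: List[str] = field(default_factory=list)   # the other side
    node: Any = None                                    # link to the event node


# Made-up events, used only for illustration.
events = [
    EvolEventSketch("D", ["Hsa_1", "Ptr_1"], ["Hsa_2", "Mmu_1"]),
    EvolEventSketch("S", ["Hsa_1"], ["Ptr_1"]),
]

duplications = [ev for ev in events if ev.etype == "D"]
print(len(duplications), duplications[0].in_seqs)
```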