Clustering module

class ClusterNode(newick=None, text_array=None, fdist=<function spearman_dist>)

Bases: ete2.coretype.tree.TreeNode

Creates a new Cluster Tree object: a collection of ClusterNode instances, connected hierarchically, that represents a clustering result.

A newick file or string can be passed as the first argument, and an ArrayTable file or instance as the second.

Examples:

    from ete2 import ClusterTree

    t1 = ClusterTree()  # creates an empty tree
    t2 = ClusterTree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')  # from a newick string
    t3 = ClusterTree('/home/user/myNewickFile.txt')  # from a newick file

get_dunn(clusters, fdist=None)

Calculates the Dunn index for the given set of descendant nodes, each treated as one cluster. Higher values indicate more compact, better-separated clusters.
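As a rough illustration of what the Dunn index measures, the sketch below computes it for plain lists of profile vectors in standard-library Python. It assumes Euclidean distance and borrows the centroid-diameter (intra-cluster) and centroid-linkage (inter-cluster) definitions that this page gives for get_silhouette; the textbook index admits other diameter and linkage choices. The names `dunn_index`, `centroid` and `euclidean` are hypothetical helpers, not part of ete2.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(cluster: List[Vector]) -> List[float]:
    """Mean profile: the component-wise average of the cluster's members."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def dunn_index(clusters: List[List[Vector]],
               fdist: Callable[[Vector, Vector], float] = euclidean) -> float:
    """Smallest inter-cluster distance divided by the largest intra-cluster
    diameter. Higher values mean compact, well-separated clusters."""
    if len(clusters) < 2:
        raise ValueError("At least 2 clusters are required")
    centroids = [centroid(c) for c in clusters]
    # Largest centroid diameter: twice a member's distance to its centroid.
    max_intra = max(
        2 * fdist(x, cen) for members, cen in zip(clusters, centroids) for x in members
    )
    # Smallest centroid linkage: distance between two cluster centroids.
    min_inter = min(
        fdist(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    )
    # Avoid dividing by zero when all members coincide with their centroids.
    return 0.0 if max_intra == 0 else min_inter / max_intra
```

For example, `dunn_index([[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]])` returns 5.0: the centroids are 10 apart and the largest diameter is 2.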

get_leaf_profiles()

Returns the list of all the profiles associated with the leaves under this node.

get_silhouette(fdist=None)

Calculates the node’s silhouette value using the given distance function. If fdist is None, the node’s own distance function is used (spearman_dist by default, or the one set with set_distance_function()). It also calculates the deviation profile, mean profile, and inter/intra-cluster distances.

It sets the following features on the analyzed node:
  • node.intracluster
  • node.intercluster
  • node.silhouette

Intracluster distances a(i) are calculated as the centroid diameter.

Intercluster distances b(i) are calculated as the centroid linkage distance.

Reference: Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65.
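Rousseeuw's silhouette for item i is s(i) = (b(i) − a(i)) / max(a(i), b(i)), which lies between −1 and 1. The sketch below applies that formula with the a(i) (centroid diameter) and b(i) (centroid linkage) definitions above, in standard-library Python with Euclidean distance as an assumed default. `silhouette`, `centroid` and `euclidean` are hypothetical names, and averaging s(i), a(i) and b(i) over every member and every other cluster is an assumption, not a description of ete2's code.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(cluster: List[Vector]) -> List[float]:
    """Component-wise mean of the cluster's members."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def silhouette(cluster: List[Vector], others: List[List[Vector]],
               fdist: Callable[[Vector, Vector], float] = euclidean
               ) -> Tuple[float, float, float]:
    """Return (mean s(i), mean a(i), mean b(i)) for `cluster`, comparing
    each member against every cluster in `others`."""
    own = centroid(cluster)
    s_vals: List[float] = []
    a_vals: List[float] = []
    b_vals: List[float] = []
    for other in others:
        other_centroid = centroid(other)
        for x in cluster:
            a = 2 * fdist(x, own)            # a(i): centroid diameter
            b = fdist(x, other_centroid)     # b(i): centroid linkage
            s = 0.0 if a == b else (b - a) / max(a, b)
            s_vals.append(s)
            a_vals.append(a)
            b_vals.append(b)
    return mean(s_vals), mean(a_vals), mean(b_vals)
```

Values near 1 mean members sit much closer to their own centroid than to the other cluster's; `others` must be non-empty, otherwise `statistics.mean` raises on the empty lists.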

iter_leaf_profiles()

Returns an iterator over all the profiles associated with the leaves under this node.
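The difference between get_leaf_profiles() and iter_leaf_profiles() is eager versus lazy evaluation. The sketch below models it on a hypothetical minimal `Node` class (not ete2's TreeNode) whose leaves carry a `profile` attribute: the iterator yields profiles one at a time during a depth-first walk, and the list version simply materialises that iterator.

```python
from typing import Iterator, List, Optional, Sequence


class Node:
    """Hypothetical minimal tree node (not the ete2 class); leaves carry a profile."""

    def __init__(self, name: str = "", children: Optional[List["Node"]] = None,
                 profile: Optional[Sequence[float]] = None) -> None:
        self.name = name
        self.children = children or []
        self.profile = profile

    def iter_leaves(self) -> Iterator["Node"]:
        # Depth-first, left-to-right traversal yielding only terminal nodes.
        stack = [self]
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(reversed(node.children))
            else:
                yield node

    def iter_leaf_profiles(self) -> Iterator[Optional[Sequence[float]]]:
        # Lazy: each profile is produced only when the caller asks for it.
        for leaf in self.iter_leaves():
            yield leaf.profile

    def get_leaf_profiles(self) -> List[Optional[Sequence[float]]]:
        # Eager: collect every profile into a list up front.
        return list(self.iter_leaf_profiles())
```

On large trees whose profiles are consumed once, the iterator avoids building the whole list in memory.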

Links a given ArrayTable object to the tree structure under this node. Row names in the ArrayTable object are expected to match leaf names.

Returns a list of nodes for which profiles could not be found in the ArrayTable.
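The matching rule above (ArrayTable row names against leaf names) can be sketched with a plain dictionary standing in for the ArrayTable. The function `link_profiles` is hypothetical and simplified: it works on leaf names rather than node objects, and returns both the matched rows and the names that had no row.

```python
from typing import Dict, List, Sequence, Tuple


def link_profiles(leaf_names: List[str],
                  rows: Dict[str, Sequence[float]]
                  ) -> Tuple[Dict[str, Sequence[float]], List[str]]:
    """Match leaves to table rows by name.

    Returns (linked, missing): `linked` maps each leaf name to its row,
    and `missing` lists leaf names that have no row with the same name.
    """
    linked: Dict[str, Sequence[float]] = {}
    missing: List[str] = []
    for name in leaf_names:
        if name in rows:
            linked[name] = rows[name]
        else:
            missing.append(name)
    return linked, missing
```

Rows whose names match no leaf (such as an extra "Z" row) are ignored; only unmatched leaves are reported.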

set_distance_function(fn)

Sets the distance function used to calculate cluster distances and the silhouette index.

ARGUMENTS:

fn: a Python function that accepts two NumPy arrays as arguments and returns the distance between them.

EXAMPLE:

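    import numpy

    # A simple euclidean distance between two numpy arrays
    def my_dist_fn(x, y):
        return numpy.linalg.norm(x - y)

    # 'tree' is an existing ClusterTree instance
    tree.set_distance_function(my_dist_fn)
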
ClusterTree

alias of ClusterNode