Clustering module

class ClusterNode(newick=None, text_array=None, fdist=<function spearman_dist>)

Bases: ete3.coretype.tree.TreeNode

Creates a new Cluster Tree object, which is a collection of ClusterNode instances connected hierarchically and representing a clustering result.

A newick file or string can be passed as the first argument. An ArrayTable file or instance can be passed as the second argument.

t1 = Tree() # creates an empty tree
t2 = Tree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')
t3 = Tree('/home/user/myNewickFile.txt')

get_dunn(clusters, fdist=None)

Calculates the Dunn index for the given set of descendant nodes.
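As a rough illustration of what a Dunn index measures, the sketch below divides the smallest separation between clusters by the largest cluster diameter. It is not the library's implementation: the helper names `euclidean` and `dunn_index` are hypothetical, and the choice of centroid-based separation and diameter is an assumption made to match the centroid measures described elsewhere on this page.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two 1-D profiles (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def dunn_index(clusters, fdist=euclidean):
    """Smallest between-cluster distance divided by largest cluster diameter.

    clusters: list of 2-D arrays, one row per profile.
    Assumptions: separation is the distance between centroids, and the
    diameter is twice the mean distance from members to their centroid.
    """
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    centroids = [c.mean(axis=0) for c in clusters]
    diameters = [
        2.0 * float(np.mean([fdist(row, cen) for row in c]))
        for c, cen in zip(clusters, centroids)
    ]
    separations = [
        fdist(centroids[i], centroids[j])
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    ]
    return min(separations) / max(diameters)

# Two tight, well-separated clusters give a high index.
a = [[0.0, 0.0], [0.0, 2.0]]
b = [[10.0, 0.0], [10.0, 2.0]]
score = dunn_index([a, b])
```

Higher values indicate compact clusters that lie far apart; the sketch needs at least two clusters and a nonzero diameter.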


Returns the list of all the profiles associated to the leaves under this node.


Calculates the node’s silhouette value using a given distance function. By default, Euclidean distance is used. It also calculates the deviation profile, mean profile, and inter/intra-cluster distances.

It sets the following features into the analyzed node:
  • node.intracluster
  • node.intercluster
  • node.silhouette

Intracluster distances a(i) are calculated as the centroid diameter.

Intercluster distances b(i) are calculated as the centroid linkage distance.

** Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.
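The following sketch shows how a silhouette value can be derived from the two quantities above, using Rousseeuw's formula s = (b - a) / max(a, b). The function names are hypothetical and the code is not taken from the library. It assumes that the centroid diameter is twice the mean distance from members to their centroid, that centroid linkage is the distance between two centroids, and that b is taken against the nearest other cluster.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two 1-D profiles (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def centroid_diameter(profiles, fdist=euclidean):
    """Assumed definition: twice the mean member-to-centroid distance."""
    profiles = np.asarray(profiles, dtype=float)
    centroid = profiles.mean(axis=0)
    return 2.0 * float(np.mean([fdist(p, centroid) for p in profiles]))

def centroid_linkage(profiles_a, profiles_b, fdist=euclidean):
    """Distance between the centroids of two clusters."""
    cen_a = np.asarray(profiles_a, dtype=float).mean(axis=0)
    cen_b = np.asarray(profiles_b, dtype=float).mean(axis=0)
    return fdist(cen_a, cen_b)

def silhouette(cluster, other_clusters, fdist=euclidean):
    """Return (silhouette, intracluster a, intercluster b) for one cluster."""
    a = centroid_diameter(cluster, fdist)
    # Assumption: b is measured against the closest of the other clusters.
    b = min(centroid_linkage(cluster, other, fdist) for other in other_clusters)
    s = (b - a) / max(a, b)
    return s, a, b

cluster = [[0.0, 0.0], [0.0, 2.0]]
others = [[[10.0, 0.0], [10.0, 2.0]], [[30.0, 0.0], [30.0, 2.0]]]
s, a, b = silhouette(cluster, others)
```

Values close to 1 suggest a compact, well-separated cluster, while values near or below 0 suggest overlap with a neighbouring cluster.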


Returns an iterator over all the profiles associated to the leaves under this node.

Links a given arraytable object to the tree structure under this node. Row names in the arraytable object are expected to match leaf names.

Returns a list of nodes for which profiles could not be found in the arraytable.
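The name-matching behaviour described above can be pictured with plain Python containers. This is only a sketch of the idea: `link_profiles` is a hypothetical helper, leaves are represented by their names, and the array table is represented by a dictionary from row name to profile.

```python
def link_profiles(leaf_names, row_profiles):
    """Match leaf names to table rows; report the leaves left without a profile.

    leaf_names: iterable of leaf names found under a node.
    row_profiles: dict mapping row name -> profile (stand-in for an array table).
    """
    linked = {}
    missing = []
    for name in leaf_names:
        if name in row_profiles:
            linked[name] = row_profiles[name]
        else:
            # No row carries this leaf name, so the leaf gets no profile.
            missing.append(name)
    return linked, missing

rows = {"A": [1.0, 2.0], "C": [3.0, 4.0]}
linked, missing = link_profiles(["A", "B", "C"], rows)
```

Checking the returned list of unmatched names is a quick way to spot naming mismatches between the tree and the table before computing any cluster statistics.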


Sets the distance function used to calculate cluster distances and the silhouette index.


fn: a reference to a Python function accepting two NumPy arrays as arguments.


# A simple Euclidean distance
import numpy
my_dist_fn = lambda x, y: numpy.sqrt(numpy.sum((x - y) ** 2))
tree.set_distance_function(my_dist_fn)
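Any function that takes two NumPy arrays and returns a number can serve as the distance function. As a further illustration, the sketch below defines a correlation-based distance, which may suit expression profiles where the shape of the profile matters more than its magnitude. The name `pearson_dist` is hypothetical and is not part of the library.

```python
import numpy as np

def pearson_dist(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - float(np.corrcoef(x, y)[0, 1])

same_shape = pearson_dist([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite = pearson_dist([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

# With an existing tree, the function would be registered like this:
# tree.set_distance_function(pearson_dist)
```

Note that this distance is undefined for constant profiles, because their correlation cannot be computed.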

alias of ClusterNode