NCBITaxa class

class NCBITaxa(dbfile=None)

Bases: object

New in version 2.3.

Provides a local transparent connector to the NCBI taxonomy database.

annotate_tree(t, taxid_attr='name', tax2name=None, tax2track=None, tax2rank=None)

Annotates a tree whose leaf names are taxids by adding the ‘taxid’, ‘sci_name’, ‘lineage’, ‘named_lineage’ and ‘rank’ attributes.

Parameters:
  • t – a Tree (or Tree-derived) instance.
  • taxid_attr (name) – Name of a custom node attribute containing the taxid number associated with each node (i.e. species in PhyloTree instances).
  • tax2name, tax2track, tax2rank – Optional pre-calculated dictionaries translating taxid numbers into names, lineage tracks and ranks.

get_broken_branches(t, taxa_lineages, n2content=None)

Returns a list of NCBI lineage names that are not monophyletic in the provided tree, as well as the list of affected branches and their size.
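To make the notion of a non-monophyletic lineage concrete, the following is a minimal sketch on toy data. It does not use or reproduce this class's implementation: the nested-tuple tree, the `leaf_lineages` mapping and the helper names are all hypothetical. A lineage name is treated as monophyletic when the set of leaves carrying it is exactly the leaf set of some node in the tree.

```python
# Toy tree as nested tuples; leaves are strings (hypothetical data).
tree = ((("human", "chimp"), "mouse"), ("chicken", "rat"))

# Hypothetical mapping from each leaf to the lineage names it belongs to.
leaf_lineages = {
    "human": {"Primates", "Mammalia"},
    "chimp": {"Primates", "Mammalia"},
    "mouse": {"Rodentia", "Mammalia"},
    "rat": {"Rodentia", "Mammalia"},
    "chicken": {"Aves"},
}


def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every node in the tree."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def non_monophyletic(tree, leaf_lineages):
    """List lineage names whose members do not form a single clade."""
    all_clades = set(clades(tree))
    names = set().union(*leaf_lineages.values())
    broken = []
    for name in sorted(names):
        members = frozenset(
            leaf for leaf, lineage in leaf_lineages.items() if name in lineage
        )
        if members not in all_clades:
            broken.append(name)
    return broken


broken = non_monophyletic(tree, leaf_lineages)
print(broken)
```

In this toy tree "rat" groups with "chicken", so both "Rodentia" and "Mammalia" are reported as broken, while "Primates" and "Aves" are not.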


get_descendant_taxa(parent, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, return_tree=False)

Given a parent taxid or scientific species name, returns a list of all its descendant taxids. If intermediate_nodes is set to True, internal nodes are also included.
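The effect of intermediate_nodes can be illustrated with a small sketch over a toy child-to-parent table. The table, the integer ids and the `descendants` helper are hypothetical and stand in for the local database; the sketch also assumes that only terminal taxa are reported by default.

```python
from collections import defaultdict

# Hypothetical child -> parent table (toy ids, not real NCBI taxids):
# 1 root, 2 family, 3 genus A, 4 species A1, 5 genus B, 6 species B1
parent_of = {2: 1, 3: 2, 4: 3, 5: 2, 6: 5}


def descendants(parent, intermediate_nodes=False):
    """Collect the ids below `parent`; internal nodes only on request."""
    children = defaultdict(list)
    for child, par in parent_of.items():
        children[par].append(child)

    found = []
    stack = [parent]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            is_leaf = not children.get(child)
            if is_leaf or intermediate_nodes:
                found.append(child)
            stack.append(child)
    return sorted(found)


print(descendants(2))
print(descendants(2, intermediate_nodes=True))
```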

get_fuzzy_name_translation(name, sim=0.9)

Given an inexact species name, returns the best match in the NCBI database of taxa names.

Parameters: sim (0.9) – Minimum word similarity required to report a match (from 0 to 1).
Returns: taxid, species-name-match, match-score
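The role of the sim threshold can be sketched with the standard library. This is only an illustration of threshold-based fuzzy matching: it uses `difflib.SequenceMatcher` as a stand-in similarity measure, which is an assumption and not necessarily the metric used by this class, and the `known_names` table and `fuzzy_lookup` helper are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory name table standing in for the local database.
known_names = {
    "Homo sapiens": 9606,
    "Pan troglodytes": 9598,
    "Mus musculus": 10090,
}


def fuzzy_lookup(name, sim=0.9):
    """Return (taxid, matched_name, score) for the best match, or None."""
    best_name, best_score = None, 0.0
    for candidate in known_names:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is None or best_score < sim:
        return None  # nothing is similar enough
    return known_names[best_name], best_name, best_score


match = fuzzy_lookup("Homo sapien")  # misspelled query
print(match)
print(fuzzy_lookup("Xyz"))
```

Lowering sim makes the lookup more tolerant of misspellings at the cost of more false matches.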

Given a valid taxid number, returns its corresponding lineage track as a hierarchically sorted list of parent taxids.
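As a rough sketch of what a lineage track is, the snippet below walks a toy child-to-parent table up to the root. The table and the `lineage` helper are hypothetical, and the root-first ordering with the query taxid as the last element is an assumption about what "hierarchically sorted" means here.

```python
# Hypothetical child -> parent table (toy ids, not real NCBI taxids).
parent_of = {2: 1, 3: 2, 4: 3}


def lineage(taxid):
    """Return the path from the root down to `taxid`, root first."""
    track = [taxid]
    while track[-1] in parent_of:
        track.append(parent_of[track[-1]])
    return track[::-1]


print(lineage(4))
```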


Given a list of scientific names, returns a dictionary translating them into their corresponding taxids.

Exact name match is required for translation.


Given a list of taxids, returns a dictionary mapping each taxid to its corresponding NCBI taxonomy rank.


Given a list of taxids, returns a dictionary with their corresponding scientific names.

get_topology(taxids, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, annotate=True)

Given a list of taxid numbers, return the minimal pruned NCBI taxonomy tree containing all of them.

Parameters:
  • intermediate_nodes (False) – If True, single-child nodes representing the complete lineage of leaf nodes are kept. Otherwise, the tree is pruned to contain the first common ancestor of each group.
  • rank_limit (None) – If a valid NCBI rank name is provided, the tree is pruned at that level. For instance, use rank_limit="species" to get rid of sub-species or strain leaf nodes.
  • collapse_subspecies (False) – If True, any item under the species rank is collapsed into its parent species node.
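The difference between keeping and dropping intermediate nodes can be sketched on a toy taxonomy. This is not the class's implementation: the child-to-parent table, the `(taxid, children)` tuple representation and the `lineage` and `topology` helpers are hypothetical, and the sketch assumes all requested taxids share a single root.

```python
# Hypothetical child -> parent table (toy ids):
# 1 -> 2 -> {3 -> {4, 5}, 6 -> 7}
parent_of = {2: 1, 3: 2, 4: 3, 5: 3, 6: 2, 7: 6}


def lineage(taxid):
    """Return the path from the root down to `taxid`, root first."""
    track = [taxid]
    while track[-1] in parent_of:
        track.append(parent_of[track[-1]])
    return track[::-1]


def topology(taxids, intermediate_nodes=False):
    """Build the smallest tree containing `taxids` as (taxid, children)."""
    # The union of all lineages gives every edge the pruned tree can need.
    children = {}
    for taxid in taxids:
        track = lineage(taxid)
        for par, child in zip(track, track[1:]):
            children.setdefault(par, set()).add(child)
    targets = set(taxids)

    def build(node):
        kids = [build(child) for child in sorted(children.get(node, ()))]
        # Skip single-child nodes unless they were requested explicitly.
        if not intermediate_nodes and len(kids) == 1 and node not in targets:
            return kids[0]
        return (node, kids)

    return build(lineage(taxids[0])[0])


pruned = topology([4, 5, 7])
full = topology([4, 5, 7], intermediate_nodes=True)
print(pruned)
print(full)
```

With the default setting, node 6 (a single-child ancestor of 7) and the root 1 are dropped, so the result is rooted at the first common ancestor, node 2.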


Given a list of taxid numbers, returns another list with their corresponding scientific names.


Updates the NCBI taxonomy database by downloading and parsing the latest taxdump.tar.gz file from the NCBI FTP site.

Parameters: taxdump_file (None) – An alternative location of the taxdump.tar.gz file.