ete-ncbiquery

Fast and handy queries to the NCBI taxonomy database

Overview

ete-ncbiquery lets you download, parse and query a local copy of the NCBI taxonomy database. It can:
  • Extract the NCBI tree topology for a given list of TaxIDs in Newick format
  • Dump extended information for a given list of TaxIDs
  • Dump the descendant taxa of a parent taxid or taxon name
  • Translate scientific names into TaxIDs

Inline translation of taxids and taxon names

$ ete3 ncbiquery --search 9606 'Canis familiaris'  --info

# Taxid Sci.Name        Rank    Named Lineage   Taxid Lineage
9606    Homo sapiens    species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,Homininae,Homo,Homo sapiens   1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9605,9606
9615    Canis lupus familiaris  subspecies      root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Laurasiatheria,Carnivora,Caniformia,Canidae,Canis,Canis lupus,Canis lupus familiaris    1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314145,33554,379584,9608,9611,9612,9615
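
The --info output is easy to post-process with standard text tools. The following is a minimal sketch, assuming the columns are separated by single tab characters, as the header line suggests. The sample rows are abbreviated stand-ins for the real output, with each lineage cut down to a few entries.

```shell
# Abbreviated sample shaped like the --info output above.
# Assumption: fields are separated by single tab characters.
info_sample=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  '# Taxid' 'Sci.Name' 'Rank' 'Named Lineage' 'Taxid Lineage' \
  '9606' 'Homo sapiens' 'species' 'root,Homo,Homo sapiens' '1,9605,9606' \
  '9615' 'Canis lupus familiaris' 'subspecies' 'root,Canis,Canis lupus,Canis lupus familiaris' '1,9611,9612,9615')

# Keep the taxid and rank of every data row, skipping the '#' header line.
# With a live database the same filter could be attached to the command:
#   ete3 ncbiquery --search 9606 'Canis familiaris' --info | awk -F '\t' ...
taxid_rank=$(printf '%s\n' "$info_sample" | awk -F '\t' '!/^#/ { print $1 "\t" $3 }')
printf '%s\n' "$taxid_rank"
```

Because the lineage columns are comma-separated inside a single tab-delimited field, splitting on tabs first keeps names that contain spaces intact.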
      

Dumping the NCBI taxonomy tree connecting a list of taxa

$ ete3 ncbiquery --search 9606 'Mus musculus' 'Gallus gallus' 7227 --tree | ete3 view --ncbi --text

         /-Mus musculus - 10090
      /-|
   /-|   \-Homo sapiens - 9606
  |  |
--|   \-Gallus gallus - 9031
  |
   \-Drosophila melanogaster - 7227

You can also dump an annotated Newick tree in which each node carries the taxid, lineage_track, sci_name and rank features:

$ ete3 ncbiquery --search 9606 'Mus musculus' 'Gallus gallus' 7227 --tree > tree.nw
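
Assuming the annotated tree is written in the extended Newick (NHX) style that ETE uses for node features, that is, as [&&NHX:key=value:...] blocks, the annotations can be pulled out with grep. This is only a sketch: the sample string below is hand-written for illustration, and the exact set and order of features in a real tree.nw may differ.

```shell
# Hand-written, illustrative NHX string (not real ete3 output).
sample_nw='(9606[&&NHX:taxid=9606:sci_name=Homo sapiens:rank=species],10090[&&NHX:taxid=10090:sci_name=Mus musculus:rank=species]);'

# List every taxid annotation found in the tree, one per line.
# With a real file: grep -o 'taxid=[0-9]*' tree.nw | cut -d= -f2
taxids=$(printf '%s\n' "$sample_nw" | grep -o 'taxid=[0-9]*' | cut -d= -f2)
printf '%s\n' "$taxids"
```

On a real file this would list internal nodes as well as leaves, since every node is annotated.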

ete-ncbiquery can be combined with other UNIX commands and piped to ete-view to produce annotated tree images:

$ cat species.txt

394
882
883
1140
1148
2110
2850
3055
3218
(...)

$ cut -f1 species.txt | ete3 ncbiquery --tree | ete3 view --ncbi --image ncbitree.png
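
If the input list may contain comments, blank lines or duplicates, it can be cleaned up before it reaches ete3. A minimal sketch, using a hypothetical sample in place of species.txt:

```shell
# Hypothetical input: taxids in the first column, plus a comment line,
# a blank line and a duplicate entry.
species_sample=$(printf '394\tsome note\n882\n# comment\n\n883\n882\n')

# Take the first column, keep purely numeric lines, then sort and deduplicate.
clean_taxids=$(printf '%s\n' "$species_sample" | cut -f1 | grep -E '^[0-9]+$' | sort -n -u)
printf '%s\n' "$clean_taxids"

# With a real file, the cleaned list could be piped on as in the command above:
#   cut -f1 species.txt | grep -E '^[0-9]+$' | sort -n -u | ete3 ncbiquery --tree
```

The filter runs before ete3 is invoked, so malformed lines never reach the query.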

Dumping the tree of descendant taxa

If only one taxid or taxon name is supplied, the tree containing all of its descendant taxa is dumped:

      
$ ete3 ncbiquery --search hominidae --tree --collapse_subspecies | ete3 view --text

         /-Pongo pygmaeus - 9600
        |
        |--Pongo abelii - 9601
   /- /-|
  |     |--Pongo sp. - 9603
  |     |
  |      \-Pongo abelii x pygmaeus - 502961
  |
--|      /-Gorilla gorilla - 9593
  |   /-|
  |  |   \-Gorilla beringei - 499232
  |  |
  |  |   /-Homo sapiens - 9606
   \-|--|
     |   \-Homo heidelbergensis - 1425170
     |
     |   /-Pan troglodytes - 9598
      \-|
         \-Pan paniscus - 9597
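
When only the leaf names and taxids are needed, the text rendering above can be reduced to a two-column table. This is a rough sketch, assuming every leaf line ends in " - <taxid>" and every name starts with a letter, as in the output shown. The sample is a fragment copied from that output.

```shell
# Fragment of the text tree shown above; the quoted heredoc keeps
# the backslashes literal.
tree_text=$(cat <<'EOF'
     |   /-Pan troglodytes - 9598
      \-|
         \-Pan paniscus - 9597
EOF
)

# For each line ending in " - <taxid>": remember the taxid, strip the
# branch-drawing prefix and the taxid suffix, then print "taxid<TAB>name".
leaves=$(printf '%s\n' "$tree_text" | awk '
  / - [0-9]+$/ {
    taxid = $NF
    sub(/^[^A-Za-z]+/, "")
    sub(/ - [0-9]+$/, "")
    print taxid "\t" $0
  }')
printf '%s\n' "$leaves"
```

Scraping the drawing is fragile. For anything beyond a quick look, the Newick output of --tree is the better starting point.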