New API for species data

by Chunlei Wu

Last week, we released a new feature to support querying genes beyond the species level, at the level of genus, family, and so on up to phylum. As a companion to this feature, we have now added a new API for developers to get the species data programmatically:

GET http://mygene.info/v2/species/<taxid>

This API is simple. Here is an example:

request:

GET http://mygene.info/v2/species/9606

response:

HTTP/1.1 200 OK
Content-Length: 487
Content-Type: application/json; charset=UTF-8

{
    "_id": 9606, 
    "authority": [
        "homo sapiens linnaeus, 1758"
    ], 
    "common_name": "man", 
    "genbank_common_name": "human", 
    "has_gene": true, 
    "lineage": [
        9606, 
        9605, 
        207598, 
        9604, 
        314295, 
        9526, 
        314293, 
        376913, 
        9443, 
        314146, 
        1437010, 
        9347, 
        32525, 
        40674, 
        32524, 
        32523, 
        1338369, 
        8287, 
        117571, 
        117570, 
        7776, 
        7742, 
        89593, 
        7711, 
        33511, 
        33213, 
        6072, 
        33208, 
        33154, 
        2759, 
        131567, 
        1
    ], 
    "parent_taxid": 9605, 
    "rank": "species", 
    "scientific_name": "homo sapiens", 
    "taxid": 9606, 
    "uniprot_name": "homo sapiens"
}

A few fields from the returned object are explained below, and the rest should be self-explanatory:

  • has_gene: a flag to indicate if this taxonomy node is associated with any gene.
  • lineage: the list of taxonomy nodes traversed from the current node to the root of the taxonomy tree.
  • uniprot_name: the organism name from UniProt.
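As a rough illustration of how these fields can be read, the sketch below parses a copy of the example response above (trimmed to a few fields, with the full lineage) and derives the parent, the root, and the depth from "lineage". The helper name summarize_node is hypothetical and not part of MyGene.info.

```python
import json

# Copy of the example response above, trimmed to the fields used here
SAMPLE = """
{
    "_id": 9606,
    "has_gene": true,
    "lineage": [9606, 9605, 207598, 9604, 314295, 9526, 314293, 376913,
                9443, 314146, 1437010, 9347, 32525, 40674, 32524, 32523,
                1338369, 8287, 117571, 117570, 7776, 7742, 89593, 7711,
                33511, 33213, 6072, 33208, 33154, 2759, 131567, 1],
    "parent_taxid": 9605,
    "rank": "species",
    "taxid": 9606
}
"""


def summarize_node(node):
    """Hypothetical helper: pull a few fields out of a species object."""
    lineage = node["lineage"]
    return {
        "taxid": node["taxid"],
        "has_gene": node["has_gene"],
        # lineage runs from the current node up to the root of the tree,
        # so the second entry is the parent and the last one is the root
        "parent": lineage[1] if len(lineage) > 1 else None,
        "root": lineage[-1],
        "depth": len(lineage) - 1,
    }


summary = summarize_node(json.loads(SAMPLE))
print(summary)
```

In this sample the second lineage entry matches "parent_taxid", which is what the description of "lineage" would lead you to expect.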

Two optional query parameters are available:

  • include_children: if passed as "true" or "1", an additional "children" field will be returned, containing all child nodes of the current taxonomy node.
  • has_gene: if passed as "true" or "1" together with "include_children=true", the returned "children" field will be filtered to child nodes associated with genes.

You can try the following examples yourself:

GET http://mygene.info/v2/species/1239?include_children=true

GET http://mygene.info/v2/species/1239?include_children=true&has_gene=true
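For scripted access, the two requests above could be built with a small helper like the one below. This is only a sketch using the Python standard library: species_url and fetch_species are hypothetical names, and the only things taken from this post are the endpoint and the two query parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mygene.info/v2/species"  # endpoint described in this post


def species_url(taxid, include_children=False, has_gene=False):
    """Hypothetical helper: build the request URL for one taxonomy node."""
    params = {}
    if include_children:
        params["include_children"] = "true"
        # has_gene only filters the "children" field, so it is sent
        # only together with include_children
        if has_gene:
            params["has_gene"] = "true"
    url = f"{BASE_URL}/{int(taxid)}"
    return f"{url}?{urlencode(params)}" if params else url


def fetch_species(taxid, **options):
    """Fetch and decode one species object (performs a network request)."""
    with urlopen(species_url(taxid, **options)) as response:
        return json.load(response)


print(species_url(1239, include_children=True))
print(species_url(1239, include_children=True, has_gene=True))
```

Calling fetch_species(1239, include_children=True, has_gene=True) would then return the decoded object, including its "children" field.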

As you might guess, the "children" field returned with "has_gene=true" is exactly the taxid filter passed to MyGene.info when you query genes with the "include_tax_tree" parameter turned on.

Please also note that the "children" field will be truncated if the list contains more than 10,000 entries.
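If you depend on the complete list of children, a defensive check along these lines may help. It is a sketch under two assumptions that this post does not confirm: that "children" is returned as a JSON list, and that a truncated list holds exactly 10,000 entries. The function name is hypothetical.

```python
CHILDREN_LIMIT = 10000  # truncation threshold stated in this post


def children_maybe_truncated(node, limit=CHILDREN_LIMIT):
    """Hypothetical check: True when "children" has reached the limit.

    Assumes "children" is a list and that a truncated list is cut to
    exactly `limit` entries; neither detail is confirmed by the post.
    """
    children = node.get("children", [])
    return len(children) >= limit


# Stand-in objects for illustration only, not real API responses
small_node = {"taxid": 0, "children": [1, 2, 3]}
large_node = {"taxid": 0, "children": list(range(CHILDREN_LIMIT))}

print(children_maybe_truncated(small_node))
print(children_maybe_truncated(large_node))
```

A list that happens to hold exactly 10,000 children would also be flagged, so treat a True result as a prompt to double-check rather than as proof of truncation.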

Credits:

This feature was made possible by the project led by Greg Stupp at the recent 2nd NoB Hackathon.