Query genes beyond species, at the level of genus, family, ..., phylum

by Chunlei Wu

In a typical MyGene.info query, you restrict genes to one or more species with the "species" parameter:

http://mygene.info/v2/query?q=cdk2&species=human

http://mygene.info/v2/query?q=cdk2&species=9606,10090

MyGene.info now also lets you query genes above the species level: you can search for matching genes in any genus, family, or even phylum of the taxonomy tree (in effect, any node of the tree). For example, you can query for "lytic enzyme" in Firmicutes (a phylum of mostly gram-positive bacteria, taxonomy id: 1239):

http://mygene.info/v2/query?q=lytic%20enzyme&species=1239&include_tax_tree=true

or, in Python:

import mygene
mg = mygene.MyGeneInfo()
mg.query('lytic enzyme', species=1239, include_tax_tree=True)
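If you call the REST endpoint directly instead of going through the mygene client, the query string has to be URL-encoded, since a literal space (as in "lytic enzyme") is not valid in a URL. Below is a minimal sketch using only the Python standard library; the helper name build_query_url is hypothetical and is not part of MyGene.info or the mygene package.

```python
from urllib.parse import urlencode

# Base URL of the v2 query endpoint used in the examples in this post.
BASE_URL = "http://mygene.info/v2/query"


def build_query_url(q, taxids, include_tax_tree=False):
    """Build a MyGene.info query URL for one or more taxonomy ids.

    Hypothetical helper: `taxids` is an iterable of integer taxonomy ids,
    joined with commas to match the format of the species parameter.
    """
    params = {"q": q, "species": ",".join(str(t) for t in taxids)}
    if include_tax_tree:
        # The REST examples pass the flag as the lowercase string "true".
        params["include_tax_tree"] = "true"
    # safe="," keeps the comma separator readable; spaces are encoded as "+".
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_query_url("cdk2", [9606, 10090]))
# http://mygene.info/v2/query?q=cdk2&species=9606,10090
print(build_query_url("lytic enzyme", [1239], include_tax_tree=True))
# http://mygene.info/v2/query?q=lytic+enzyme&species=1239&include_tax_tree=true
```

Encoding the space as "+" or "%20" are both standard in a query string; urlencode uses "+" by default.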

Note that the include_tax_tree=true parameter expands the query to all taxonomy ids under the 1239 node in the taxonomy tree (including 1239 itself). For comparison, the query without this parameter:

http://mygene.info/v2/query?q=lytic%20enzyme&species=1239

returns no hits, because no genes are annotated directly at the Firmicutes level.
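The difference between the two queries can be checked programmatically by comparing the "total" field of each response. This is a sketch under stated assumptions: compare_totals and fake_query are hypothetical names, and fake_query is an offline stand-in for mg.query whose count of 42 is an arbitrary placeholder, not a real MyGene.info result.

```python
def compare_totals(query_fn, q, taxid):
    """Return (total without tree expansion, total with expansion).

    `query_fn` is any callable with the calling convention of mg.query;
    a MyGene.info query response reports the number of matches in "total".
    """
    flat = query_fn(q, species=taxid)
    expanded = query_fn(q, species=taxid, include_tax_tree=True)
    return flat["total"], expanded["total"]


def fake_query(q, species, include_tax_tree=False):
    # Offline stand-in: no genes at the node itself, some under it.
    # 42 is an arbitrary placeholder count, not real data.
    return {"total": 42 if include_tax_tree else 0, "hits": []}


print(compare_totals(fake_query, "lytic enzyme", 1239))  # (0, 42)
```

With network access, the same helper can be pointed at the real client, e.g. compare_totals(mygene.MyGeneInfo().query, 'lytic enzyme', 1239).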

We expect this new feature to be particularly useful in fields like evolutionary biology and microbiome research; in fact, users in those fields requested it. So please give it a try and let us know your feedback.

The exact usage of this feature is summarized below:

  • the species parameter accepts one or more taxonomy ids (multiple ids separated by commas)

  • passing include_tax_tree=true expands the query to all descendant nodes, in the taxonomy tree, of the taxids passed in the species parameter, including those taxids themselves.

  • Because any taxonomy id from the tree can be passed, and a high-level node can have a very large number of descendants, the expanded taxonomy id list is capped at 10,000 ids so that it won't overload our servers.
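To make the expansion rule concrete, here is a minimal sketch of collecting a set of taxids plus all of their descendants, stopping at a cap. It is an illustration only, not MyGene.info's server-side implementation: expand_taxids and the toy tree are hypothetical, and which ids the real service keeps once the 10,000 cap is hit is not described in this post (this sketch simply keeps the first ones reached breadth-first).

```python
from collections import deque

# Cap mirroring the 10,000-id limit described above.
MAX_EXPANDED_TAXIDS = 10_000


def expand_taxids(taxids, children, cap=MAX_EXPANDED_TAXIDS):
    """Return the given taxids plus all their descendants, in BFS order.

    `children` maps a taxid to a list of its direct child taxids.
    Traversal stops once `cap` ids have been collected.
    """
    result = []
    visited = set()
    queue = deque(taxids)
    while queue and len(result) < cap:
        taxid = queue.popleft()
        if taxid in visited:  # skip ids reached more than once
            continue
        visited.add(taxid)
        result.append(taxid)
        queue.extend(children.get(taxid, []))
    return result


# Toy tree with made-up ids: 1 -> {2, 3}, 2 -> {4}, 3 -> {5, 6}.
tree = {1: [2, 3], 2: [4], 3: [5, 6]}
print(expand_taxids([1], tree))         # [1, 2, 3, 4, 5, 6]
print(expand_taxids([3], tree))         # [3, 5, 6]
print(expand_taxids([1], tree, cap=3))  # [1, 2, 3]
```

Breadth-first order is one reasonable choice here because, when the cap truncates the list, the ids closest to the requested node are the ones retained.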

Credits:

This feature was made possible by a project led by Greg Stupp at the recent 2nd NoB Hackathon.