We're proud to announce the next version of MyGene.info! This third version brings new features, fixes some issues, and is available at http://mygene.info/v3. MyGene.info v2 will remain up and active during the transition to v3. Stay tuned: we'll also post a step-by-step guide for migrating from v2 to v3.
Here's a brief list of changes; we'll discuss some of them in depth in upcoming posts:
Refseq accession number with version
Because some of you requested this feature (see We need your opinion), we now store accession numbers with their version. You can search with or without the version, so the following requests give the same results:
- http://mygene.info/v3/query?q=NM_001798.4&fields=refseq (including version .4)
- http://mygene.info/v3/query?q=NM_001798&fields=refseq (no version)
- http://mygene.info/v3/query?q=refseq.rna:NM_001798.4&fields=refseq (explicit refseq.rna key with version)
- http://mygene.info/v3/query?q=refseq.rna:NM_001798&fields=refseq (explicit refseq.rna key, no version)
...
"refseq": {
"genomic": [
"NC_000012.12",
"NC_018923.2",
"NG_034014.1"
],
"protein": [
"NP_001277159.1",
"NP_001789.2",
"NP_439892.2",
"XP_011536034.1"
],
"rna": [
"NM_001290230.1",
"NM_001798.4",
"NM_052827.3",
"XM_011537732.1"
],
...
Note: v2 doesn't store version, see http://mygene.info/v2/query?q=refseq.rna:NM_001798&fields=refseq
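The four equivalent requests listed earlier differ only in their query string. Here is a minimal Python sketch of how a client might build them. It relies only on the URLs shown in this post; the helper names `strip_version` and `query_url` are made up for illustration and are not part of any MyGene.info client library.

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info/v3/query"  # v3 query endpoint, as used in this post


def strip_version(accession):
    """Drop a trailing ".N" version from an accession, if present (hypothetical helper)."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


def query_url(term, fields="refseq", key=None):
    """Build a query URL, optionally prefixing the term with an explicit key such as "refseq.rna"."""
    q = f"{key}:{term}" if key else term
    # Keep ":" unescaped so the result looks like the URLs in this post.
    return f"{BASE_URL}?{urlencode({'q': q, 'fields': fields}, safe=':')}"


accession = "NM_001798.4"
for term in (accession, strip_version(accession)):
    print(query_url(term))                    # bare accession
    print(query_url(term, key="refseq.rna"))  # explicit refseq.rna key
```

Running it prints the four URLs from the list, which can then be fetched with any HTTP client.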
RNA-protein mapping
"refseq", "accession" and "ensembl" now contain the association between each RNA and its protein product, within an added inner key "translation", as shown in the following example for gene ID 1017.
Note: if an RNA or protein accession number isn't available in the association, then it's not added to this list.
http://mygene.info/v3/gene/1017?fields=refseq
{
"_id": "1017",
"refseq": {
...
"translation": [
{
"protein": "XP_011536034.1",
"rna": "XM_011537732.1"
},
{
"protein": "NP_001789.2",
"rna": "NM_001798.4"
},
{
"protein": "NP_439892.2",
"rna": "NM_052827.3"
},
{
"protein": "NP_001277159.1",
"rna": "NM_001290230.1"
}
]
}
}
Note: v2 doesn't provide this information, see http://mygene.info/v2/gene/1017?fields=refseq
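A common use of the "translation" key is to look up the protein product of a given transcript. The sketch below works on a hand-copied, abridged version of the gene 1017 response rather than a live request. The helper name `rna_to_protein` is hypothetical, and the handling of a single pair returned as a bare dictionary is a defensive assumption, not something stated in this post.

```python
# Abridged "refseq" value for gene 1017, copied from the example response in this post.
refseq = {
    "translation": [
        {"protein": "XP_011536034.1", "rna": "XM_011537732.1"},
        {"protein": "NP_001789.2", "rna": "NM_001798.4"},
        {"protein": "NP_439892.2", "rna": "NM_052827.3"},
        {"protein": "NP_001277159.1", "rna": "NM_001290230.1"},
    ]
}


def rna_to_protein(source):
    """Map each RNA accession to its protein product (hypothetical helper)."""
    pairs = source.get("translation", [])
    # Assumption: a lone pair might come back as a bare dict rather than a list.
    if isinstance(pairs, dict):
        pairs = [pairs]
    return {pair["rna"]: pair["protein"] for pair in pairs}


mapping = rna_to_protein(refseq)
print(mapping["NM_001798.4"])  # NP_001789.2
```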
"exons" inner structure
The inner structure is now a list of dictionaries. Each dictionary describes the exons of one transcript, with a "transcript" key containing the accession number. The "position" inner key contains the positions of the individual exons.
http://mygene.info/v3/gene/1017?fields=exons
{
"_id": "1017",
"_score": 21.731894,
"exons": [
{
"cdsend": 55971625,
"cdsstart": 55967008,
"chr": "12",
"position": [
[
55966768,
55967124
],
[
55968048,
55968169
],
[
55968777,
55968948
],
[
55971043,
55971247
],
[
55971520,
55972789
]
],
"strand": 1,
"transcript": "NM_001290230",
"txend": 55972789,
"txstart": 55966768
},
...
}
Note: you can compare this structure with the current v2, which uses a dictionary instead of a list of dictionaries: http://mygene.info/v2/gene/1017?fields=exons
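Since the v3 "exons" value is a list, client code that wants per-transcript access has to index it itself. The following Python sketch does that on the first entry of the gene 1017 example, copied by hand from this post. The helper name `exons_by_transcript` is hypothetical.

```python
# First entry of the "exons" list for gene 1017, copied from the example in this post.
exons = [
    {
        "cdsend": 55971625,
        "cdsstart": 55967008,
        "chr": "12",
        "position": [
            [55966768, 55967124],
            [55968048, 55968169],
            [55968777, 55968948],
            [55971043, 55971247],
            [55971520, 55972789],
        ],
        "strand": 1,
        "transcript": "NM_001290230",
        "txend": 55972789,
        "txstart": 55966768,
    },
]


def exons_by_transcript(exon_records):
    """Index exon positions by transcript accession (hypothetical helper)."""
    return {record["transcript"]: record["position"] for record in exon_records}


positions = exons_by_transcript(exons)["NM_001290230"]
print(len(positions))  # 5
print(positions[0])    # [55966768, 55967124]
```

In this record the first exon starts at "txstart" and the last one ends at "txend", which is a quick sanity check when parsing the structure.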
Better mapping between Ensembl and Entrez gene IDs
There are some annoying cases of one-to-many matches between Ensembl IDs and Entrez IDs, based on the mapping from Ensembl. For example, Ensembl gene ID ENSMUSG00000071350 is associated with Entrez gene IDs 628705 and 239122. While these ambiguous mappings won't disappear completely, the majority of them can be fixed by cross-checking the mappings from other sources. We worked hard to improve this mapping and remove as many discrepancies as we could. We'll post more about this soon.
- http://mygene.info/v3/query?q=ensembl.transcript:ENSMUST00000095775 returns only one result, for gene ID 239122, after disambiguation based on symbol Setdb2
- the same request http://mygene.info/v2/query?q=ensembl.transcript:ENSMUST00000095775 on v2 returns two results, for gene 239122 and 628705.
Querying "reporter" data source
Because some "reporter" IDs are integers (e.g. Affymetrix HuGene_1-1 array), just like Entrez gene IDs, the "reporter" field now needs to be explicit in the query to avoid any confusion:
http://mygene.info/v3/query?q=reporter:2845421&fields=reporter
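As a small illustration, a client could wrap this rule in a helper so that reporter IDs are always sent with the explicit field name. This is only a sketch: the function name `reporter_query_url` is hypothetical, and the URL layout is taken from the example in this post.

```python
from urllib.parse import urlencode


def reporter_query_url(reporter_id, fields="reporter"):
    """Build a v3 query URL that names the "reporter" field explicitly (hypothetical helper)."""
    params = {"q": f"reporter:{reporter_id}", "fields": fields}
    # Keep ":" unescaped so the URL matches the form shown in this post.
    return "http://mygene.info/v3/query?" + urlencode(params, safe=":")


print(reporter_query_url(2845421))
# http://mygene.info/v3/query?q=reporter:2845421&fields=reporter
```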
Change in dot.field notation default
The "dot.field" notation returns nested keys joined with dots, like ["refseq.rna"], instead of a nested structure, such as ["refseq"]["rna"]. This behavior can be triggered using dotfield=1 in conjunction with the fields parameter. By default, results are now returned using the nested structure, unless dotfield=1 is explicitly specified.
- http://mygene.info/v2/gene/1017?fields=refseq.rna will show dotfield notation:
{
"_id": "1017",
"refseq.rna": [
"NM_001290230",
"NM_001798",
"NM_052827",
"XM_011537732"
]
}
- http://mygene.info/v3/gene/1017?fields=refseq.rna will show nested structure
{
"_id": "1017",
"_score": 21.731894,
"refseq": {
"rna": [
"NM_001290230.1",
"NM_001798.4",
"NM_052827.3",
"XM_011537732.1"
]
}
}
Note: this change only affects the annotation endpoint /gene. The query endpoint /query already defaults to the nested structure.
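If existing code expects the dotted keys, one option besides passing dotfield=1 is to flatten the nested response on the client side. The Python sketch below shows the idea. It is not the server's implementation, the function name `to_dotfield` is hypothetical, and it only descends into dictionaries, so lists (including lists of dictionaries such as "translation") are kept as values, which may differ from what the API returns with dotfield=1.

```python
def to_dotfield(doc, prefix=""):
    """Flatten nested dictionaries into dotted keys; lists and scalars are kept as values."""
    flat = {}
    for key, value in doc.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the dotted path built so far.
            flat.update(to_dotfield(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Nested v3-style response for fields=refseq.rna, copied from the example in this post.
nested = {
    "_id": "1017",
    "refseq": {
        "rna": ["NM_001290230.1", "NM_001798.4", "NM_052827.3", "XM_011537732.1"]
    },
}
print(to_dotfield(nested))
```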
We focus on your needs, so you're more than welcome to give feedback, comment on any of these changes, and request more. Again, stay tuned for more about this new version!