New Data Release for MyVariant.info 201704

by Chunlei Wu

Another fresh data release for MyVariant.info is out! As we mentioned in the last data release, a refactored backend for data aggregation and updating is now in place, which streamlines our process of keeping variant annotations up to date. In this data release, we have updated the data from ClinVar, dbSNP and dbNSFP to their latest versions, and also added variant annotations from UniProtKB. Here are more details.

Data Sources Updated

Three popular data sources, ClinVar, dbSNP and dbNSFP, were updated to their latest versions (the same version for both the hg19 and hg38 assemblies).

Some numbers for GRCh37/hg19 variants:

Source    Last release    New release    # of variants in last release    # of variants in new release
ClinVar   2017-03         2017-04        262,061                          282,772
dbSNP     149             150            153,968,878                      238,894,687
dbNSFP    3.3a            3.4a           82,366,649                       82,366,524

Similarly, some numbers for GRCh38/hg38 variants:

Source    Last release    New release    # of variants in last release    # of variants in new release
ClinVar   2016-11         2017-04        262,254                          282,956
dbSNP     149             150            153,745,925                      335,499,682
dbNSFP    3.3a            3.4a           82,443,934                       82,443,748

ClinVar, dbSNP and dbNSFP annotations are available under the "clinvar", "dbsnp" and "dbnsfp" fields, respectively, for each annotated variant. MyVariant.info aggregates annotations from ClinVar, dbSNP, dbNSFP and 12 other sources for each variant, so you can access them all in one request.
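As a rough illustration of the "one request" point, the sketch below builds a single URL that asks for several sources at once by passing a comma-separated list to the "fields" parameter. The function names join_fields and variant_url are hypothetical helpers, not part of MyVariant.info. The variant id is borrowed from the UniProtKB examples later in this post, and we have not checked that this particular variant carries all three annotations.

```shell
# Join annotation source names into the comma-separated value
# used by the "fields" query parameter.
join_fields() (
    IFS=,               # "$*" joins arguments with the first character of IFS
    printf '%s' "$*"
)

# Build the request URL for one variant (hypothetical helper).
# $1 = variant id, already percent-encoded (as in the curl examples in this post)
# $2 = assembly (hg19 or hg38)
# remaining arguments = annotation sources to return
variant_url() {
    id=$1
    assembly=$2
    shift 2
    printf 'http://myvariant.info/v1/variant/%s?fields=%s&assembly=%s\n' \
        "$id" "$(join_fields "$@")" "$assembly"
}

# Example (requires network access):
# curl "$(variant_url 'chr5:g.97171338A%3EG' hg38 clinvar dbsnp dbnsfp)"
```

The helpers only assemble the URL, so they can be checked offline before any request is sent.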

A notable change in this data release is that the number of variants from dbSNP has increased significantly: from about 154M in v149 to about 335M in v150 for hg38, more than doubling (and to about 239M for hg19). Most of the new variants come from the TOPMed and HLI projects. You can see the full announcement here.

The total number of unique variants is now nearly 425M (424,515,266), compared to 341M (341,289,677) previously. More details about the variant data we provide from MyVariant.info are always available in our documentation. Programmatic access to this information is available from our metadata endpoint (and hg38 metadata).

New Data Sources Added

In this data release, we added variant annotations from UniProtKB, taken from "humsavar.txt", an index of manually curated human polymorphisms and disease mutations from UniProtKB/Swiss-Prot. You can access the data under the "uniprot" field. Note that the "uniprot" field is only available for hg38 variants. Here are a few query examples:

curl 'http://myvariant.info/v1/variant/chr5:g.97171338A%3EG?fields=uniprot&assembly=hg38'
curl 'http://myvariant.info/v1/variant/chr4:g.88121708T%3EC?fields=uniprot&assembly=hg38'
curl 'http://myvariant.info/v1/variant/chr15:g.51464769A%3EC?fields=uniprot&assembly=hg38'

You can also reference a variant using UniProt's VAR id (also called ftid):

curl 'http://myvariant.info/v1/variant/VAR_042351?fields=uniprot&assembly=hg38'
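In the examples above, "%3E" is the percent-encoded form of the ">" character in the HGVS id. If you would rather write the ids in their plain form, a minimal sketch like the following can do the encoding for you. The function names encode_variant_id and uniprot_url are hypothetical, and only ">" is encoded, which is enough for the ids shown in this post but is an assumption for anything else.

```shell
# Percent-encode the ">" in an HGVS id (">" becomes "%3E").
# UniProt VAR ids contain no ">" and pass through unchanged.
encode_variant_id() {
    printf '%s' "$1" | sed 's/>/%3E/g'
}

# Build the URL for the "uniprot" field of an hg38 variant (hypothetical helper).
# $1 = HGVS id in plain form, or a UniProt VAR id
uniprot_url() {
    printf 'http://myvariant.info/v1/variant/%s?fields=uniprot&assembly=hg38\n' \
        "$(encode_variant_id "$1")"
}

# Example (requires network access):
# for id in 'chr5:g.97171338A>G' 'chr4:g.88121708T>C' 'VAR_042351'; do
#     curl "$(uniprot_url "$id")"
# done
```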

That's all! And as always, feel free to reach us at help@myvariant.info or @myvariantinfo if you have any questions or feedback.