Wulab’s research focuses on novel data science approaches that promote open science and FAIR data sharing, integrate large-scale heterogeneous data and knowledge, and thereby enable novel biomedical discovery.
Open Science and FAIR Data Sharing
A set of projects in this area:
-
BioGPS (http://biogps.org)
BioGPS is a gene portal for sharing public genomic datasets and integrating gene-centric annotations with gene-level genomic data.
- For data providers, it helps disseminate genomic datasets to the broader research community.
- For data consumers, it helps find and explore relevant datasets with integrated knowledge.
- For wet-lab researchers, it helps explore gene-centric annotations and associated gene-level genomic patterns.
-
Data Discovery Engine (or DDE, https://discovery.biothings.io)
DDE is a project from the CD2H program, where Dr. Wu is a PI and co-chair of the Resource Discovery Core. It helps research institutes share their data in an open and interoperable way. Within the CD2H program, DDE helps CTSA member institutes share datasets, software tools and people's expertise across the CTSA program.
With this project, we help data providers promote their datasets and make their data more discoverable in both general search engines (such as Google and Bing) and biomedical-specific data portals (e.g. CTSASearch, a search engine for the CTSA program).
Whereas the BioGPS project helps host and distribute datasets, the DDE project focuses on making dataset metadata more discoverable, promoting data reuse and facilitating new collaborations. DDE serves both data providers who want to disseminate and promote their data and teams or institutes that want to build a domain-specific data portal.
-
NIAID Data Ecosystem (https://data.niaid.nih.gov)
The NIAID Data Ecosystem provides a discovery portal that lets users find datasets from a wide range of repositories, offering a convenient one-stop shop for discovering data on infectious and immune-mediated diseases (IIDs). The Discovery Portal regularly collects metadata from all included repositories and indexes it in a searchable catalogue for anyone to explore.
This project uses the Data Discovery Engine described above as the backbone to standardize metadata schemas and harvest aggregated resource metadata. It is a domain-specific data portal project focused on infectious disease research.
Knowledgebase Integration
Large-scale knowledge integration is becoming a vital and enabling technology for novel discovery, especially when integrated with the latest knowledge inferencing algorithms. Our lab has built a set of tools to enable such large-scale knowledge graph integration.
-
- MyGene.info, MyVariant.info, MyChem.info and MyDisease.info are a set of high-performance biomedical APIs for quick, integrated access to biomedical entity-centric knowledge. Just as BioGPS provides gene-centric knowledge for wet-lab researchers, MyGene.info is a gene-centric knowledge API for bioinformaticians; the other BioThings APIs do the same for other biomedical entities. Collectively, the BioThings APIs handle over 30 million requests every month from tens of thousands of users.
- BioThings SDK is a Software Development Kit (SDK) underlying all of our BioThings APIs. By open-sourcing this SDK, other researchers can use this package to build even more high-performance, up-to-date biomedical APIs. One example is the dozens of Translator Knowledge Provider APIs built with BioThings SDK by both our group and other collaborating groups: https://biothings.ncats.io.
-
SmartAPI (https://smart-api.info/) and BioThings Explorer (https://biothings.io/explorer/)
SmartAPI and BioThings Explorer are two projects built to let researchers query, in a distributed way, the large knowledge graph spread across many individual knowledge APIs (such as the dozens of BioThings APIs and many other biomedical APIs registered in the SmartAPI registry). Both projects are currently backbone components of the NCATS Translator program, with use cases targeting precision medicine, drug repurposing and more.
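As an illustration of how the entity-centric BioThings APIs described above might be queried from a script, here is a minimal Python sketch. The base URLs, the `/query` endpoint, the `q`, `fields` and `size` parameters, and the `hits` key in the response reflect the public API documentation as we understand it and should be treated as assumptions; the helper names `build_query_url` and `fetch_hits` are hypothetical and not part of any BioThings package.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URLs for the four BioThings APIs named above
# (API versions may change; check each service's documentation).
BIOTHINGS_APIS = {
    "gene": "https://mygene.info/v3",
    "variant": "https://myvariant.info/v1",
    "chem": "https://mychem.info/v1",
    "disease": "https://mydisease.info/v1",
}


def build_query_url(entity, q, fields=None, size=10):
    """Build a /query URL for one of the entity-centric APIs (hypothetical helper)."""
    params = {"q": q, "size": size}
    if fields:
        # Restrict the response to the listed fields, comma-separated.
        params["fields"] = ",".join(fields)
    return f"{BIOTHINGS_APIS[entity]}/query?{urlencode(params)}"


def fetch_hits(url, timeout=10):
    """Send the request and return the list under the assumed 'hits' key."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("hits", [])
```

For example, `fetch_hits(build_query_url("gene", "symbol:CDK2", fields=["symbol", "name"]))` would ask MyGene.info for genes whose symbol is CDK2, assuming the service is reachable. Keeping the URL construction separate from the network call makes the query logic easy to check offline and to reuse across the four APIs.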
Collaborations Enabling Biomedical Discovery
The tools developed at the Wulab foster collaborations between data scientists and wet-lab researchers, which in turn enable novel biomedical discovery. Some examples can be found in the following selected publications:
- Functional annotation of the transcriptome of the pig, Sus scrofa, based upon network analysis of an RNAseq transcriptional atlas KM Summers, SJ Bush, C Wu, AI Su, C Muriuki, EL Clark, HA Finlayson, ... Frontiers in genetics 10, 1355
- exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids OD Murillo, W Thistlethwaite, J Rozowsky, SL Subramanian, R Lucero, ... Cell 177 (2), 463-477.e15
- Diverse reprogramming codes for neuronal identity R Tsunemoto, S Lee, A Szucs, P Chubukov, I Sokolova, JW Blanchard, ... Nature 557 (7705), 375-380
- Stress-independent activation of XBP1s and/or ATF6 reveals three functionally diverse ER proteostasis environments MD Shoulders, LM Ryno, JC Genereux, JJ Moresco, PG Tu, C Wu, ... Cell reports 3 (4), 1279-1292
- Identification of serum-derived sphingosine-1-phosphate as a small molecule regulator of YAP E Miller, J Yang, M DeRan, C Wu, AI Su, GMC Bonamy, J Liu, EC Peters, ... Chemistry & biology 19 (8), 955-962
- A small molecule accelerates neuronal differentiation in the adult rat H Wurdak, S Zhu, KH Min, L Aimone, LL Lairson, J Watson, G Chopiuk, ... Proceedings of the National Academy of Sciences 107 (38), 16542-16547
Research Programs and Funding
The Wu Lab is proud to play key roles in these international or national research programs:
- Big Data to Knowledge
- Center for Data to Health
- Biomedical Data Translator
- National COVID Cohort Collaborative
We gratefully acknowledge funding support from NCATS, NIGMS, NIAID and NHGRI.