COGNOMA - When data science / coding enthusiasts and researchers build something together

by Ginger Tsueng

One point of pride in pooling, standardizing, and sharing gene, variant, and other “BioThings” annotation data as a service is that our service is fast! We built MyGene.info and MyVariant.info with speed in mind because we want them to be useful to bioinformaticians and tool/resource developers alike. How can we tell whether we’ve succeeded in providing a useful service?

One measure we LOVE is when a user builds something useful or amazing with our service. Today we’d like to introduce COGNOMA, an open source tool for predicting cancer mutations from gene expression profiles, based on public TCGA data. What’s really interesting about COGNOMA is that it was created by a Philadelphia Meetup group (DataPhilly Meetup), the Childhood Cancer Data Lab at Alex's Lemonade Stand Foundation, and the Greene Lab at the Perelman School of Medicine, University of Pennsylvania. @greenescientist (Casey Greene) was kind enough to answer our questions.

In one tweet or less, introduce us to COGNOMA:

Wouldn’t it be nice if you could identify tumors with aberrant pathway activities from TCGA? Cognoma implements machine learning techniques to do just that in a user-friendly webserver.

What was the original intent behind COGNOMA (how did COGNOMA come about, how was the collaborative effort started)?

The DataPhilly (data science meetup in Philadelphia) and Code for Philly (programming for the social good) organizations decided that they wanted to contribute to a larger, shared effort to produce a usable product. This led to a few brainstorming sessions for projects, which eventually led to the Cognoma effort: taking workflows that we had been using in our research lab and implementing them in a user-friendly webserver. Daniel Himmelstein led much of the development effort, which occurred over a roughly one-year period. Hundreds of people attended the event, and tens of folks had their code make it into master on one or more of the Cognoma repositories.

How has COGNOMA since improved (key improvements, not just GitHub commits)?

The user interface got a revamp about a year ago. It’s now a bit friendlier for getting started.

Who is currently the intended audience for COGNOMA?

Our intended users are biologists with a slight computational bent who study cancers. We’re in particular hoping to aid folks who have a gene or pathway of interest that is regularly mutated in cancer, but where the processes at play can also be more complex than a simple mutation. These are the people who are most likely to benefit from being able to quickly train a gene expression classifier.

How does COGNOMA use MyGene.info or MyVariant.info services?

We found that it was annoying to maintain a searchable set of genes. We use the mygene.info API to allow users to search for genes as they are building a classifier. You can try this out for yourself at http://cognoma.org.
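For readers curious what a gene search against MyGene.info can look like, here is a minimal Python sketch. It is not Cognoma's actual code. It assumes the public v3 query endpoint (https://mygene.info/v3/query) with its q, species, fields, and size parameters, and the helper names (build_gene_search_url, parse_hits, search_genes) are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public MyGene.info v3 query endpoint (assumed here; not taken from Cognoma's code).
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_search_url(term, species="human", size=10):
    """Build a query URL asking for a few basic fields per matching gene."""
    params = {
        "q": term,
        "species": species,
        "fields": "symbol,name,entrezgene",
        "size": size,
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def parse_hits(payload):
    """Turn a decoded JSON response into (entrez_id, symbol, name) tuples."""
    return [
        (hit.get("entrezgene"), hit.get("symbol"), hit.get("name"))
        for hit in payload.get("hits", [])
    ]


def search_genes(term, species="human", size=10):
    """Query the service over the network and return the parsed hits."""
    url = build_gene_search_url(term, species, size)
    with urlopen(url, timeout=10) as response:
        return parse_hits(json.load(response))


if __name__ == "__main__":
    # Example: list genes matching "KRAS" (requires network access).
    for entrez_id, symbol, name in search_genes("KRAS"):
        print(entrez_id, symbol, name)
```

Keeping URL construction and response parsing separate from the network call makes the sketch easy to check offline. A search box like the one described above would presumably issue a query of this kind as the user types.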

What are some of COGNOMA’s successes (news releases, papers published)?

There are two uses of the Cognoma workflow in the wild that we’re aware of. The PanCanAtlas projects for Ras and DNA Damage Repair used these workflows. I’m also aware of some subsequent work that has used similar workflows, though it’s still working its way through the publishing process.

What improvements are planned for COGNOMA?

We’d like to get some childhood cancer data into Cognoma as well. It was primarily designed around a single dataset, for which it’s currently using TCGA. Now that Alex’s Lemonade Stand Foundation’s Childhood Cancer Data Lab has taken over maintenance, we’d like to have it be useful for the childhood cancer community as well.