Making functional inferences from evolutionarily constrained regions with Aminode

by Ginger Tsueng

One of our points of pride in pooling, standardizing, and sharing gene, variant, and other “BioThings” annotation data as a service is that our service is fast! We built MyGene.info and MyVariant.info with speed in mind because we want them to be useful to bioinformaticians and tool/resource developers alike. How can we tell if we’ve successfully provided a useful service?

One measure we LOVE is when a user builds something useful or amazing with our service. Today we’d like to introduce Aminode, a user-friendly webtool for the routine and rapid inference of evolutionarily constrained regions (ECRs). ECRs can be used to understand a protein’s structure or function, but the computational time and skills required to infer them can detract from their value as a tool for biomedical researchers.

If ECRs sound like something of value to you, Aminode will likely serve you well with its user-friendly and elegantly simple/clean design.
[Image: Aminode is minimalistic and elegant]

Aminode was originally conceived by Kevin Chang, a rotating PhD student in the laboratory of Dr. Marco Sardiello at Baylor College of Medicine. Both Kevin and Marco were kind enough to answer our questions about Aminode.

In one tweet or less, introduce us to Aminode:
Aminode predicts which parts of the protein are relatively important or unimportant using evolution as an experimental tool.

What was the original intent behind Aminode (how did the Aminode team first come to learn of the potential value of ECRs to the biomedical research community and the issues they faced in trying to use ECRs)?
The value of analyzing evolutionarily constrained regions in a protein was already well known to our lab. However, given the lack of specific tools to carry out the task, it took too much time and effort to produce just one graph showing ECRs, even for a protein of average length. Our original intent was simply to reduce the time of analysis by creating a small webtool for in-lab use. Users would input protein sequences in a text file, and the tool would return the calculations needed to create a figure. However, as development continued, we saw an opportunity to scale up to the entire human proteome and automate the entire process! Now, a high-quality analysis is available for every protein in the human proteome that has enough sequencing data from other vertebrate species.

How has Aminode since improved (key improvements, not just GitHub commits)?
Since its early days as a humble in-lab tool, Aminode has grown into a full-fledged database and webtool. ECR analysis is now available to everyone, instead of a select few with the right combination of experience in evolutionary biology and statistics. Additionally, we have added a custom tool where users can input their own protein sequences of interest to identify ECRs. This process can be scaled up to any group of proteins that have some evolutionary relationship, such as paralogs or distant family members. The more data that is input, the more sensitive the resulting analysis will be.

Who is currently the intended audience for Aminode?
Anyone who has any interest in genes and proteins can use Aminode. However, our main target audience is scientists and medical practitioners. Scientists can use Aminode in a variety of ways. For example, functional experiments often call for tagging a protein without changing its function. A quick glance at an Aminode graph will show you which parts of the protein are less important to protein function and would therefore be good targets for tagging.

Medical doctors, especially medical geneticists, would find Aminode very useful. Oftentimes patients will have variants (point mutations) of unknown significance. It is important to know whether or not these variants will be pathogenic (cause disease). By matching variant location to Aminode-generated protein evolutionary profiles, we can see whether any given variant falls into an Aminode ECR, an area that Aminode predicts to be important. After mapping all known pathogenic and non-pathogenic missense variants reported in UniProt, we found that pathogenic variants are much more likely than non-pathogenic variants to fall into Aminode ECRs. This can serve as a powerful predictive tool for variant analysts and genetic counselors.
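The matching step described above can be pictured with a small sketch. It assumes that ECRs are available as inclusive, 1-based residue intervals. The interval values, the variant positions, and the function names are all made up for illustration; they are not Aminode output and not Aminode's own code.

```python
from bisect import bisect_right

def in_ecr(position, ecrs):
    """Return True if a 1-based residue position falls inside any ECR.

    `ecrs` must be a sorted list of non-overlapping (start, end) tuples,
    with both ends inclusive.
    """
    starts = [start for start, _ in ecrs]
    # Index of the last interval whose start is <= position (or -1 if none).
    i = bisect_right(starts, position) - 1
    return i >= 0 and ecrs[i][0] <= position <= ecrs[i][1]

def fraction_in_ecr(positions, ecrs):
    """Fraction of variant positions that land inside an ECR."""
    if not positions:
        return 0.0
    return sum(in_ecr(p, ecrs) for p in positions) / len(positions)

# Hypothetical ECR intervals and variant positions for one protein.
ecrs = [(10, 25), (40, 60), (100, 130)]
pathogenic = [12, 45, 58, 110, 75]
benign = [5, 30, 50, 90, 140]

print("pathogenic in ECRs:", fraction_in_ecr(pathogenic, ecrs))
print("benign in ECRs:", fraction_in_ecr(benign, ecrs))
```

With real data, the two fractions would be compared across many proteins and with an appropriate statistical test, rather than read off a single toy example like this one.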

How does Aminode use MyGene.info or MyVariant.info services?
Aminode has a page for each protein analyzed that contains the analysis of evolutionarily constrained regions, raw data, download links, and a gene summary. Aminode shows information from MyGene.info so that users can have a short summary of what is known about the gene without having to search for it separately or leave the page.
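For readers curious what pulling a gene summary from MyGene.info can look like, here is a minimal sketch using the public v3 query endpoint. How Aminode itself calls the service is not described here, so the function names, the choice of fields, and the example gene symbol (CLN3) are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYGENE_QUERY_URL = "https://mygene.info/v3/query"

def build_summary_query(symbol, species="human"):
    """Build a MyGene.info query URL for one gene symbol (illustrative helper)."""
    params = {
        "q": f"symbol:{symbol}",
        "species": species,
        "fields": "symbol,name,summary",  # assumed fields for a short summary box
        "size": 1,
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"

def fetch_gene_summary(symbol, species="human", timeout=10):
    """Return the top hit as a dict, or None if nothing matched.

    Makes a live network request, so it needs internet access.
    """
    with urlopen(build_summary_query(symbol, species), timeout=timeout) as response:
        payload = json.load(response)
    hits = payload.get("hits", [])
    return hits[0] if hits else None

# CLN3 is just an example symbol chosen for this sketch.
url = build_summary_query("CLN3")
print(url)
```

Calling `fetch_gene_summary("CLN3")` would then return a dictionary whose `summary` field, when present, holds the kind of short description that can be shown next to an analysis.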

What are some of Aminode’s successes (news releases, papers published)?
Aminode has been published in Scientific Reports, a Nature Research journal. It has also been featured in press articles by Rice University, Baylor College of Medicine and Batten Disease News. Additionally, from the usage metadata of the webtool we see that scientists in multiple institutions all over the world are using Aminode for their analyses. We look forward to reading research articles containing Aminode graphs as well as receiving feedback from users to improve the tool.

What improvements are planned for Aminode?
An inherent limit of Aminode is that its prediction models depend on the annotations of genes in multiple species, which are not always complete or satisfactory and therefore leave room for improvement. In its first release, Aminode used data from Ensembl. In the future, additional sources will be added to provide more comprehensive inter-species sequence information, and Aminode will be regularly updated with the latest sequencing information from all sources. Furthermore, we plan to include variant analysis as a specific feature of Aminode to assist with investigations into genetic disorders.