UMass Chan scientists lead effort to annotate human genome

Collaborative effort yields nearly 1 million potential functional genomic elements

 

UMass Medical School researchers Zhiping Weng, PhD, and Jill Moore, PhD’18, and MD/PhD students Michael Purcaro and Henry Pratt are lead authors on the latest publication of data from the ambitious ENCODE project. Collaborating with other members of the ENCODE consortium, the UMMS team used computational biology to identify functional elements in the human genome. These elements act as switches, controlling when and where genes are turned on and how they are tuned. Results from their data analysis, published in the latest issue of Nature, identified 926,535 human candidate cis-regulatory elements (cCREs), which are regions of noncoding DNA that control neighboring genes. The full data set is now available to scientists in visual form at screen.encodeproject.org, a web tool also developed by the team.

“There are 3 billion base pairs in our genome and not every one of them has a known function,” said Dr. Weng, the Li Weibo Chair in Biomedical Research, professor of biochemistry & molecular pharmacology and director of the Program in Bioinformatics & Integrative Biology. “Identifying and annotating the specific regions of DNA that help control our genes is key to understanding the complexity of the genome and how it works.”

Only about 20,000 genes make up the protein-coding portion of the human genome. Genes can be thought of as the primary workhorses of the genome, carrying instructions for making proteins, the large, complex molecules that do most of the work in cells and that are required for the body’s tissues and organs to do their respective jobs. Genes have been methodically studied down to the specific genetic code with which they encode their instructions. However, this leaves large swaths of DNA outside of these protein-coding areas, many of which are known to affect health and promote disease.

“If our genome is like a car, then the protein coding part of the car is the engine,” said Weng. “It propels us forward. How we control and make use of that engine—accelerating, turning, braking—is controlled by other mechanisms. In the genome, one family of these mechanisms is the cis-regulatory elements that promote and enhance, turn on or off, and fine-tune our genes.”

Established in 2003, the ENCODE project, short for Encyclopedia of DNA Elements, is a global effort to understand how the human genome works. The goal is to develop an annotated encyclopedia of the functional elements contained in the human genome: regions of DNA that code for molecular products or biochemical activities with roles in gene regulation. While much is known about protein-coding genes, they represent only 2 percent of the entire genome. Far less is known about the other 98 percent, some of which helps control these genes. Working as an integral part of the ENCODE consortium during Phase III of the project, the UMMS team established a registry of nearly a million candidate DNA “switches” in the human genome. Together these candidates cover 7.8 percent of the genome and could potentially play an important role in how genes work.

The human body is made up of thousands of different cell types—liver cells, skin cells, neurons. Although all of these cells carry identical sets of DNA, these diverse cells carry out very different functions by using the information encoded in the genome differently. The DNA regions that turn genes on or off and tune the exact levels of activity are responsible for this diversity. They drive the formation of different cell types and control how they function in the body.

To find the different switches that lead to such a diverse array of cell types, the more than 500 scientists who make up ENCODE studied sets of biochemical features that are associated with the genetic switches that control genes. In total, researchers performed nearly 6,000 biochemical experiments (4,834 involving human samples and 1,158 with mouse samples). Using assays of chromatin accessibility, histone modifications, DNA methylation, chromatin looping and a host of other features, they pinpointed regions of the genome where chemical reactions associated with regulatory activity were occurring. Performed in more than 500 different cell types, these experiments yielded millions of locations in the human genome where these regulatory switches could potentially reside, from which the UMMS team established the Registry of cCREs.

The hope is that scientists will use these candidate areas to help establish potential links between regulatory switches and disease. For example, the ENCODE data could be employed to provide new insights into genome-wide association studies, which often link genetic diseases to areas outside of protein-coding genes, explained Dr. Moore, a bioinformatician in the Weng Lab and project manager of the ENCODE Data Analysis Center.

Of the almost 1 million human cCREs identified, Weng and ENCODE collaborators tested 150 using functional assays to see if genetic changes in these areas might impact health. One area of interest, which resides near the neural gene AGAP1 and has been associated with schizophrenia, was shown to have regulatory activity in the brains of embryonic mice. Further functional testing can be performed on these elements to explore how and why they impact disease. Scientists can also compare the candidate areas against their own genetic studies of health and disease. The Weng lab leads one such effort in the PsychENCODE Consortium, a large-scale collaborative project like ENCODE that focuses on the role of regulatory elements in human brain development and psychiatric disorders.

To make all this data usable, Purcaro and Pratt developed online resources to share it with members of the scientific community. SCREEN, short for Search Candidate cis-Regulatory Elements by ENCODE, allows scientists to visualize and interactively search the 926,535 human cCREs, along with the underlying ENCODE data and other rich annotations in more than one thousand biological samples.

“Over the last 10 years, genome-wide association studies into disease have identified many areas of potential interest outside of the protein coding genes,” said Weng. “This tool gives scientists a new and powerful way to explore if some of those disease-causing areas of the genome are in regulatory regions.”

The full ENCODE III findings are included in a collection of 14 papers in Nature, Nature Methods and Nature Communications, as well as a corresponding perspective piece in Nature by Weng and colleagues.