Vitae fall/winter 2008, Vol. 31 No. 1 


Discoveries from Data

UMass Medical School's bioinformatics experts use the tools and power of computing to interpret vast amounts of information for biomedical breakthroughs. 

By James R. Fessenden

In today’s research labs, biomedical scientists accumulate a suffocating amount of data. A single experiment can elicit thousands of molecular data points per day in the form of genome sequences, genes, proteins and nucleosomes (the “packing unit” containing the DNA double helix) and the interactions among them. Organizing, interpreting and making sense of the information are tasks that increasingly fall to bioinformaticians, such as Professor of Biochemistry & Molecular Pharmacology Zhiping Weng, PhD, director of the new program in Bioinformatics and Integrative Biology (BIB) at UMass Medical School.

Until recently, all this information had created a log jam for researchers—the computational tools and computing power needed to synthesize data had been missing. “It used to be that gathering data was the bottleneck,” said Dr. Weng. “You might only get one or two data points per day from an experiment. Now you can get tens of millions of data points per day, and researchers are limited only by their ability to process all that data.”

Weng uses cross-discipline expertise in computer science, applied mathematics, statistical modeling and biology to detect patterns in a genomic sequence, predict gene functions and expression, and develop models detailing how the cells, genes and proteins of a biological system interact. With bioinformatics, clinical applications come into sharper view.

“There’s no question that for us in the labs, this is a hugely valuable resource,” said Michael R. Green, MD, PhD, Howard Hughes Medical Institute Investigator and the Lambi and Sarah Adams Chair in Genetic Research at UMMS. Dr. Green was an early advocate for a program in bioinformatics at the Medical School. “As technologies have advanced and the amount of information available to bench scientists has increased, the ability to analyze large amounts of data has become increasingly critical to current research.”

The field of bioinformatics has its roots in DNA sequencing, the mechanism used to determine the order of the nucleotide bases adenine, guanine, cytosine and thymine that make up DNA. Though bioinformatics has been around by one name or another since the 1970s, it wasn’t until the late 1990s, with the Human Genome Project and other DNA sequencing projects, that the term began to populate scientific journals. The availability of sequenced genomes has been revolutionizing all aspects of basic and clinical research since.

To get a sense of the amount of information that these projects produced, you need only look at the size of a human genome. It has six billion nucleotides which, when each is represented as a one-byte letter, amount to about six gigabytes, far more information than would fit on a standard 700MB compact disc. The influx of data has been greatly expedited in the last several years due to low-cost, high-throughput DNA sequencing technologies. For example, the Solexa sequencer currently available at UMMS generates 1.5 trillion bytes of raw data per run. These and upcoming sequencing technologies will allow routine re-sequencing of human genomes, making personalized medicine possible by allowing treatments to be tailored according to the genetic background of the patient.
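
The storage figure for a genome depends strongly on how each nucleotide is encoded and on whether one or both copies of each chromosome are counted. As a rough illustration only, the sketch below works through the arithmetic. The 8-bit and 2-bit encodings, the three-billion haploid figure and the use of decimal megabytes are assumptions made for this example, not figures from the article.

```python
# Rough storage estimate for a genome under different encodings.
# Assumptions (not from the article): 8 bits per ASCII letter, 2 bits per
# packed nucleotide (A, C, G, T), and decimal megabytes (10**6 bytes).

def genome_size_mb(nucleotides: int, bits_per_nucleotide: int) -> float:
    """Return the storage needed, in decimal megabytes."""
    total_bits = nucleotides * bits_per_nucleotide
    return total_bits / 8 / 1_000_000

DIPLOID = 6_000_000_000  # six billion nucleotides, the figure used in the text
HAPLOID = 3_000_000_000  # one copy of each chromosome (assumed round figure)

if __name__ == "__main__":
    for label, count in (("diploid", DIPLOID), ("haploid", HAPLOID)):
        for bits in (8, 2):
            size = genome_size_mb(count, bits)
            print(f"{label}, {bits} bits per nucleotide: {size:,.0f} MB")
```

Under these assumptions, six billion one-byte letters come to 6,000 MB, while a single chromosome set packed at two bits per nucleotide comes to 750 MB.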


Michael Green, MD, PhD (above), was an early advocate for a program in bioinformatics at UMass Medical School. “Part of our goal is to educate students in the most exciting areas of biomedical science, and bioinformatics is an area where there is a lot of demand in industry.” On page 13, Zhiping Weng, PhD, director of the new program in Bioinformatics and Integrative Biology, is in her data element.

Yet, most bench scientists are not trained to develop or apply the algorithms and computational tools that are needed to analyze the data. At the same time, computer scientists can write the algorithms but don’t necessarily know how to approach problems like sequence alignment without understanding the basic biology, according to Jonathan Wren, associate editor of the journal Bioinformatics.

Bioinformatics scientists, versed in both biology and computational sciences, are the people who can deal with this staggering information overload.

As chair of the Department of Biochemistry & Molecular Pharmacology, C. Robert Matthews, PhD, joined Dr. Green in advocating for a bioinformatics program at UMMS. At that time, both scientists were looking to hire bioinformaticians for their labs to help them in their experimental research and recognized that to attract top talent in the field, the Medical School needed to build a strong core of faculty in the area.

“As a researcher, you want other like-minded researchers who you can bounce ideas off of and discuss problems with,” said Dr. Matthews, the Arthur F. and Helen P. Koskinas Professor. “Having a program in bioinformatics on campus means there is a group of people keeping abreast of the latest techniques in the field, as well as propelling the field forward.”

On the academic side, bioinformatics has become an increasingly hot area, said Green, who is a professor of molecular medicine, biochemistry & molecular pharmacology and surgery. While many of the pioneers in the field, such as Weng, were trained in other disciplines, tomorrow’s leaders in the field will be coming out of academic programs such as the one at UMMS.

“Part of our goal is to educate students in the most exciting areas of biomedical science, and bioinformatics is an area where there is a lot of interest and demand in industry,” said Green.

Weng has already hired one faculty member to join her in BIB, Assistant Professor Konstantin Zeldovich, PhD. She plans to hire four more faculty over the next three years.




Bioinformatics and the Lab
As the program grows, faculty members in the bioinformatics program will collaborate with experimental researchers to develop unique computational tools that address their specific research interests. What’s more, and key to the power of bioinformatics, they will also perform their own research into biological problems and develop hypotheses about them.

“We do experiments and feed bioinformaticians data to test our hypothesis,” said Professor of Biochemistry & Molecular Pharmacology Phillip D. Zamore, PhD, Howard Hughes Medical Institute Investigator and the Gretchen Stone Cook Chair in Biomedical Sciences. “But it also works the other way. They look at data and form a hypothesis that might explain the patterns they see, which we can then test in the lab.” Weng started to collaborate with Dr. Zamore soon after she arrived at UMMS.

“Letting the data speak for itself is a very important role for bioinformatics,” said Weng. “Experimentalists tend to focus on their own data and have preexisting notions about the data. Researchers in bioinformatics look at the data generated by many groups without bias and oftentimes can discover novel biology as a result.”

For example, using data sets from dozens of sequencing experiments done in different laboratories, Weng and graduate student Yutao Fu performed an integrative analysis and found a pattern that had been overlooked by all the data generators. They discovered that the binding sites of a human protein named CTCF (which normally binds to insulator elements in the human genome and prevents enhancer elements from activating the wrong genes) are flanked by well-positioned nucleosomes genome-wide. This phenomenon had never been reported for other DNA-binding proteins. From this observation, Weng formed a hypothesis: CTCF plays a role in positioning the nucleosomes.
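
One common way to look for a pattern like this is to average a genome-wide signal in a fixed window around many binding sites, so that a feature shared across sites stands out. The sketch below illustrates that general idea only; it is not the method used in the study, and the function name, window size and toy numbers are all hypothetical.

```python
from statistics import mean

def aggregate_profile(signal, sites, flank):
    """Average a per-position signal in a window of +/- flank around each site.

    Sites whose window would run past either end of the signal are skipped.
    """
    windows = [
        signal[s - flank : s + flank + 1]
        for s in sites
        if s - flank >= 0 and s + flank < len(signal)
    ]
    if not windows:
        return []
    # Column-wise mean: position i of the result is the average signal
    # at offset (i - flank) from the binding site.
    return [mean(column) for column in zip(*windows)]

if __name__ == "__main__":
    # Toy, made-up occupancy values (not real data): low signal at each
    # site, with higher signal two positions to either side.
    toy_signal = [0, 5, 0, 1, 0, 5, 0, 5, 0, 1, 0, 5, 0]
    toy_sites = [3, 9]
    print(aggregate_profile(toy_signal, toy_sites, flank=2))
```

In a real analysis the signal would come from the sequencing experiments themselves, and the averaged profile would be inspected for regularly spaced peaks on either side of the binding site.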

To test the hypothesis, Weng approached Craig L. Peterson, PhD, professor of molecular medicine and biochemistry & molecular pharmacology, who said that his lab could easily test the reverse of the hypothesis: in other words, whether the pattern of nucleosomes around CTCF-binding sites remains the same without CTCF. The lab reconstituted nucleosomes in the absence of CTCF and observed a different pattern, suggesting that CTCF positions the nucleosomes. The study was recently published in the high-profile, open-access journal PLoS Genetics.

This kind of collaborative effort is a hallmark of research done at UMMS and is one of the reasons Weng was attracted to the University. “Collaboration brings out the best in both labs,” she said.


Zhiping Weng, PhD (left), was attracted to UMMS from Boston University because of its reputation for partnership among researchers. She says that such “collaboration brings out the best in labs” focusing on bioinformatics and basic biomedical science. C. Robert Matthews, PhD (center), chair of the Department of Biochemistry & Molecular Pharmacology, and bioinformatician and Assistant Professor Konstantin Zeldovich, PhD, join her in discussing data generated through cross-discipline expertise in computer science, mathematics, statistical modeling and biology.

Besides detecting patterns by integrative analysis, bioinformatics scientists also explore ways to predict how changes on the molecular level might affect an organism. For instance, Dr. Zeldovich is investigating how protein structures influence biological evolution and how genetic mutations or changes to a particular gene sequence can change the reproductive rates of a virus. “There are some very interesting and testable predictions that emerge from this line of thinking,” he said.

Using mathematical computations, Zeldovich, who has a background in polymer physics, can predict how a mutation might affect how a protein functions and thereby make a virus unstable. This could have significant implications for drug development. “Some antiviral therapies work by increasing the mutation rate of a virus,” said Zeldovich. “Using algorithms we can determine what mutation rates might make the virus unstable. Drugs targeted to achieve this mutation rate might have a higher level of success in destabilizing the virus.”

Perhaps one of the biggest opportunities—and challenges—for bioinformatics is modeling, or understanding how a biological system works on a molecular level. “For 50 years, biology has been using deductive reasoning to figure out how organisms work. We have to put all those pieces back together to see how it all fits,” said Matthews. “Bioinformatics can help us put Humpty Dumpty back together again.”

“There’s no question that bioinformatics is very important clinically,” said Terence R. Flotte, MD, dean of the School of Medicine, provost and executive deputy chancellor and professor of pediatrics. “The more we understand about how these processes relate to each other and how they might relate to disease, the closer we’ll be to developing clinical answers.”

Researchers are desperate for clinical answers to AIDS. There is a vast, interconnected network of 200 human and viral proteins involved in the life cycle of HIV, which causes AIDS. Scientists working in a lab would have to conduct thousands of experiments to determine how those proteins interact and what combination of proteins might be the most susceptible to treatment. “Not only would this be a tedious project, but you would have a hard time knowing where to start,” said Matthews. “But with bioinformatics, it’s possible that combination of variables can be reduced to a more manageable amount—perhaps as few as a dozen.”
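
A back-of-the-envelope count gives a feel for the scale involved. The sketch below assumes, purely for illustration, that each experiment tests one unordered pair of proteins; the article does not say how the experiments would be designed, and the function name is hypothetical.

```python
from math import comb

def pairwise_tests(n_proteins: int) -> int:
    """Number of distinct unordered protein pairs among n_proteins."""
    return comb(n_proteins, 2)

if __name__ == "__main__":
    # 200 is the network size mentioned in the text; 12 stands in for
    # "as few as a dozen" candidates after computational narrowing.
    for n in (200, 12):
        print(f"{n} proteins: {pairwise_tests(n):,} possible pairs")
```

Under that assumption, 200 proteins give 19,900 possible pairs, while a dozen give only 66.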

Indeed, understanding how biological systems, such as the HIV life cycle, work may be the ultimate frontier for the field of bioinformatics. “We’re long past the point of the Renaissance man, one who understands many different fields,” said Wren of Bioinformatics. “With so much information, our best bet for really understanding what is going on will be done at a computational level. Bioinformatics will be the way to figure out how to do that.”
