Mapping Transcription Regulatory Circuits in the Nematode C. elegans
Overall goal
We use a variety of experimental and computational systems biology approaches to map and characterize gene regulatory networks and to understand how regulatory circuitry controls animal development, function, and homeostasis. Ultimately, we aim to understand how dysfunctional networks affect or cause diseases like diabetes, obesity and cancer.
Differential gene expression and gene regulatory networks
The human genome contains ~25,000 predicted protein-coding genes. Most of these genes are differentially expressed in space and/or time and in response to environmental or pathological cues. As a result, each cell/tissue/organ in the body expresses a different subset of the total gene collection. The first and one of the most important levels of gene regulation is transcriptional: transcription factors (TFs) bind to cis-regulatory DNA sequences and activate or repress gene expression. While the mechanics of transcription have been studied intensely for the past 20 years or so, little is known about where, when and how each of the 25,000 genes is regulated, or about which of the ~1,500 predicted human TFs regulate it.
The presence of large numbers of TF-encoding genes in metazoan genomes, the multiple protein-DNA and protein-protein interactions TFs engage in, and the concerted action of multiple TFs per gene together suggest that complex gene expression patterns are the result of intricate transcription regulatory networks in which many TFs are connected to their target genes and to each other. Such networks can be represented as graph models in which "nodes" correspond to proteins or genes, and "edges" (i.e. links between nodes) represent functional or physical interactions between those proteins/genes (see Figure 1). Our first goal is to identify transcription regulatory networks by identifying interactions between TFs and their target genes (protein-DNA). Longer term, we aim to integrate these networks with other types of interactions, such as those between microRNAs and their targets (RNA-RNA interactions), between different TFs (protein-protein interactions), between TFs and cofactors (protein-protein interactions), and between RNA binding proteins and their targets (protein-RNA). We use various network properties, network motifs and other topological measures to understand how transcription regulatory networks behave and how they are similar to or different from other types of networks.
Figure 1. Integrated regulatory networks contain transcriptional interactions (protein-DNA, black lines); post-transcriptional microRNA interactions (RNA-RNA, red lines); post-transcriptional RNA binding protein interactions (protein-RNA, dotted black lines) and dimerizing interactions (protein-protein, blue lines). Adapted from Walhout, Genome Research 2006.
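The graph model described above can be sketched in a few lines of Python. This is a minimal illustration only, not the lab's software: the class and function names and the toy nodes (TF_A, gene_1, miR_X and so on) are hypothetical, and only the four edge types are taken from the Figure 1 caption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Edge types as listed in the Figure 1 caption.
EDGE_TYPES = {"protein-DNA", "RNA-RNA", "protein-RNA", "protein-protein"}


@dataclass(frozen=True)
class Edge:
    source: str  # regulator (TF, microRNA or RNA binding protein)
    target: str  # regulated gene or interaction partner
    kind: str    # one of EDGE_TYPES


class RegulatoryNetwork:
    """Hypothetical container for an integrated regulatory network."""

    def __init__(self):
        self.edges = set()

    def add(self, source, target, kind):
        if kind not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {kind}")
        self.edges.add(Edge(source, target, kind))

    def nodes(self):
        # Every gene or protein that appears at either end of an edge.
        return {n for e in self.edges for n in (e.source, e.target)}

    def out_degree(self, kind=None):
        # Number of outgoing edges per node, optionally for one edge type.
        degree = defaultdict(int)
        for e in self.edges:
            if kind is None or e.kind == kind:
                degree[e.source] += 1
        return dict(degree)


# Toy data, invented for illustration.
net = RegulatoryNetwork()
net.add("TF_A", "gene_1", "protein-DNA")
net.add("TF_A", "gene_2", "protein-DNA")
net.add("TF_B", "gene_2", "protein-DNA")
net.add("miR_X", "TF_B", "RNA-RNA")
net.add("TF_A", "TF_B", "protein-protein")

print(sorted(net.nodes()))
print(net.out_degree("protein-DNA"))
```

Keeping the edge type on every edge lets the same structure hold the transcriptional network alone or the integrated network, and topological measures such as degree can then be computed per interaction type.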
Gene-centered, or gene-to-protein, methods for the identification of TF-target gene interactions
We have developed high-throughput, gene-centered (gene-to-protein) methods that can be used to map physical interactions between regulatory genomic regions and transcription factors (TFs). Specifically, we have adapted the yeast one-hybrid (Y1H) system for use in high-throughput settings and with single copy, complex DNA sequences as “bait” (Deplancke et al., Genome Research 2004; Vermeirssen et al., Nature Methods 2007). This provides a complementary alternative to more popular TF-centered (protein-to-gene) methods such as chromatin immunoprecipitation (ChIP). Although powerful, ChIP assays suffer from conceptual and technical limitations. For instance, they are only suitable for broadly and/or highly expressed TFs for which high-quality antibodies are available. In contrast, Y1H assays can retrieve rare TFs in an unbiased, condition-independent manner. Importantly, gene-centered methods such as the Y1H system can be used to generate what we refer to as “TF binding profiles” for loci of interest – something that cannot be done using TF-centered methods unless they are performed for all TFs of an organism and under all relevant developmental and environmental conditions (Figure 2).
Figure 2. There are two conceptually different approaches to identify physical interactions between transcription factors (TFs) and their target genes.
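The difference between the two approaches can be illustrated as a difference in how the data are keyed. The sketch below is a toy illustration under stated assumptions: the promoter and TF names are invented, and `invert` is a hypothetical helper, not part of any published pipeline. A gene-centered screen yields a set of TFs per locus directly, while TF-centered data must be inverted and only give a complete profile if every TF has been assayed.

```python
from collections import defaultdict


def invert(mapping):
    """Turn {key: set(values)} into {value: set(keys)}."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


# Gene-centered (Y1H-style) data: locus -> TFs that bind it (toy values).
gene_centered = {
    "promoter_1": {"TF_A", "TF_B", "TF_C"},
    "promoter_2": {"TF_B"},
}

# TF-centered (ChIP-style) data: TF -> loci it binds (toy values).
# Only TF_B has been assayed in this invented example.
tf_centered = {"TF_B": {"promoter_1", "promoter_2"}}

# Binding profiles recovered from the TF-centered data alone.
profiles_from_tf_centered = invert(tf_centered)

# TFs in the gene-centered profile of promoter_1 that the
# TF-centered data cannot show until those TFs are assayed too.
missing = gene_centered["promoter_1"] - profiles_from_tf_centered["promoter_1"]
print(sorted(missing))
```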
C. elegans as a model system
We predominantly use C. elegans as a model system to study the networks that control differential gene expression at a systems level because:
The complete C. elegans genome sequence is available and is predicted to contain ~20,000 protein-encoding genes, which is approximately the same number as in humans! We have identified 940 predicted TFs among these protein-coding genes (Reece-Hoyes et al., Genome Biology 2005; Vermeirssen et al., Nature Methods 2007).
The C. elegans genome is only 100 Mb, 30 times smaller than the human genome. Since exons are approximately equal in size and number in the two species, the regulatory genomic space is much smaller in worms. Thus, we have less potential regulatory sequence to interrogate.
C. elegans is a relatively simple animal. Its development occurs in a stringently programmed manner and the entire lineage of the 959 somatic cells in hermaphrodites has been described, which allows the unambiguous identification of temporal and spatial gene expression patterns.
The animal is transparent, which allows us to follow development, phenotypic aberrations and gene expression patterns in real time using light microscopy (See Figure 3).
C. elegans is a genetically tractable organism, and many convenient genetic techniques have been developed that allow the molecular dissection of biological processes. These include the generation of transgenic animals for gene expression studies, and RNA-mediated interference (RNAi) for the examination of loss-of-function phenotypes (see Figure).
C. elegans has proven to be instrumental in understanding human biology because many genes, pathways and biochemical processes are highly conserved. For example, studies of oncogenic Ras and apoptotic pathways have been pioneered in C. elegans.
What have we learned?
By using genes expressed in the digestive tract (Deplancke et al., Cell 2006) or neurons (Vermeirssen et al., Genome Research 2007), we have mapped initial tissue-relevant transcription regulatory networks that are enriched for TFs that are themselves expressed in the tissue of interest.
We identified “TF hubs”, or TFs that bind a disproportionately large number of promoters. These TFs are frequently essential for the survival of the animal, indicating that their high connectivity in the network is relevant in vivo (Deplancke et al., Cell 2006).
We have identified a set of novel putative TFs that do not possess a recognizable DNA binding domain, but that robustly interact with promoters (Deplancke et al., Cell 2006).
Figure 3. C. elegans: superworm!!
(Image by Christian Grove)
We have identified “TF modules”, TFs that share many of their target genes. This has helped us to connect network architecture to network functionality (Vermeirssen et al., Genome Research 2007).
We have mapped an integrated transcriptional and post-transcriptional microRNA network and found that this network contains a feedback network motif in which TFs that bind a microRNA promoter are themselves regulated by that same microRNA. In addition, we introduced a novel network parameter, which we named “flux capacity”, that captures the high capacity for information flow often possessed by the TFs and microRNAs that participate in these feedback motifs (Martinez et al., Genes & Development 2008).
In collaboration with the Ambros lab, we have generated a resource of transgenic C. elegans that express the green fluorescent protein (GFP) under the control of a microRNA promoter (Martinez et al., Genome Research 2008). This resource can be used to annotate microRNA function and to follow up on hypotheses generated by (integrated) network studies. Using this resource, we found that microRNAs that belong to the same family are more likely to be co-expressed than microRNAs that belong to different families. In addition, we found that several microRNAs are subject to post-transcriptional regulatory mechanisms.
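Three of the network features listed above (TF hubs, TF modules and the TF-microRNA feedback motif) can be illustrated on toy data. This is a hedged sketch, not the analysis used in the cited papers: the data are invented, the hub cutoff is arbitrary, and the Jaccard index is an assumed stand-in for whatever measure of shared targets the original studies used. Flux capacity is deliberately not computed, because its definition is not given here.

```python
from itertools import combinations

# Toy data, invented for illustration.
# TF -> promoters it binds (protein-DNA edges).
tf_targets = {
    "TF_A": {"gene_1", "gene_2", "gene_3", "mir_X"},
    "TF_B": {"gene_1", "gene_2", "gene_3"},
    "TF_C": {"gene_4"},
}
# microRNA -> transcripts it targets (RNA-RNA edges).
mirna_targets = {"mir_X": {"TF_A", "gene_4"}}


def find_hubs(tf_targets, min_targets):
    """TFs that bind at least `min_targets` promoters (arbitrary cutoff)."""
    return sorted(tf for tf, t in tf_targets.items() if len(t) >= min_targets)


def shared_target_score(a, b):
    """Jaccard index of two target sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def module_pairs(tf_targets, min_score):
    """TF pairs whose target sets overlap with a score >= `min_score`."""
    return [
        (x, y)
        for x, y in combinations(sorted(tf_targets), 2)
        if shared_target_score(tf_targets[x], tf_targets[y]) >= min_score
    ]


def feedback_loops(tf_targets, mirna_targets):
    """(TF, microRNA) pairs where the TF binds the microRNA promoter
    and the microRNA in turn targets that TF."""
    return sorted(
        (tf, mir)
        for tf, targets in tf_targets.items()
        for mir in targets
        if tf in mirna_targets.get(mir, set())
    )


print(find_hubs(tf_targets, min_targets=3))
print(module_pairs(tf_targets, min_score=0.5))
print(feedback_loops(tf_targets, mirna_targets))
```

On this toy network, TF_A and TF_B pass the hub cutoff and form a candidate module, and TF_A and mir_X form the only feedback pair.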
Phone: 508-856-4364
E-mail: Marian.Walhout@umassmed.edu
Keywords: Organisms - C. elegans, Systems Biology, Gene Expression, Protein-DNA recognition
This is an official Page/Publication of the University of Massachusetts Worcester Campus Program in Gene Function and Expression 364 Plantation Street Worcester, MA 01605