Each of us possesses our own unique genetic code, a fact that presents a monumental conundrum: How does that one singular sequence of DNA dictate the creation and function of our multitudinous and varied cells? Your skin cells, muscle cells and fat cells all share the same genetic information, but perform wildly different roles. What defines and determines those functions?
The answer, in a word, is the epigenome, a Greek-derived word that literally means “above the genome.” The epigenome consists of all of the chemical compounds that modify or mark the genome in a way that tells DNA what to do, where to do it and when.
The study of the epigenome is a relatively young endeavor, and much is not known. One of the tools of the epigenome is DNA methylation, a process in which a methyl group is added to cytosine DNA nucleotides, marking genes for repression, silencing repetitive elements and making genomic imprinting possible.
In normal mammalian development, DNA methylation dramatically changes as new cell lineages emerge. “This complex remodeling is evidently essential for development, as loss of the machinery that established DNA methylation results in embryonic lethality,” said Gary C. Hon, PhD, a postdoctoral fellow at the Ludwig San Diego, based at UC San Diego.
In a new paper published online Sunday in Nature Genetics, first author Hon, senior author Bing Ren, PhD, a Ludwig scientist and professor of cellular and molecular medicine at UC San Diego, and colleagues probe deeper into the mysteries of epigenetics, reporting on how DNA methylation changes in different kinds of tissue.
“We created very high resolution maps of DNA methylation for 17 diverse tissues in an individual mouse,” said Hon. “Interestingly, we found that if you look at DNA methylation with a wide angle lens, you’ll find that it is generally constant between different tissues. But if you zoom in, there are a large number of short regions that show very tissue-specific DNA methylation, and the vast majority of these regions happened at the many regulatory elements encoded in the genome that control the genes specifically to a tissue.”
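The wide-angle versus zoomed-in contrast Hon describes can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the tissue names, the methylation values, the `min_spread` threshold and the helper names `make_profile`, `tissue_means` and `variable_sites` are hypothetical and do not come from the study or its analysis pipeline.

```python
from statistics import mean

def make_profile(n_sites, low_site):
    """Toy methylation profile: 0.9 everywhere except one low (0.1) site."""
    return [0.1 if i == low_site else 0.9 for i in range(n_sites)]

# Invented values for illustration only; not data from the study.
methylation = {
    "liver": make_profile(10, low_site=2),
    "heart": make_profile(10, low_site=5),
    "cortex": make_profile(10, low_site=8),
}

def tissue_means(levels):
    """Wide-angle view: one average methylation level per tissue."""
    return {tissue: mean(values) for tissue, values in levels.items()}

def variable_sites(levels, min_spread=0.5):
    """Zoomed-in view: sites whose level differs across tissues by at least min_spread."""
    n_sites = len(next(iter(levels.values())))
    hits = []
    for i in range(n_sites):
        column = [values[i] for values in levels.values()]
        if max(column) - min(column) >= min_spread:
            hits.append(i)
    return hits

print(tissue_means(methylation))
print(variable_sites(methylation))
```

In this toy setup the per-tissue averages come out the same, while the site-by-site comparison flags the three positions where a single tissue departs from the others.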
The epigenome reveals the current state of a cell and, in embryonic cells, portions of it can reflect the cell’s potential future developmental paths – what it will be when it grows up. Ren, Hon and colleagues discovered, to their surprise, that in adult tissues, some of these regions of tissue-specific DNA methylation involved regulatory elements that were no longer active, but had been during development.
“In this way, the epigenome of each adult tissue is imprinted with the regulatory memory of its past,” said Hon.
The findings are fundamental science. They “do not have immediate clinical relevance. They simply help understanding of development,” said Hon. But they may also augur greater import in the future, bolstering the recognized importance of DNA methylation and providing “an epigenetic signature that can be used to find regulatory elements active in development, but which are no longer active in adult tissues.”
Such a signature might be helpful to understanding the origins of diseases that occur early in developing life, a necessary step before science can take action to prevent them.
A scanning electron micrograph of a human blastocyst (5 days after fertilization of the egg), revealing the inner cell mass that will become the embryo. Image courtesy of Yorgos Nikas, Wellcome Images
Life. Bits. Self.
The development of human life is an indisputable marvel of choreographed complexity: A single fertilized egg divides and multiplies, the resulting cells differentiating into the roughly 300 cell types required to build a human being.
Among the great and enduring questions of developmental biology is how exactly embryogenesis occurs. What process or plan directs differentiating cells to do what they do, to choose their pathways to becoming neurons, fat cells, hair cells or various hormone-secreting cells?
In a paper published today in Cell, a multi-institutional team of scientists, including Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UC San Diego and professor in the UCSD School of Medicine’s Department of Cellular and Molecular Medicine, describes how genes are turned on and off to direct early human development – and reports novel genetic mechanisms that play key roles not just in normal development but perhaps in diseases like cancer as well.
Using large-scale genomics technologies, the researchers examined two key processes in unprecedented detail. The first involves the attachment of methyl groups to cytosine, one of the four DNA bases that make up the genetic code; the second involves chemical modifications to proteins called histones, which provide the scaffolding around which DNA winds in cell nuclei.
Histone modification, the researchers found, is more commonly used to regulate genes in early embryonic development, switching them on and off as needed. DNA methylation tends to be used in the later stages of development, when cells are increasingly locked into specific fates and functions.
“You can sort of glean the logic of animal development in this difference,” said Ren in a news release issued by the Ludwig Institute. “Histone methylation is relatively easy to reverse. But reversing DNA methylation is a complex process, one that requires more resources and is much more likely to result in potentially deleterious mutations.
“So it makes sense that histone methylation is largely used to silence master genes that may be needed at multiple points during development, while DNA methylation is mostly used to switch off genes at later stages, when cells have already been tailored to specific functions, and those genes are less likely to be needed again.”
The scientists also noted two other significant findings:
- The human genome is pocked with more than 1,200 regions kept consistently free of DNA methylation throughout development. Many master regulator genes reside in these regions, dubbed “DNA methylation valleys.” Interestingly, these regions were found to be abnormally methylated in colon cancer tissues.
- The researchers identified more than 103,000 “enhancers,” sequences of DNA that can boost the expression of genes.
Ren said the work creates a new information resource for biomedical research, not just for better understanding of early human development, but also of the many diseases that trace their roots to that early development.
Boosting the Powers of Genomic Science
With two new methods, UC San Diego scientists hope to improve genome-wide association studies
As scientists probe and parse the genetic bases of what makes a human a human (or one human different from another), and vigorously push for greater use of whole genome sequencing, they find themselves increasingly threatened by the unthinkable: Too much data to make full sense of.
In a pair of papers published in the April 25, 2013 issue of PLOS Genetics, two diverse teams of scientists, both headed by researchers at the University of California, San Diego School of Medicine, describe novel statistical models that more broadly and deeply identify associations between diseases and bits of sequenced DNA called single nucleotide polymorphisms, or SNPs. The models, they say, lead to a more complete and accurate understanding of the genetic underpinnings of many diseases and how best to treat them.
“It’s increasingly evident that highly heritable diseases and traits are influenced by a large number of genetic variants in different parts of the genome, each with small effects,” said Anders M. Dale, PhD, a professor in the departments of Radiology, Neurosciences and Psychiatry at the UC San Diego School of Medicine. “Unfortunately, it’s also increasingly evident that existing statistical methods, like genome-wide association studies (GWAS) that look for associations between SNPs and diseases, are severely underpowered and can’t adequately incorporate all of this new, exciting and exceedingly rich data.”
Dale cited, for example, a recent study published in Nature Genetics in which researchers used traditional GWAS to raise the number of SNPs associated with primary sclerosing cholangitis from four to 16. The scientists then applied the new statistical methods to identify 33 additional SNPs, more than tripling the number of genome locations associated with the life-threatening liver disease.
Generally speaking, the new methods boost researchers’ analytical powers by incorporating a priori, or prior, knowledge about the function of SNPs along with their pleiotropic relationships to multiple phenotypes. Pleiotropy occurs when one gene influences multiple observed traits, or phenotypes.
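To give a feel for how prior knowledge can sharpen a genome-wide search, here is a minimal sketch of a generic, textbook idea: applying the Benjamini-Hochberg false discovery rate procedure separately within annotation categories rather than across all SNPs at once. This is not the models described in the PLOS Genetics papers. The p-values, the category labels "genic" and "other," and the function names are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of p-values rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

def stratified_bh(p_values, strata, alpha=0.05):
    """Run BH separately within each stratum (e.g. an annotation category)."""
    hits = []
    for label in set(strata):
        members = [i for i, s in enumerate(strata) if s == label]
        local = benjamini_hochberg([p_values[i] for i in members], alpha)
        hits.extend(members[j] for j in local)
    return sorted(hits)

# Invented p-values and labels, for illustration only.
p = [0.001, 0.012, 0.02, 0.8, 0.5, 0.9, 0.7, 0.6]
labels = ["genic", "genic", "genic", "other", "other", "other", "other", "other"]

print(benjamini_hochberg(p))    # pooled analysis
print(stratified_bh(p, labels)) # analysis informed by the labels
```

With these made-up numbers, the pooled analysis flags two SNPs and the stratified analysis flags three, because the third SNP is judged only against the small category where signals are concentrated.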
Dale and colleagues believe the new methods could lead to a paradigm shift in GWAS analysis, with profound implications across a broad range of complex traits and disorders.
“There is ever-greater emphasis being placed on expensive whole genome sequencing efforts,” he said, “but as the science advances, the challenges become larger. The needle in the haystack of traditional GWAS involves searching through about one million SNPs. This will increase 10- to 100-fold, to about 3 billion positions. We think these new methodologies allow us to more completely exploit our resources, to extract the most information possible, which we think has important implications for gene discovery, drug development and more accurately assessing a person’s overall genetic risk of developing a certain disease.”
Three UC San Diego Scientists Garner ENCODE Grants
Recently, the ENCyclopedia Of DNA Elements, otherwise known as ENCODE, made national news with the single-day publication in multiple journals of dozens of related papers intended to more fully flesh out the functional components of the human genome.
The findings were a big step, but the blueprint of human biology remains incomplete. This week, the National Human Genome Research Institute, part of the National Institutes of Health, announced new grants worth $30.3 million this year alone to expand and deepen the effort.
Three scientists at UC San Diego were among the recipients. Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UCSD, and colleagues have been awarded $11.4 million over four years (roughly $2.85 million per year) to continue their work developing a working catalog of the mouse genome.
“The goal is to enhance use of this model organism in studying a wide range of tissues not readily accessible in the human (genome), and to tap into the power of comparative genomic analysis to increase understanding of the function of the human genome,” said an NHGRI official.
Earlier this year, Ren and colleagues published a paper in Nature that described mapping for the first time a significant portion of the functional sequences of the mouse genome. Specifically, they looked at genome regions containing cis-regulatory elements, key stretches of DNA that appear to regulate the transcription of genes. Misregulation of genes can result in diseases like cancer.
In addition to Ren’s grant, Gene Yeo, PhD, assistant professor of cellular and molecular medicine, and Xiang-Dong Fu, PhD, professor of cellular and molecular medicine (both, along with Ren, are members of the Institute of Genomic Medicine) are part of a team headed by Brenton Graveley, PhD, of the University of Connecticut Health Center that was awarded a four-year, $9.3 million grant to analyze human RNA transcripts to identify protein-binding sites and investigate their function. Proteins that bind to RNA can directly regulate protein production from RNA molecules, as well as affect protein production by regulating degradation of RNA molecules. The project is ENCODE’s first production scale effort to map protein-binding sites in RNA.
Parsing a process of life
Transcription is the first step in gene expression, the process by which information contained in a gene is used to make functional products, such as proteins. It’s fundamental to life and, not surprisingly, extraordinarily complicated.
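At its simplest, transcription copies a DNA template into RNA by base-pairing, with uracil taking the place of thymine. The sketch below shows only that pairing rule. The function name `transcribe` and the example sequence are hypothetical, and the code ignores strand direction, RNA polymerase and every regulatory step that makes the real process so complicated.

```python
# RNA base paired with each DNA template base (uracil stands in for thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template):
    """Return the RNA complementary to a DNA template strand, base by base."""
    try:
        return "".join(DNA_TO_RNA[base] for base in template.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]}") from None

print(transcribe("TACGGT"))  # prints AUGCCA
```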
In the July 22, 2012 issue of Nature Structural & Molecular Biology, Dong Wang, PhD, assistant professor in the Skaggs School of Pharmacy and Pharmaceutical Sciences, and colleagues further elucidate how transcription is altered by some forms of cytosine.
Cytosine, of course, is one of the four main bases that comprise DNA and RNA (along with adenine, guanine and thymine; uracil replacing thymine in RNA). There are at least five forms of cytosine in human DNA. Wang and colleagues have discovered that two recently identified forms of cytosine, known as 5fC and 5caC, significantly reduce the transcription rate in vitro.
The finding, said Wang, suggests that some forms of cytosine (and perhaps other players yet-to-be-identified) may provide another layer of regulation and fine-tuning to the transcription process. By slowing the activity of RNA polymerase II, a major transcriptional enzyme, 5fC and 5caC may make it easier for other enzymes, proteins and factors to play their parts in the larger act of gene expression.
Photo: Structure of RNA Polymerase II, a key enzyme in mammalian cells that catalyzes the transcription of DNA into messenger RNA, the molecule that in turn dictates the order of amino acids in proteins. Courtesy of National Institute of General Medical Sciences.
Beyond Base-Pairs: Mapping the Functional Genome
Regulatory sequences of mouse genome sequenced for first time
Popularly dubbed “the book of life,” the human genome is extraordinarily difficult to read. But without full knowledge of its grammar and syntax, the genome’s 2.9 billion base-pairs of adenine and thymine, cytosine and guanine provide limited insights into humanity’s underlying genetics.
In a paper published in the July 1, 2012 issue of the journal Nature, researchers at the Ludwig Institute for Cancer Research and the University of California, San Diego School of Medicine open the book further, mapping for the first time a significant portion of the functional sequences of the mouse genome, the most widely used mammalian model organism in biomedical research.
“We’ve known the precise alphabet of the human genome for more than a decade, but not necessarily how those letters make meaningful words, paragraphs or life,” said Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UC San Diego. “We know, for example, that only one to two percent of the functional genome codes for proteins, but that there are highly conserved regions in the genome outside of protein-coding that affect genes and disease development. It’s clear these regions do something or they would have changed or disappeared.”
Chief among those regions are cis-regulatory elements, key stretches of DNA that appear to regulate the transcription of genes. Misregulation of genes can result in diseases like cancer. Using high-throughput sequencing technologies, Ren and colleagues mapped nearly 300,000 mouse cis-regulatory elements in 19 different types of tissue and cell. The unprecedented work provided a functional annotation of nearly 11 percent of the mouse genome, and more than 70 percent of the conserved, non-coding sequences shared with other mammalian species, including humans.
As expected, the researchers identified different sequences that promote or start gene activity, enhance its activity and define where it occurs in the body during development. More surprising, said Ren, was that the cis-regulatory elements are grouped into discrete clusters corresponding to spatial domains. “It’s a case of form following function,” he said. “It makes sense.”
While the research is fundamentally revealing, Ren noted it is also just a beginning, a partial picture of the functional genome. Additional studies will be needed in other types of cells and at different stages of development.
“We’ve mapped and understand 11 percent of the genome,” said Ren. “There’s still a long way to march.”
“Birth of DNA (Epigenetics)” by Zdenko Herceg
Deciphering DNA’s hidden code
Reading the genetic “Book of Life” is not easy, a lesson scientists relearn all the time. Consider the well-known nucleobases that comprise DNA. There are only four: adenine, thymine, guanine and cytosine (plus uracil, which is found in RNA). It turns out, however, that cytosine comes in two modified forms: 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC). The versions look almost alike, but affect genes in very different ways.
In a paper published in the journal Cell today, researchers at the University of Chicago, the Ludwig Institute for Cancer Research at UC San Diego and Emory University describe a new technique for reading the particular differences in cytosine, an achievement that has ramifications for better understanding fundamental life processes.
These two modifications of cytosine “regulate gene expression that has broad impact on stem cell development, various human diseases such as cancer, and potentially neurodegenerative disease,” said Chuan He, a professor of chemistry at the University of Chicago. “They may even shape the development of the human brain.”
He, with Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UC San Diego, and colleagues developed a method called TAB-Seq that directly measures 5-hmC and produced the first genome-wide map of 5-hmC at single-base resolution. Ren applied TAB-Seq to human embryonic stem cells; Peng Jin of Emory applied the method to mouse embryonic stem cells.
The work is expected to have a significant impact upon the field of epigenetics, which looks at changes in gene expression caused by factors other than alterations in the actual DNA. 5-mC and 5-hmC appear to be major epigenetic players. 5-mC is generally found on genes that are turned off; it helps silence genes that aren’t supposed to be turned on. Conversely, 5-hmC appears to be abundant on active genes, especially in brain cells.
“This is a major breakthrough in that TAB-Seq allows precise mapping of all 5-hydroxymethylcytosine sites in a mammalian genome using well-established, next-generation DNA sequencing methods,” said Joseph Ecker, a professor at the Salk Institute for Biological Studies, who was not involved in the Cell study. “The study showed very clearly that deriving useful knowledge about this poorly understood epigenetic regulator requires determination of the exact locations of 5-hmC with base-level accuracy. I expect that their new method will immediately become widely adopted.”
While much has been written about The Human Genome Project over the years, the full story may just now be unfolding. Literally.
Inside each normal, nucleated cell, a lot of DNA is tightly packed. Uncoiled, it would stretch almost nine feet. Magnified 1,000 times to better see it, the length would be nearly three kilometers – roughly the distance from the Lincoln Memorial to the U.S. Capitol in Washington, D.C.
In a paper published today in the journal Nature, Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UC San Diego, and colleagues describe for the first time how different parts of DNA are actually folded next to each other inside a cell’s nucleus.
Question: Why is it important to know where genes are positioned within the nucleus?
Answer: One thing that we know is there are between 20 and 30 thousand genes in the human genome, but in any one cell, only a subset of them are turned on. We know that one important factor in deciding which of the genes in the genome are turned on in any one cell is where they are positioned in the nucleus, and where they are positioned relative to other parts of the genome. For example, we know that there are certain regions of the genome that are called enhancers, and these act like switches that turn on genes. The trick is that the enhancer that is responsible for turning on a gene may not be located right next to the gene, and in fact may be some distance away, almost like a light switch that turns on the lights in another room.
So we know that at least one of the ways that these enhancers work is that they are brought in close physical proximity to the gene they regulate by bending the DNA and causing a large loop to form in the genome. This “looping” allows the enhancer to work in turning on its target gene. So it is this kind of physical association of different parts of the genome that can play a critical role in deciding which genes a cell turns on, which is what we were hoping to learn something about in our study.
Q: You call these identified regions “topological domains.” What do they look like? Does their structure explain how they work?
A: With regards to what the topological domains look like, we can’t say for certain, but it is something that we are interested in. What we know from what we have found is that the topological domains are regions of the genome that are tightly self-associated. It’s as if these domains are parts of our genome that are wound up like a ball of yarn, and that our genome is composed of many of these domains, over and over again, but that each of the domains appears to be relatively separate from each of the neighboring domains, like many balls of yarn linked together.
Q: What’s the significance of your finding that these domains are highly conserved and appear ancient in origin?
A: We think the finding that these domains are conserved in evolution is really interesting. In the case of humans and mice, these are organisms that are believed to be separated by 65 million years of evolution, yet we can see that in the parts of the human and mouse genomes that are analogous to each other, the structures can be remarkably similar.
In addition, there was another group that showed recently that the genome of fruit flies, which are even further away on the evolutionary tree, are arranged with a similar topological domain structure. Exactly what all this means isn’t entirely clear, but it suggests that this strategy of organizing genomes into topological domains was something that was hit on very early in animal evolution, and appears to be something that has been retained quite strongly. This suggests that this is an effective way for animals to organize their genomes. Perhaps by segregating our genome into these domains, it is easier to regulate the function of different regions of the genome. We are hoping that as we learn more about how these domains function, this may allow us to make better predictions about why they have been retained so well in evolution, and this may tell us something about how we and other organisms have evolved our genomes.
Q: How are these findings likely to be used by other researchers?
A: We hope our paper will be a good resource for other scientists studying genome function. For example, what we have done is to essentially create large-scale maps of how the genome is folding up in embryonic stem cells and differentiated cells. As I mentioned earlier, we know that the way that certain genes may be turned on are via these long range “loopings” between enhancers and distant genes.
We hope these maps of how the genome is folding up and interacting may give researchers clues about which regions of the genome may regulate which genes, and this is important for understanding how any particular gene is normally regulated, and how that regulation may be altered in disease.
A five-day-old human blastocyst.
Researchers at the Ludwig Institute for Cancer Research, the University of California, San Diego School of Medicine and the Toronto Western Research Institute peel away some of the enduring mystery of how zygotes, or fertilized eggs, determine which copies of parental genes will be used or ignored.
In developing humans and other mammals, not all genes are created equal – or equally used. The expression of certain genes, known as imprinted genes, is determined by just one copy of the parents’ genetic contribution. In humans, there are at least 80 known imprinted genes. If a copy of an imprinted gene fails to function correctly – or if both copies are expressed – the result can be a variety of heritable conditions, such as Prader-Willi and Angelman syndromes, or diseases like cancer.
In the Cell paper, a team of scientists, led by Bing Ren, PhD, head of the Laboratory of Gene Regulation at the Ludwig Institute for Cancer Research at UC San Diego, describes in greater detail how differential DNA methylation in the two parental genomes sets the stage for selective expression of imprinted genes in the mouse. Differential DNA methylation is essential to normal development in humans and other higher organisms. It involves the addition of hydrocarbon groups called methyls to cytosine, one of the four bases or building blocks of DNA. Such addition alters the expression of different genes, boosting or suppressing them to help direct embryonic growth and development.
The process is sometimes called epigenetic regulation. Epigenetics is the study of factors influencing inheritance beyond the genes themselves. “DNA is just half the story,” said Ren, who also heads the San Diego Epigenome Center, one of four centers established by the National Institutes of Health to focus on epigenetics research.
“Understanding how these limited imprinted regions control regulation can help us better understand how certain diseases happen,” said Ren, a professor of cellular and molecular medicine in the UC San Diego School of Medicine. “That can help us develop better diagnostic tools for detecting genetic abnormalities and perhaps learn how to predict whether something bad will happen.”
DNA mismatch repair (MMR) is the body’s system for recognizing and fixing mispaired bases (normally, adenine pairs with thymine and guanine with cytosine) that occur during genetic replication and recombination. It’s a vital process because it eliminates mutations that can result in defects and the development of different cancers.
It’s also a bit of a mystery, in part because no one has ever actually seen the system at work. Until now. In a paper published in the November 23 issue of the journal Cell, Richard Kolodner, PhD, a member of the Ludwig Institute for Cancer Research, professor of medicine and member of the Moores Cancer Center at UC San Diego School of Medicine, and colleagues use fluorescent visualization techniques to show what’s happening in vivo for the first time.
The researchers studied live cells of Saccharomyces cerevisiae, or baker’s yeast. “MMR in yeast and humans is exactly the same,” said Kolodner. “Yeast and humans use the same proteins and the repair process involves the same steps in each organism.”
They focused on the Msh2-Msh6 and Mlh1-Pms1 protein complexes, known to play a role in MMR. “We saw that the MMR protein that detects errors in the DNA linked to the proteins that replicate the DNA,” said Kolodner. “We also saw that when the first MMR protein encountered an error in the DNA, it assembled a second MMR protein onto the DNA to initiate the repair process.”
The discovery reveals for the first time a key mechanism used by MMR proteins to find individual damaged sites in DNA among the vast numbers of non-damaged DNA sites in a cell. Having a better understanding of how DNA is repaired could ultimately lead to future therapies to assist the process or treat existing cancers.
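The search problem itself, spotting a rare mispaired site among many correctly paired ones, can be stated in a few lines of code. The sketch below is only bookkeeping based on the standard pairing rules (adenine with thymine, guanine with cytosine). It says nothing about how the Msh2-Msh6 or Mlh1-Pms1 complexes actually locate errors, and the function name `find_mispairs` and the example sequences are hypothetical.

```python
# Standard Watson-Crick partners in double-stranded DNA.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mispairs(strand, partner):
    """Return (position, base, partner_base) for every non-Watson-Crick pair.

    Both strands are given position by position, so partner[i] sits
    opposite strand[i].
    """
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(strand.upper(), partner.upper()))
        if WATSON_CRICK.get(a) != b
    ]

print(find_mispairs("ATGCCA", "TACGTT"))  # prints [(4, 'C', 'T')]
```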
“Inherited defects in MMR genes cause one of the most common forms of inherited cancer susceptibility known,” said Kolodner. “In addition, a significant number of non-inherited cancers have MMR defects that play a role in their development. And sometimes in cancers that have become resistant to chemotherapy, the acquired resistance is due to selection for a MMR defect during treatment.”
Co-authors of the paper with Kolodner were Arshad Desai, Catherine E. Smith, Christopher S. Campbell and Hans Hombauer, all at the Ludwig Institute and UC San Diego School of Medicine.