Decoding the Dark Matter of the Human Genome

This post was published on the now-closed HuffPost Contributor platform. Contributors control their own work and posted freely to our site.

In 1994, researchers from Harvard and Stanford published a paper in which they described three mice: one was yellow and fat, one mottled and fat, and the last one was brown and lean. An ordinary image, except for one thing: despite being so different, all three mice were genetically identical.

If their genes were exactly the same, what was causing such striking differences in the mice?

At the time, Karissa Sanbonmatsu, a staff scientist at Los Alamos National Laboratory, was working on plasma physics, and she had no idea that one day she would tap into this mystery. Even though she started in a completely different field, from the very beginning she was obsessed with one question: What distinguishes life from matter?

"In order to answer that question, the first place to look is the ribosome," Karissa explains. "It's the oldest molecule found in life."

And for good reason: all living cells depend on proteins, and ribosomes are the "factories" inside the cell where these proteins are made.

The breakthrough came in 2003, when the Q Machine, at the time the second-fastest supercomputer in the world, was built at Los Alamos National Laboratory. Using the Q Machine, Karissa and colleagues ran what was then the largest simulation ever performed in biology, and in 2004 they published the first complete atomic model of a ribosome developed in silico.

This milestone laid the foundation for a deeper understanding of the ribosome. Possible future applications include new cancer therapies based on how ribosomes differ between healthy and cancerous tissue.

In the meantime, a new, emerging field had been revolutionizing the way we think of genetics and inheritance: epigenetics. The three lab mice from 1994 were one example of how, by switching genes on and off, genetically identical individuals could have different observable characteristics ("phenotypes"). Epigenetics is the field that studies the mechanisms by which the environment can trigger these on/off gene patterns (called gene expression patterns), and how these modifications can be passed on to the next generation.

Both animal and human studies suggest that traits acquired by the parents, such as stress responses or the ability to store fat, can be passed on to their offspring. The DNA sequence itself remains unaltered; what drives these changes in phenotype is the activation or deactivation of genes, in other words, whether certain genes produce the proteins they code for.

But how are genes turned on or off? Specific factors regulate whether a gene is expressed (turned on) or silenced (turned off). Many of these factors can be recruited by RNA, a typically single-stranded molecule involved in numerous cellular processes, from coding and decoding genes to protein synthesis.

When they were first discovered, RNA molecules and stretches of DNA that don't code for proteins were dubbed the "dark matter" of the genome because their function was unknown. Today we know that many of these molecules can affect gene expression, and even change traits, by turning certain genes on or off.

That RNA has the power to turn genes off has been known since the early 2000s, when small RNAs were used to create mice in which one particular gene was silenced. Larger RNA molecules that don't code for any protein, and that come in a range of sizes, are also found inside the cell. Known as long non-coding RNAs (lncRNAs, by convention those longer than 200 nucleotides), they are abundant in stem cells and embryos and are essential to many developmental processes.

"RNA could be the missing link in epigenetics," Karissa explains. "Ribosomes are made of RNA, and so, for me, the leap from ribosomes to lncRNAs was a natural one."

In order to understand how lncRNAs turn genes on and off, scientists first need to uncover their molecular structure. Can lncRNAs fold into different shapes, or 3D structures, and change function accordingly, or are they essentially two-dimensional molecules? Karissa and colleagues are determined to solve the puzzle. The same computational techniques they used to model the ribosome in 2004 and 2005 can be applied to lncRNAs, but because of the size of these molecules, the team will need faster and better computational tools than the ones they used a decade ago.

Luckily, next-generation supercomputing is underway at Los Alamos with the construction of Trinity, a machine fast enough to run atomistic simulations of 3D structures on this scale. This is where Karissa and colleagues plan to run their lncRNA models.

Revealing the shape of lncRNAs would be a breakthrough. But for Karissa and her team, another even more ambitious project is on the way: "Thanks to the amazing resources offered by Trinity, we will be able to run the first atomistic simulation of human chromatin, the big 'yarn' of DNA and proteins that sits inside the cell nucleus."

This means simulating the 3D structure of three billion base pairs, plus all the proteins the DNA is wrapped around! All genes reside in chromatin, and this is where they are activated or deactivated. Solving the 3D structure of chromatin should therefore shed new light on the epigenetic mechanisms that regulate gene expression.

Many diseases are characterized by altered gene expression. In cancer cells, for example, DNA-repair genes are often turned off, while genes that promote replication are over-expressed. Understanding the mechanisms that lead to these altered on/off patterns, and how to reverse them, could pave the way to new therapies and more effective treatments: a bright future indeed for molecules once dismissed as the genome's dark matter.

Elena E. Giorgi is a computational biologist in the Theoretical Division (Theoretical Biology group) at the Los Alamos National Laboratory and the author of the science fiction thrillers Chimeras, Mosaics, and Gene Cards. This content was reviewed by the Los Alamos National Laboratory and approved for release under LA-UR 16-20749. For more information, please contact the Los Alamos National Laboratory Communication Office.

Neomorphic agouti mutations in obese yellow mice. Duhl DM, Vrieling H, Miller KA, Wolff GL, Barsh GS. Nat Genet. 1994 Sep;8(1):59-65. PMID: 7987393

Atomic model of the Thermus thermophilus 70S ribosome developed in silico. Tung CS, Sanbonmatsu KY. Biophys J. 2004 Oct;87(4):2714-22. PMID: 15454463

Simulating movement of tRNA into the ribosome during decoding. Sanbonmatsu KY, Joseph S, Tung CS. Proc Natl Acad Sci U S A. 2005 Nov 1;102(44):15854-9. Epub 2005 Oct 25. PMID: 16249344

Structural architecture of the human long non-coding RNA, steroid receptor RNA activator. Novikova IV, Hennelly SP, Sanbonmatsu KY. Nucleic Acids Res. 2012 Jun;40(11):5034-51. doi: 10.1093/nar/gks071. Epub 2012 Feb 22. PMID: 22362738

Towards structural classification of long non-coding RNAs. Sanbonmatsu KY. Biochim Biophys Acta. 2016 Jan;1859(1):41-5. doi: 10.1016/j.bbagrm.2015.09.011. Epub 2015 Oct 30. PMID: 26537437
