* 'Junk' DNA runs the genes, $196 million international study finds
* Project offers 'guidebook to the human genome'
* May transform personal DNA analysis, drug discovery
By Sharon Begley
NEW YORK, Sept 5 (Reuters) - In the largest single batch of discoveries about human DNA since the completion of the human genome project in 2003, 442 scientists in labs across three continents on Wednesday released 30 studies packed with findings.
The discoveries, representing what the journal Nature calls the "guidebook to the human genome," range from the esoteric - what is a gene? - to the practical - that just 20 gene switches may underlie 17 seemingly unrelated cancers, giving companies a workable number of drug targets.
The studies come from a $196 million project called the Encyclopedia of DNA Elements, or ENCODE, whose goal is to take the babel produced by the human genome project - the sequence of 3.2 billion chemical "bases" or "letters" that constitute the human genome - and make sense of it.
"We understood the meaning of only a small percentage of the genome's letters," said Dr. Eric Green, director of the National Human Genome Research Institute, which paid for the bulk of the study.
A 'PARTS LIST' FOR HOMO SAPIENS
ENCODE was launched in 2003 to build a complete "parts list" for Homo sapiens by identifying and pinpointing the location of every stretch of the genome that does something - "a reference map of all the functional elements in the human genome," said geneticist Joseph Ecker of the Salk Institute for Biological Studies in La Jolla, Calif.
The best-known elements in the genome are the 21,000 or so genes that specify what proteins a cell makes. The dopamine receptor gene makes dopamine receptors in brain cells, for instance, and the insulin gene makes insulin in the pancreas.
Only about 1 percent of the genome codes for proteins, however, and the challenge has been to figure out the function of the other 99 percent, which for years was termed "junk DNA" because it did not code for proteins.
The ENCODE scientists are biology's version of the Occupy movement, said Mark Gerstein of Yale University in New Haven, Conn., who led one of the ENCODE teams: "For years everyone focused on the 1 percent. ENCODE looks at the 99 percent."
In examining the overlooked part of the genome, the ENCODE scientists discovered that about 80 percent of the DNA once dismissed as junk performs a biological function. Primarily, the not-so-junky DNA constitutes the most sophisticated control panel this side of NASA's, with some 4 million bits of DNA controlling all the rest.
"The 'junk' DNA, the 99 percent, is actually in charge of running the genes," said Gerstein.
This regulation can influence both normal genes and aberrant ones, affecting the likelihood of disease.
That's because "transcription factors" and other regulatory elements - proteins made by this controlling DNA - hopscotch across each cell's double helix, binding to it in a way that turns genes on and off or up and down, like a toddler who has just discovered light switches and dimmer dials.
EFFECT ON PERSONAL GENOME SEQUENCING
The power of the gene-control elements may explain why simple personal DNA sequencing sometimes concludes that people are at risk for diseases they never get or misses the warning signs of those they do develop. If the switches quiet an unhealthy gene, "it might reduce levels of proteins that have some nasty effect," said Ecker. But if they mess up a normal gene, then someone can develop a DNA-based illness nevertheless.
How complicated is the genetic regulatory system? There are nearly 4 million gene switches in the major human organs, with about 200,000 acting in any given kind of cell, such as heart muscle.
"Our genome is simply alive with switches: millions of places that determine whether a gene is switched on or off," said Ewan Birney of the European Molecular Biology Laboratory-European Bioinformatics Institute and one of the leaders of ENCODE.
The gene-control system means drug companies may have to look in new locations for influential genes. In one paper, scientists at the University of Washington in Seattle found that most of the DNA variants previously linked to 400-plus diseases lie in regulatory regions often far from the "disease gene."
"Disease-associated genetic changes are concentrated in the switches," UW's John Stamatoyannopoulos told the press briefing.
As a result, genome analyses that look for glitches only in "diabetes genes" or "cancer genes" or any other "disease genes" are likely to miss those that cause disease by changing when, where and how genes are turned on.
That discovery "will change how we understand the genetic basis for disease and open up new avenues for therapy," said Stamatoyannopoulos. For instance, in 17 very different cancers "just 20 regulatory factors turn up over and over," he said. That suggests that drugs targeting only those, and not the hundreds of targets drug companies are now pursuing, might treat many cancers.
ENCODE may also simplify the search for therapies for Crohn's disease. Genetic studies have identified about 100 places in the genome that affect a person's risk of developing the autoimmune disorder, way too many for drug companies to investigate, Birney said. But only a handful of transcription factors control those regions, presenting a workable number of targets by which to silence disease-causing DNA.
The UW group also found that some seemingly unrelated diseases share regulatory circuitry, including rheumatoid arthritis, type-1 diabetes, lupus and other autoimmune diseases. That suggests that one drug might be effective against all of them.
With ENCODE, said Stamatoyannopoulos, "we're exposing previously hidden connections between diseases."
ENCODE has also shown that a gene is not the simple stretch of DNA that makes a protein, as students are taught. Instead, the functional unit is an amalgam of sequences from both strands of the double helix, interleaved like two halves of a deck of cards in the hands of a Vegas dealer.
Nature is making all of the ENCODE research freely available, at www.nature.com/encode and through an iPad app.
(Reporting by Sharon Begley; Editing by Julie Steenhuysen and Douglas Royalty)
Correction: A previous version of this article used the term "dopamine" instead of "dopamine receptor." This error has been corrected.