Further Thoughts on the ENCODE/Junk DNA Debates

This post was published on the now-closed HuffPost Contributor platform.

Last week, there were two HuffPost blogs on the release of the ENCODE (Encyclopedia of DNA Elements) project data: mine and one by colleague Michael White. We took opposing positions on the question of whether the ENCODE results are compatible with the "junk DNA" explanation for so much mobile repetitive DNA in our genomes. Such disagreements are normal in evaluating the importance of scientific evidence.

By demonstrating the connection between replication and transposition of DNA transposons in 1979 (Shapiro 1979), I was in part responsible for the idea, supported by White, that a sufficient explanation for the presence of mobile repeats is their ability to replicate and amplify in the genome. Two 1980 Nature papers expressed this hypothesis (Doolittle and Sapienza 1980; Orgel and Crick 1980).

When I corrected a small misstatement in White's blog about one of the 1980 Nature papers, it elicited the following comment from diogeneslamp0:

White: "junk DNA...was not based on what scientists didn't know, but rather on what they did know"

Shapiro: "This is not exactly correct."

It is exactly correct. The idea that human DNA was largely limited in its functionality was based on applying simple principles of population genetics to what we knew about the genome. If every baby has 50-150 new mutations, it's impossible for all 50-150 mutations to be deleterious; that would cause the extinction of man.

Diogenes' comment is typical of those from many defenders of the "junk DNA" idea and reflects the kind of absolute thinking that has no place in real science. Nothing is ever "exactly correct." All measurements have uncertainties, and all explanations have a limited life span.

The scientific enterprise is inherently tentative. We cannot know how our understanding of phenomena will change with unexpected new results. Contrary to what White argued, the interpretation of the ENCODE data as invalidating the "junk DNA" hypothesis was not a "media failure." It was the explicit conclusion of the ENCODE project scientists published in Nature:

One of the more remarkable findings described in the consortium's "entrée" paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly "junk DNA."

My point in writing this rethinking of the ENCODE/Junk DNA debate is not to re-argue the substantive merits. Instead, my goal is to illustrate how unscientific expressions of certainty and permanence are in arguments about interpretations of natural phenomena, like the static views expressed by Diogenes.

I am fortunate to be able to do this examination of scientific reasoning from a perspective on molecular genetics that dates back to the early 1960s, before we knew about repetitive DNA. At that time, we were just beginning to assimilate the operon model (Jacob 1961) and learn about the regulation of genome expression, replication, transmission and repair. Thinking was still focused on the "one gene-one enzyme" (Beadle 1948) view of heredity. The idea of Barbara McClintock that mobile segments could serve as "controlling elements" (McClintock 1952) to regulate genome function was far outside accepted molecular biology discourse.

The (re)discovery of mobile and repetitive DNA in the late 1960s by the molecular work of Britten and Kohne (Britten 1968) and the genetic analysis of scientists working on viruses and transposable elements in bacteria, Drosophila and other experimental organisms (Bukhari 1977) came as a rude shock to the gene-centered thinking of the times. Further shocks came from discoveries about exons and introns (Chambon 1981), distant enhancer elements (Tjian 1995), and epigenetic control by chromatin formatting (Holliday 1989).

The ENCODE scientists have learned that it is wise to avoid interpreting the data from a fixed view of genome organization. That is why they speak of "DNA Elements" rather than genes or any other artificial categories. They tend to restrict themselves wisely to operationally defined features, such as transcription start sites (TSSs) and splice sites at exon-intron boundaries.

Diogenes and like-minded people argue that we knew enough in the 1970s to understand the basic principles of genome organization. They do not accept that the flood of new information from genome sequencing and the kind of methodologies exemplified by the ENCODE project will fundamentally alter our genetic concepts. While they are certainly entitled to these opinions, I think we have to recognize that they are nothing more than that -- simply opinions that fly in the face of scientific history.

There are really no fixed notions in science (Kuhn 1962). Gravity is often cited as a case of something definite. While our experience of the force of gravity is continual, our explanations of how it works have changed fundamentally over time. Wendell Read has pointed out in several comments on my blogs that Newton had to invoke an ad hoc "action at a distance," while Einstein accounted for gravity as a curvature of four-dimensional space-time. Meanwhile, physicists are busy working on a quantum theory of gravity.

As I like to repeat, if Newton could not get it right, what hope is there for the rest of us? Vannevar Bush wrote that science is an "endless frontier." We never get final answers. But then, we never run out of fascinating questions to ask.


Beadle, G. W. (1948). "The genes of men and molds." Sci Am 179(3): 30-39.

Britten, R. J. and D. E. Kohne (1968). "Repeated sequences in DNA. Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms." Science 161: 529-540.

Bukhari, A. I., J. A. Shapiro, and S. L. Adhya (Eds.) (1977). DNA insertion elements, plasmids and episomes. Cold Spring Harbor, New York, Cold Spring Harbor Press.

Chambon, P. (1981). "Split genes." Sci Am 244(5): 60-71.

Doolittle, W. F. and C. Sapienza (1980). "Selfish genes, the phenotype paradigm and genome evolution." Nature 284(5757): 601-603.

Holliday, R. (1989). "A different kind of inheritance." Sci Am 260(6): 60-73.

Jacob, F. and J. Monod (1961). "Genetic regulatory mechanisms in the synthesis of proteins." J Mol Biol 3: 318-356.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago, Univ. of Chicago Press.

McClintock, B. (1952). "Controlling elements and the gene." Cold Spring Harb Symp Quant Biol 21: 197-216.

Orgel, L. E. and F. H. Crick (1980). "Selfish DNA: the ultimate parasite." Nature 284(5757): 604-607.

Shapiro, J. A. (1979). "Molecular model for the transposition and replication of bacteriophage Mu and other transposable elements." Proc Natl Acad Sci U S A 76(4): 1933-1937.

Tjian, R. (1995). "Molecular machines that control genes." Sci Am 272(2): 54-61.
