I previously mentioned attending the European Molecular Biology Organization meeting on "Evolution in the Time of Genomics" this past May in Venice. There were several interesting talks about the functional and evolutionary roles of mobile DNA repeats. They provided good evidence that many of these repeats have helped establish genome transcription networks. One such presentation was by Michele Caselle from the University of Turin, and the work behind it was published last month in the journal BMC Genomics.
Michele's paper is particularly relevant now because a number of commentators on my first ENCODE blog raised questions about what it means for repetitive DNA to "format" the genome as a sophisticated information storage and retrieval system.
The answer to this question lies partly in a fundamental change in our understanding of genome coding. Rather than having traits encoded by unitary "genes," we now realize that all cell and organism characters are expressed from networks of coding sequences whose expression is coordinated by shared transcriptional control signals. The control signals are one class of the generic formatting elements Sternberg and I discussed in our 2005 paper.
How do such genomic coding networks and their shared control signals evolve? It would take an exceedingly long time for this to happen by accumulating random single nucleotide substitutions one at a time in succession near each appropriate coding sequence, as conventional theory supposes. There is also the difficult question of how such changes could furnish a selective advantage before a sufficient number of control sites had been established.
A far more efficient way to build up a coordinated expression network is to distribute pre-existing transcription signals within a short period to the dispersed coding regions in the genome. How could this happen? Well, it turns out that mobile DNA repeats have exactly the right properties to distribute similar signals rapidly to multiple regions in the genome. Mobile elements provide a much more efficient network evolution process than random mutations and selection.
As Michele and his colleagues point out, the genomic dispersal of mobile DNA repeats occurs in bursts. We recognize the bursts by finding families of elements with related sequences but different levels of nucleotide substitutions. The oldest families have the highest levels of divergence and are widely shared by broadly related phylogenetic groups, while the youngest families have very similar sequences and are limited to one or a few more recent lineages.
In mammals, for example, many short reverse-transcribed dispersed repeats (SINEs) are specific to each mammalian order (rodents, carnivores, primates, etc.), while the MIR elements (Mammalian-wide Interspersed Repeats) are heterogeneous in sequence and shared by all mammalian groups, including marsupials. So we know the MIR dispersal burst occurred very early in mammalian history.
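To make the dating logic concrete, here is a minimal sketch of how repeat families might be ordered by age from their sequence divergence. It assumes each copy has already been aligned to its family consensus, uses the standard Jukes-Cantor correction for multiple substitutions at one site, and runs on made-up toy sequences. The function names and the data are hypothetical illustrations, not the procedure used in the paper.

```python
import math
from statistics import mean


def p_distance(consensus: str, copy: str) -> float:
    """Fraction of aligned, ungapped positions where a copy differs from the consensus."""
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(consensus.upper(), copy.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, which allows for repeated hits at one site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def rank_families_by_age(families: dict) -> list:
    """Return (family, mean corrected divergence) pairs, most diverged first.

    `families` maps a family name to (consensus, [aligned copies]).
    Higher mean divergence is read as an older dispersal burst.
    """
    scores = {
        name: mean(jukes_cantor(p_distance(consensus, c)) for c in copies)
        for name, (consensus, copies) in families.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, invented sequences (NOT real repeat data), 20 bases each.
toy = {
    "old_family": ("ACGTACGTACGTACGTACGT",
                   ["ACCTACGAACGTTCGTACGA", "AGGTACGTACTTACGTTCGT"]),
    "young_family": ("TTGACCATGGTTGACCATGG",
                     ["TTGACCATGGTTGACCATGG", "TTGACCATGGTTGACCATGC"]),
}

for name, d in rank_families_by_age(toy):
    print(f"{name}: mean corrected divergence {d:.3f}")
```

In this toy case the family whose copies have drifted furthest from their consensus sorts first, which is the same reasoning that places the heterogeneous MIR elements early in mammalian history.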
Could the MIR elements have played a role in early mammalian network evolution? Michele and his collaborators noted that tests for DNA bound by the estrogen receptor alpha (ERalpha) transcription activator identified a large number of MIR elements:
"We conjecture that the special affinity of ERalpha for the MIR class of TEs [transposable elements] could be at the origin of the important role assumed by ERalpha in Mammalians."
Estrogen hormone regulation is an essential feature of development and reproduction in all mammals. Because mammalian development and reproductive physiology differ from those of other vertebrates, estrogen-response networks would need to be reorganized when mammals first diverged in evolution.
In addition to learning that genome networks, not single genes, encode phenotypic traits, we also learned in the 1980s and 1990s that transcription factor regulatory proteins work in combinations. This feature of transcriptional regulation made an important prediction for the MIR/ERalpha network hypothesis.
If the hypothesis is correct, MIRs that bind ERalpha should contain binding signals for other transcription factors known experimentally to interact with ERalpha. That is what Michele and his collaborators found:
"We then searched for TFBSs [transcription factor binding sites] within the TEs overlapping ChIP-seq peaks. We found a strong enrichment of a few precise combinations of TFBS. In several cases the corresponding Transcription Factors (TFs) were known cofactors of ERalpha, thus supporting the idea of a co-regulatory role of TFBS within the same TE. Moreover most of these correlations turned out to be strictly associated to specific classes of TEs thus suggesting the presence of a well-defined "transposon code" within the regulatory network."
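The kind of test described in that passage can be sketched in a few lines. This is only an illustration under stated assumptions: the `TE` record, its `has_motif` flag (standing in for a predicted cofactor binding site), the coordinates, and the choice of a hypergeometric test are all hypothetical and are not taken from the paper.

```python
from math import comb
from typing import NamedTuple


class TE(NamedTuple):
    """A transposable element copy; has_motif marks a predicted cofactor site (hypothetical)."""
    chrom: str
    start: int
    end: int
    has_motif: bool


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def hypergeom_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when drawing n items without replacement from N, of which K are successes."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def motif_enrichment(tes, peaks):
    """Ask whether TEs overlapping ChIP-seq peaks carry the motif more often than chance.

    `peaks` is a list of (chrom, start, end) tuples.
    Returns (k, n, K, N, p_value) for the one-sided hypergeometric test.
    """
    bound = [te for te in tes
             if any(te.chrom == c and overlaps(te.start, te.end, s, e)
                    for c, s, e in peaks)]
    N = len(tes)                           # all TE copies considered
    K = sum(te.has_motif for te in tes)    # copies carrying the motif
    n = len(bound)                         # copies overlapping a peak
    k = sum(te.has_motif for te in bound)  # peak-overlapping copies with the motif
    return k, n, K, N, hypergeom_tail(k, K, n, N)


# Toy, invented coordinates (NOT real ChIP-seq or RepeatMasker data).
toy_tes = [
    TE("chr1", 100, 300, True), TE("chr1", 500, 700, True),
    TE("chr1", 900, 1100, False), TE("chr1", 1500, 1700, True),
    TE("chr1", 2000, 2200, False), TE("chr1", 2500, 2700, False),
    TE("chr1", 3000, 3200, False), TE("chr1", 3500, 3700, False),
]
toy_peaks = [("chr1", 250, 350), ("chr1", 650, 750), ("chr1", 1600, 1650)]

k, n, K, N, p = motif_enrichment(toy_tes, toy_peaks)
print(f"{k} of {n} bound TEs carry the motif ({K} of {N} overall), p = {p:.4f}")
```

A real analysis would work genome-wide with many motifs and would correct for multiple testing; the sketch only shows the shape of the question, namely whether receptor-bound repeats are enriched for a cofactor's binding signal.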
This new paper is just one of many, published before and after my 2005 reviews with Sternberg, which indicate an evolutionary genome formatting role for dispersed DNA repeats ("Novel and mobile cis-regulatory sites and CRMs"). This formatting role was also the subject of my very first HuffPost blog, where I reported on more than 280,000 functional elements in the human genome derived from mobile elements.
So considerable evidence has accumulated outside the ENCODE project that mobile DNA repeats are important both in the organization and in the rapid evolution of complex genomes. I suspect that many readers and commentators who resist abandoning the conventional view of these repeats as "selfish" elements, present only because of their capacity for replication, are not familiar with this literature. They should look at it. The data will only grow with time as we come to appreciate ever more the sophisticated processes involved in genome regulation and genome evolution through natural genetic engineering.