On the Genome, the Markets and Data

This post was published on the now-closed HuffPost Contributor platform.

The notion, reported by the New York Times on Thursday, that genomics is about to go mainstream with relatively cheap sequencing has stirred up the latent enthusiasm for salvation-through-information that always lurks.

I don't doubt John Markoff's piece about Complete Genomics, a Silicon Valley startup that claims to be able to use semiconductor-manufacturing techniques to make sequencing affordable to consumers. One should be skeptical, however, about the larger claims that surround mainstream sequencing, which Markoff sums up quite neatly: "The promise is that low-cost sequencing will lead to a new era of personalized medicine, yielding new approaches for treating cancers and other serious diseases." Progress, he adds, toward those goals has been "glacial," but that's about to end because the same kind of processing power that brings us smartphones and PCs will now result in "new understanding and new medicines [that] will arrive at a quickening pace."

Markoff tops off his argument with a resounding quote not from a physician or a molecular biologist but from a representative of the digital world. "For all of human history, humans have not had the readout of the software that makes them alive," said Larry Smarr, director of the California Institute of Telecommunications and Information Technology, a research center jointly operated by the University of California, San Diego and the University of California, Irvine, and a member of Complete Genomics' scientific advisory board. "Once you make the transition from a data poor to a data rich environment, everything changes."

Larry Smarr? All right. What bugs me about this quote? First, Smarr has every reason in the world to hype Complete Genomics and the enterprise generally. Second, does everything have to change again? Third, consider that glib metaphor "the software that makes them alive." I assume he's speaking of the genome. Simplistic comparisons of the genome to software have been tossed around since the promotional efforts to get government money to sequence the bugger began; it's a particular weakness of journalists and flacks, and one I admit to indulging occasionally. But, of course, one of the things that folks discovered after the first genomes were sequenced -- the smart folks in molecular biology already knew -- is that the genome is not a readout or a set of instructions in the usual sense of the word. The genome does provide the code to assemble proteins (there's a lot of other stuff there too, including junk), and some maladies can be traced to flaws, deletions or problems in that code. But the genome is not determinative. It exists within a complex biological system, replete with pathways and feedback mechanisms of surpassing subtlety and intricacy. Simply knowing everyone's code is not a short corridor to that trendiest of healthcare trends, "personalized medicine," or necessarily to "new medicines at a quickening pace." (Ezra Klein tackles the implications of sequencing for personalized medicine here, based on the assumption that the genome is ultimately predictive. Maybe yes, maybe no.)

Anyone who has been hanging around healthcare for a while has heard all this before, again and again and again, particularly about cancer. The revolution is at hand. Ignorance has been dismissed. The new age dawns. The magic locks have been sprung. What gives the cheap-sequencing story pop, of course, is that you can bring Silicon Valley, with all its successes, onto the stage. Those guys know what they're doing! (And the converse: The medical crowd needs help.) Biology, in short, can be reduced to data. "Everything changes." Again, I'm not going to say that advances won't be made with low-cost sequencing, though the people who do the research have had access to genomic sequencing for a decade now. More and more data will undoubtedly become available. And physicians may get powerful tools to untangle that data. And they may unravel the heretofore mysterious sources of disease, breakdown and death. Or they may discover that more data simply produces more complexity, more ambiguity, deeper mysteries. And the grinding struggle against aging and diseases like cancer will continue.

Question: If going from a data-poor to a data-rich environment makes all the difference, why can't we seem to have a clue when our financial system, say, is heading over the cliff? Well, perhaps we remain mired in a data-poor environment? Hardly. Compared to a generation ago, there is an oceanic amount of data around, and finance has better computers and more highly trained risk managers than just about any industry anywhere. But, you say, that information isn't shared, or transparent; regulators in particular lack the tools (or budgets or brains) to handle it. Perhaps. But perhaps there's a larger problem. Perhaps there's simply too much data. Perhaps we lack the deep understanding (not processing power: understanding) of what's important in that data, and what isn't, to effectively master it. And, even worse, because humankind lurks at the center of most financial transactions, perhaps yesterday's data has odd effects on today's decisions, setting up feedback loops that drive us deep into the murky psychology of individuals, where we can argue free will with, say, St. Augustine. Maybe data isn't everything.

In last week's Financial Times, Gillian Tett makes many of those points, touching on matters from Greece (why don't we have the data to know what would happen if Greece collapsed into disorderly default?) to the repo markets (the data now exists, but it takes a huge effort to master it). Tett is slightly more optimistic than I am on these questions. She believes that "recent innovations in IT in theory [aha, theory] makes it easier than ever to track complex global data flows." (Perhaps, but even if the theory works out, and remains valid, data itself remains a potent political volleyball: recall how something as relatively simple as setting LIBOR rates went off the track in the crisis -- and how long it's taken to fix it.) Tett argues that real progress requires political will and resources to build the systems and man the monitors. But then she gets gloomy. "Sadly, that remains patchy, at best. And don't expect the banks to take the lead."

What we know about data is that it spawns the demand for more data. Where's the line between data poor and data rich? We are not talking here about designing iPads or remarkably powerful chips. The real issue here is understanding how complex human systems, psychological and biological, work. The notion that cheap sequencing will produce vast amounts of data that will reveal the secrets of biology is exactly like the belief that historical data in enough abundance will allow us to predict the economic future. Embedded in both of those beliefs is a fundamental misunderstanding of the phenomena we're dealing with and more than a trace of old-fashioned reductionist determinism. Yes, vastly greater processing power will produce greater understanding over time. But it will not be easy. And it will not bring on a new age.

Robert Teitelman is editor in chief of The Deal magazine.