It's Not Just the Genes

It's been 10 years since an international consortium of scientists successfully completed the mapping of the human genome -- a world-changing project that couldn't have happened without public and private support. The feat neatly coincided with the 50th anniversary of the description of DNA's double helix by Nobel laureates James Watson and Francis Crick. These are incredible achievements, and today, we couldn't conceive of the future of medicine without them.

Equally unfathomable is a view of medicine that doesn't take into account the trove of clinically relevant information available for any individual person, and for all people more generally. DNA holds (among other things) your personal architectural blueprint, but in and of itself it is a fairly static factor: the genome in your normal, healthy cells changes very little over the course of your life. We know with certainty that DNA alone is not a categorical predictor of disease. A mutation in the BRCA1 or BRCA2 gene, for example, signals significantly higher risk for breast cancer in women, but people with the same defective gene often have remarkably different outcomes. Researchers and the public are asking: why is that?

Whether disease manifests, the age at which it manifests, and how severe it becomes all depend on a multitude of other factors and the dynamic interplay between them: RNA, metabolites, proteins, healthy and diseased tissues, insulin and cholesterol levels, weight, age, gender, tobacco use, and toxic exposures -- to name just a few. To achieve a comprehensive understanding of disease so that we can better diagnose and treat it, researchers must examine a hierarchy of levels -- multiple scales -- of all observable characteristics. Each element -- from molecules, to cells, to tissues, to organs, to the person, and then to the community at large -- and the flow of information between these elements, is a biological data point at a particular point in time. The observation of many millions of these elements in a given individual and population over time is where "big data" meets medicine.

In a relatively new field called multiscale biology, big data is helping us learn from a massive body of information, and from the networks that information forms, which define the complex biological processes at play within any living system. These are insights we could not have gained by looking at smaller collections of factors in isolation.

Throughout the history of medicine, we have attempted to explain type 2 diabetes and many other common human diseases using simple, linearly ordered pathways that start with a defective gene and end with the disease, or that begin with an environmental factor -- obesity or tobacco use -- and end with the disease. Today, instead of looking at common diseases as caused by a single gene or environmental factor, multiscale biologists consider a given person's internal and external networks of molecular and higher-level traits, which increases the scale of the data exponentially.

We collect data from de-identified information keyed into electronic medical records, from publicly and privately available data warehouses in the United States and around the world, and from biobank repositories, some of which are patient-identified. Over 25,000 patients have opted in to Mount Sinai's biobank, agreeing to make their health and disease information available to researchers, and -- uniquely -- to potentially be contacted by researchers for more insights or by clinicians to participate in clinical trials that may benefit patients with their form of a given disease. Their participation is a real gift to modern medicine that will feed back to improve their overall wellbeing.

At Mount Sinai, using big data, we have widened our research lens from the rarer forms of disease -- such as early-onset Alzheimer's disease -- to the more common forms. Much research in Alzheimer's and other diseases has focused on rare cases because they are easier to identify and compare. Very recently, however, multiscale biologists at Mount Sinai leveraged big data and supercomputing power to identify a network of genes involved in the inflammatory response in the brain, a crucial mechanism driving the most common form of late-onset Alzheimer's disease. In a paper that appeared in the journal Cell, scientists analyzed the DNA and gene expression for 376 patients with late-stage Alzheimer's disease and mathematically modeled the biological network defining this disease. This yielded many first-ever insights, including a unified map that integrates the key genes involved in the disease and the biological pathways that those genes control. This new understanding of Alzheimer's disease will help identify new therapeutic targets.

Big data is also starting to guide clinical decision making, particularly in heart disease and cancer. As we embrace the new view that every tumor is as unique as the person it presents in, there has been a corresponding shift in our approach to treatment. Rather than prescribe drugs that are generally optimal for a certain type of tumor, we have started exploring the delivery of therapeutics that are more precisely aligned to a particular individual's tumor. In a current research study, we are generating genomic data and performing patient-specific mutational analyses that are helping Mount Sinai researchers uncover what is happening in a given patient's tumor. With this information in hand, different personalized treatments are pursued, such as developing fly models to replicate that patient's tumor, and then screening such models against existing treatments to determine which drug may be most appropriate for that particular tumor. These studies are ongoing and aim to demonstrate whether this type of approach can make a difference in patient outcomes, but it is the type of approach that ten years ago was unimaginable.

As we celebrate the 10th anniversary of the mapping of the human genome, it is a good time to establish goals for the next decade. To accelerate the use of big data in the exam room, medicine needs people like Steve Jobs who can create intuitive interfaces, linked to supercomputers, that doctors can use with ease. We also need a free-flowing exchange of information among researchers. Historically, scientists have been proprietary about their work, but in isolation that work isn't nearly as powerful or insightful as it can be when it's mixed and analyzed with many other datasets.

Biobanks also need more information from people, both sick and healthy. In order to truly understand disease, we need to fundamentally understand how the body works. If you know how any system works -- an engine, a light bulb, a cell phone -- you can better understand how it breaks down and, from there, how to fix it or prevent the damage in the first place.

With continued public and private support, we will advance our understanding of the genome in the larger context of the human experience, and use that knowledge to predict, prevent, and more precisely treat disease.