Saving the World with Big Data and the Global Seed Vault

This post was published on the now-closed HuffPost Contributor platform.

Last February, I was given the very rare opportunity to personally tour the global seed vault, located inside the Arctic Circle in Svalbard, Norway, at the invitation of its creator, Dr. Cary Fowler. The seed vault houses over 850,000 plant seed varieties. This “doomsday” facility will ensure that no stored species are lost in the event of a regional or global crisis.

Speaking with Dr. Cary Fowler in the seed vault.
Lloyd Marino

We have a big problem. We’re fast approaching an era of too many people and not enough resources. There are seven billion people inhabiting the planet today (and estimates say that number could reach 11 billion by 2100). Feeding the existing population will require a 15 percent increase in food production, but such an increase has become much more difficult in the face of the havoc created by global climate change, including hotter, drier seasons and wild weather swings from drought to flood. Climate change is now such a grave threat to agriculture that experts estimate it will reduce food production by two percent every decade. Our climate is already changing faster than most plants, especially staples such as corn and wheat, can adapt, further endangering our limited resources. Consequently, we need to produce more food on less land, with less water and less energy.

From the 1940s to the late 1960s, the Green Revolution spurred the growth of agricultural production worldwide through advanced irrigation systems, pesticides, synthetic nitrogen fertilizer, and improved crop varieties. The Green Revolution saved millions of people; one study found that people in the developing world consumed 25% more calories after the Green Revolution. Today, we need another revolution, one that develops and strengthens a global, rational system of plant resource conservation. Along with water, plants may be our most important natural resource, providing us with oxygen, medicine, clothing fiber and, importantly, food.

Imagine what would happen if certain vital crops disappeared. (Disease killed the most common type of banana in the middle of the 20th century, and the replacement is endangered today.) Each extinct plant could deprive the world of a future medical cure.

Seed banks are one important solution. These are secure storage facilities that house varieties of seeds that plant researchers and breeders can use to breed new plants and grow improved versions with greater disease, climate, or insect resistance. Less frequently, farmers can request seeds to breed new crops or restore a lost old variety to life. “For plant scientists and farmers to breed the traits that can resist higher temperatures and long-term droughts into new plants, they need access to genetic diversity, which the banks provide,” says American agriculturalist Dr. Cary Fowler. Stored correctly, seeds can remain viable for decades or even centuries. In the unlikely event of disease or climate change destroying all the seeds of a given crop, the world could count on the collection at seed banks to provide the source material for the breeding of new varieties.

There are currently about 1,400 seed banks worldwide, but the most famous is the Svalbard Global Seed Vault, which is a global repository and backup for all other seed banks. At any one time, it contains the seeds of over 5,000 plant species – more than 850,000 individual samples. Almost every country in the world has deposited seeds. The vault is run under an agreement among the Norwegian Ministry of Agriculture and Food, the Nordic Genetic Resources Center, and the Global Crop Diversity Trust, an independent international organization whose sole mission is ensuring the conservation of global plant diversity.

Big Data to the Rescue

Although the Crop Trust believes that conserving crop diversity provides the foundation for food security and that seed banks provide a haven for genetic materials that can be retrieved 10, 20, 50 or more years in the future, seed banks are not without their problems. Budget cuts, equipment failure and normal human error have caused samples to be lost (estimates are at least one a day), imperiling the world’s seed population. The Crop Trust is extremely concerned that these different varieties be made available to the world’s scientists and breeders, and is even more concerned that these scientists and breeders have the best tools to use this diversity effectively to prepare agriculture for the immense challenges of climate change and population growth. This starts with information.

Big Data, which has revolutionized many aspects of our daily lives, from medicine to commerce, could prove to be a major help, especially in the area of genetic resource conservation and utilization. In an age when anybody can take a cheek swab to check his or her DNA, comparatively little is known about plant genetics. Indeed, locating the genes for favorable plant traits (such as resistance to drought, disease or insects) so they can be bred into new varieties once required scientists to laboriously sift through the billions of base pairs in a single genome. Even the traditional process of cross-pollinating plants with the most desirable traits, selecting the new hybrids with the most promise and then repeating the process is costly, labor intensive, and time consuming.

How can Big Data help? “Marker-assisted breeding” allows scientists to determine in a lab, within a matter of days, which progeny of the crossbreeding and which seeds contain the most desirable traits. Now, scientists can get an analytical readout from a machine, without waiting for the plants to grow, shortening development time.
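To make the idea concrete, here is a minimal sketch of the selection step in Python. It assumes each seedling has already been tested in the lab for a small panel of genetic markers. The marker names, alleles, and seedling IDs are invented for illustration and do not come from any real breeding program.

```python
from dataclasses import dataclass


@dataclass
class Seedling:
    seedling_id: str
    genotype: dict  # marker name -> allele read by the lab assay


# Hypothetical marker panel: marker name -> allele linked to the desired trait.
FAVORABLE_ALLELES = {"drought_marker": "A", "rust_marker": "G"}


def select_progeny(seedlings, favorable=FAVORABLE_ALLELES):
    """Return the IDs of seedlings that carry the favorable allele at every marker."""
    return [
        s.seedling_id
        for s in seedlings
        if all(s.genotype.get(marker) == allele for marker, allele in favorable.items())
    ]


# Toy progeny from a single cross (made-up data).
progeny = [
    Seedling("P1", {"drought_marker": "A", "rust_marker": "G"}),
    Seedling("P2", {"drought_marker": "A", "rust_marker": "T"}),
    Seedling("P3", {"drought_marker": "C", "rust_marker": "G"}),
]

selected = select_progeny(progeny)
print(selected)
```

Only the seedlings that pass this filter would need to be grown out, which is where the time savings come from.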

Big Data also can be used to create an online searchable database of global seed collections so plant breeders and scientists can seek out and then develop desirable traits for future varieties. “The challenge to creating a searchable database is first identifying all the genetic variations that exist within a crop species worldwide by sequencing and comparing the genomes of hundreds of thousands of different varieties, both wild and domestic,” says Dr. Fowler. The next task is to figure out which genetic variations control or influence which traits. To accomplish this, scientists grow seedlings both in controlled laboratory conditions and in the field, using an automated process to photograph them on a regular basis. These images allow scientists to determine each plant’s observable traits, or phenotype.

Here’s where Big Data comes into play. The phenotype information, images representing roughly 100 terabytes of unstructured data, is integrated with the vast database of sequenced genetic information that has been compiled from all the existing varieties. Analytics programs sift through this massive data set to determine which minute differences among the billions of nucleotides in each genome are associated with which traits.
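As a rough illustration of what "associated with" means here, the sketch below scores each genetic marker by how strongly it correlates with a measured trait across a handful of plants. This is a toy stand-in, not the method any particular research group uses. Real association studies rely on more rigorous statistical models, and the marker names and measurements below are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_markers(genotypes, trait):
    """Rank markers by the absolute correlation between allele count and trait.

    genotypes: marker name -> list of allele counts (0, 1 or 2), one per plant
    trait:     list of trait measurements, one per plant, in the same order
    """
    scores = {marker: pearson(counts, trait) for marker, counts in genotypes.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up data for six plants.
genotypes = {
    "snp_1": [0, 1, 2, 2, 0, 1],  # tracks plant height closely
    "snp_2": [1, 1, 0, 2, 1, 1],  # mostly unrelated to height
    "snp_3": [2, 2, 2, 2, 2, 2],  # no variation, so it carries no signal
}
plant_height_cm = [50, 60, 71, 69, 49, 61]

ranking = rank_markers(genotypes, plant_height_cm)
for marker, score in ranking:
    print(marker, round(score, 3))
```

At the scale the article describes, the same question is asked of billions of nucleotides and hundreds of thousands of varieties, which is why it becomes a Big Data problem.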

That’s not the end of the process, however, because all of the work done to this point has been limited to just one particular type of environment, such as a rain forest. And since plant genes interact with conditions in that rain forest — the same seed will produce a different plant in Louisiana than it will in Costa Rica — scientists have to determine and incorporate in their data how genomes perform across a wide range of environments. Only then can a breeder know which specific hybrid is best for a particular area.
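A simplified way to picture this step: collect yield results for each variety in each environment, average them, and pick the top performer per location. The sketch below does exactly that. The hybrid names and yield figures are invented, and a real analysis would account for many more factors than a simple average.

```python
from collections import defaultdict

# Hypothetical field-trial records: (variety, environment, yield in tonnes per hectare).
trials = [
    ("hybrid_A", "Louisiana", 4.1),
    ("hybrid_A", "Louisiana", 4.3),
    ("hybrid_A", "Costa Rica", 2.9),
    ("hybrid_B", "Louisiana", 3.5),
    ("hybrid_B", "Costa Rica", 3.8),
    ("hybrid_B", "Costa Rica", 4.0),
]


def best_variety_by_environment(records):
    """Return environment -> variety with the highest mean yield in that environment."""
    totals = defaultdict(lambda: [0.0, 0])  # (environment, variety) -> [sum, count]
    for variety, environment, yield_t_ha in records:
        entry = totals[(environment, variety)]
        entry[0] += yield_t_ha
        entry[1] += 1

    best = {}  # environment -> (variety, mean yield)
    for (environment, variety), (total, count) in totals.items():
        mean_yield = total / count
        if environment not in best or mean_yield > best[environment][1]:
            best[environment] = (variety, mean_yield)
    return {environment: variety for environment, (variety, _) in best.items()}


recommendation = best_variety_by_environment(trials)
print(recommendation)
```

In this made-up data, the hybrid that wins in Louisiana loses in Costa Rica, which is the point: no single ranking of varieties holds everywhere.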

Early forms of such a giant database already exist. GENESYS serves as a global portal to a third of the world’s data on plant genetic resources for food and agriculture. This enables scientists and breeders to look for individual seed varieties and locate them in a seed bank so they can request a sample. Similarly, Diversity Seek works to improve crop diversity and yield by building a unified information management platform for accessing data in seed banks and genebanks. This data will help plant breeders and biological researchers better understand how specific genes influence plant performance.

Simply banking seeds in case of a future catastrophe is not enough to save diversity, however. Since most global plant diversity is not included in seed banks, we need ways to save plants in nature and in agriculture. Big Data solutions can help us keep track of wild crop species around the world, including new varieties, offering a complete, real-time catalogue. Right now, many wild crop relatives, species that could even be resistant to diseases, pests and extreme weather changes, are fast becoming extinct. Just recently, a wheat rust mutation wiped out thousands of acres of crops across Africa. Epidemics like this might be stopped if farmers planted more diverse varieties of crops. With the advent of global warming, farmers need diversity more than ever because it’s still unclear what varieties can flourish in these increasingly harsh conditions. Big Data could also be useful in linking large-scale sequencing and phenotyping data, making plant breeding faster, more efficient and cheaper.

Agriculture also is taking advantage of data aggregation technology. Ironically, Monsanto, a company that has become the poster child for corporate greed and evil, has also completely revolutionized the agriculture business by using Big Data.

Monsanto has a Big Data software product, FieldScripts, that helps determine soil productivity and yield. Using data analytics, Monsanto-owned Climate Corporation can help protect farmers from drastic weather events with crop insurance and help them improve yield. Though farmers say their field is as much art as science (knowing what and when to plant is second nature to them), they recognize the importance of science in coping with shifting weather patterns and climate change. Data scientists who study crop growth and weather patterns can advise farmers when and when not to plant crops, saving them millions of dollars.

Going Forward

How does all this play out in the real world? A 2012 survey of soybean farmers showed a rapid payback from using these Big Data technologies, including a 15% savings on seed, fertilizer, and chemicals. In 2014, the New York Times profiled an Indiana farmer who expanded his farm from 700 acres to more than 20,000 acres in part by using sensors and large-scale data analysis. All told, the farmer’s return on investment increased by 50 percent compared with conventional farming.

Big Data is the common thread running through all these efforts, emphasizing the need to make information of all types available to the largest number of people. As Bill Gates said in a speech before a recent G-8 Conference on Open Data for Agriculture, “To reap the benefits of Big Data, it’s important to ensure this is publicly available and shared with research and development partners. Only then will we be able to create a rich data ecosystem to support the knowledge-intensive and location-specific enterprise of agriculture.”

One possible mechanism to share data across a world of growers, from farmers with acres of land to apartment dwellers with a windowsill garden, is to create an open database to which anyone can contribute information about the success of their crops. The database could ask for information about the seeds planted (such as whether they are a new varietal), soil acidity, and the weather and watering amounts. Data analytics could then map where each type of seed has flourished and suggest the best seeds for each user’s conditions.
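Here is one way such a database might work in miniature, sketched in Python. The record fields, tolerance values, and seed names are all assumptions made for illustration. Weather is left out to keep the example short, and a real service would need far more data and a more careful matching method.

```python
from dataclasses import dataclass


@dataclass
class GrowReport:
    seed: str
    soil_ph: float          # soil acidity reported by the grower
    weekly_water_mm: float  # watering amount reported by the grower
    succeeded: bool         # did the crop do well?


def suggest_seeds(reports, soil_ph, weekly_water_mm,
                  ph_tolerance=0.5, water_tolerance=10.0, min_reports=2):
    """Rank seeds by success rate among reports from conditions similar to the user's."""
    tallies = {}  # seed -> (successes, total reports)
    for report in reports:
        similar = (abs(report.soil_ph - soil_ph) <= ph_tolerance
                   and abs(report.weekly_water_mm - weekly_water_mm) <= water_tolerance)
        if similar:
            wins, total = tallies.get(report.seed, (0, 0))
            tallies[report.seed] = (wins + int(report.succeeded), total + 1)

    # Ignore seeds with too few similar reports to say anything useful.
    rates = {seed: wins / total
             for seed, (wins, total) in tallies.items() if total >= min_reports}
    return sorted(rates, key=lambda seed: rates[seed], reverse=True)


# Made-up contributions from growers.
reports = [
    GrowReport("tomato_x", 6.5, 25, True),
    GrowReport("tomato_x", 6.3, 30, True),
    GrowReport("basil_y", 6.6, 28, True),
    GrowReport("basil_y", 6.4, 22, False),
    GrowReport("kale_z", 7.8, 25, True),
    GrowReport("kale_z", 7.9, 26, True),
]

suggestions = suggest_seeds(reports, soil_ph=6.5, weekly_water_mm=25)
print(suggestions)
```

The kale reports are all successes, but they come from much more alkaline soil than the user's, so they are left out of the suggestions.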

Just as a handful of tiny seeds can grow to a forest of tall trees, little bits of information, when processed with the tools of Big Data, can become hugely successful strategies. At a time when plant species are dying out and too many people suffer from hunger and malnutrition, Big Data can help humans preserve global diversity, create new plants optimized to thrive in a warming world, and increase crop yields to feed more people. Just as seed banks help preserve existing seeds and plant varieties, Big Data can help create new ones.