Could the Birth of a Word Prevent the Death of a Language?

When I first saw Deb Roy's TEDTalk, "The Birth of a Word," I was struck by the power of data-rich research to reveal important information, opening the door for machines to learn more effectively. However, I was equally intrigued by something else -- how large amounts of data like this could help human beings learn more effectively. In fact, it reminded me that enough data can even keep an endangered language from falling into extinction.

To quote from the National Geographic Enduring Voices Project, "Every 14 days a language dies. By 2100, more than half of the more than 7,000 languages spoken on Earth -- many of them not yet recorded -- may disappear, taking with them a wealth of knowledge about history, culture, the natural environment, and the human brain."

Could big data prevent humanity from losing two languages each month? In my new book, Found in Translation, I wrote about the amazing story of Wampanoag, a Native American language that was dead for 150 years. A member of the Wampanoag tribe, Jessie Little Doe Baird, decided to revitalize her ancestral language. The language is now spoken by children and, while still endangered, has literally come back from the dead.

There was one critical ingredient for the success of revitalizing this language -- the existence of data in that language. In the case of the Wampanoag language, this data came in the form of numerous historical documents written in Wampanoag -- land deeds, legal contracts, and religious texts -- along with English versions. The terminology and grammatical data embedded in these translated documents enabled Ms. Baird and her community to restore the language and use it in everyday life. Now, just imagine what they could have done with 90,000 hours of video.

Obviously, the members of the Wôpanâak Language Reclamation Project were critical to making this transformation happen. However, without the data itself, no matter how strong the community's motivation, the language revival would not have been possible. In other words, with a large enough dataset, all a language really needs is a community of motivated users in order to stay alive -- or even to come back from the dead.

The loss of a language typically signifies the loss of human knowledge. The knowledge within a given linguistic or cultural group is far more important than the language itself, but this too is typically lost when a language dies. That kind of knowledge can be recovered if enough data is collected. This is why efforts such as those of K. David Harrison and the Living Tongues Institute are so important -- not because of inherent value in the languages themselves, but because of the value of the people who speak those languages and the knowledge their communities have cultivated over many years.

Ironically, it might be through big data that small languages find the greatest ability to fight back. More and more big technology companies are putting their power behind endangered languages. Microsoft's Local Language Program has a long history of empowering less common languages -- the company made Windows available in Cherokee this past December. Google unveiled the Endangered Languages Project last June. But these languages will need much more investment and support -- and data -- before the knowledge they contain will ever be secure.

As we consider the broad-reaching implications of Roy's research, let's remember that big data projects do more than enable machines to communicate in human-like ways. They can also enable human beings to continue to use human languages -- and help human knowledge continue to evolve.
