Bad AI Comes From Bad Data

This post was published on the now-closed HuffPost Contributor platform. Contributors control their own work and posted freely to our site.

For as long as there have been science fiction movies and television series, there have been bad robots. Gort. Robby. Colossus. HAL 9000. Ash. Maximilian. Cylons. Terminators. IG-88. Decepticons. Lore. Bishop. Ultron. To name a few.

These stories warn us of a future where intelligent machines have turned against their human creators, having become sentient and developed minds of their own. Hollywood has given us plenty of reasons to fear artificial intelligence. Bad AI usually means very bad things for humans.

But that’s science fiction, and artificial intelligence is actually part of computer science, though it’s understandable that the two often get confused. We’ve had robots and maniacal machines in fiction longer than we’ve had them in fact. Today, we continue to see the growing importance of artificial intelligence, but also its potential perils.

While there are many definitions of artificial intelligence, there are two that most would agree on. First, artificial intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence. Second, artificial intelligence enables machines to process information and learn from data.

That last definition has a lot of relevance in a modern context. Forms of AI like machine learning have a massive dependency on existing data to inform their algorithms and recommendations. Get the wrong data or bad data and suddenly the AI goes haywire.
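To make that concrete, here is a minimal sketch of a toy "model" that learns a suggested ask amount as the average of a donor's past gifts. The function name and the gift amounts are made up for illustration, and a real machine learning system is far more involved, but the failure mode is the same: one data entry error is enough to send the recommendation haywire.

```python
def suggested_ask(past_gifts):
    """A toy 'model': learn the next ask amount as the average past gift."""
    return sum(past_gifts) / len(past_gifts)

# Hypothetical gift history, in dollars.
clean_gifts = [25, 50, 40, 35, 50]

# The same history with one keying error: 50 was entered as 5000.
bad_gifts = [25, 5000, 40, 35, 50]

print(suggested_ask(clean_gifts))  # 40.0
print(suggested_ask(bad_gifts))    # 1030.0
```

Nothing in the code changed between the two runs. Only the data did.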

This got me thinking. Perhaps all those science fiction interpretations of AI were on to something. What if the HAL 9000 and all those other machines-gone-bad were the result of bad data? A few too many algorithms under the influence of bad inputs. That would indeed explain a lot of things.

It would also suggest that the future of AI and the benefits from it are directly tied to the quality of your data. I know that talking about data hygiene is not the most exciting topic in the world. That’s why I’ve banned that moniker and have been referring to it as "data health" for a while now.

Data health is good for you. Data health is good for your organization. Data health is how we prevent bad AI. Within data health there are three key components: data quality, data completeness, and data design. Attention to these three areas in the right proportion is a necessary building block for the better use of data.

Data quality is all about the accuracy and correctness of data. In the nonprofit sector, things like address quality, data entry quality, deceased suppression, and other continuous cleaning of the data are very important. If you start out with bad data, then it only gets worse from there, never better. This is especially true when data is being fed into artificial intelligence capabilities.
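As a rough illustration of what those checks can look like, here is a small sketch that flags problem records before they reach any downstream system. The field names (`name`, `postal_code`, `deceased`) and the US ZIP code pattern are assumptions made for the example, not the schema of any particular donor database.

```python
import re

# Assumed format: 5-digit US ZIP code with an optional +4 suffix.
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

def quality_issues(record):
    """Return a list of data quality problems found in one constituent record."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("missing name")
    if not US_ZIP.match(record.get("postal_code", "")):
        issues.append("invalid postal code")
    if record.get("deceased", False):
        issues.append("deceased: suppress from mailings")
    return issues

def clean(records):
    """Keep only records with no issues, so bad rows never reach a model."""
    return [r for r in records if not quality_issues(r)]

# Hypothetical sample records.
records = [
    {"name": "Ada Park", "postal_code": "29401", "deceased": False},
    {"name": "  ", "postal_code": "2940", "deceased": False},
    {"name": "Sam Lee", "postal_code": "29401-1234", "deceased": True},
]

for r in records:
    print(r["name"].strip() or "(blank)", quality_issues(r))
```

Only the first record survives `clean(records)`. In practice the flagged records would be routed to someone for correction rather than silently dropped.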

Data completeness is the degree to which we have a total picture of donors, volunteers, activists, and other key constituent groups. This is a combination of self-reported information and appended demographic data. Over time, the goal is to have a broader understanding of constituents’ interests, engagements, and preferences.
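One simple way to track completeness is to score each record by the share of profile fields that are actually filled in. The sketch below assumes a hypothetical list of profile fields; which fields matter will depend on the organization.

```python
# Hypothetical profile fields; choose the ones that matter to your organization.
PROFILE_FIELDS = ["email", "phone", "birth_year", "interests", "preferred_channel"]

def completeness(record, fields=PROFILE_FIELDS):
    """Fraction of profile fields that hold a non-empty value."""
    filled = sum(1 for f in fields if record.get(f) not in (None, "", []))
    return filled / len(fields)

# Hypothetical donor: email and interests are known, everything else is blank.
donor = {
    "email": "ada@example.org",
    "phone": "",
    "birth_year": None,
    "interests": ["literacy"],
}

print(completeness(donor))  # 0.4
```

Averaging this score across all constituents gives a single number to watch over time as self-reported and appended data fill in the gaps.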

Together, data quality and completeness are about 90% of the data health challenge. And yet, many of the struggles people report with data are design related. The plethora of custom fields in a system that can’t be normalized or made meaningful. The skeletons in the data closet from poor past decisions about how data is collected and managed.

Fixing data design issues will take more time and toil, but addressing data quality and completeness can be started without delay. The sooner data health is prioritized, valued, and championed, the greater the chances of benefits to the organization. Not to mention the critical role this plays in the use of AI capabilities today and into the future.
