Airbnb's Open Source Approach to Machine Learning

This post was published on the now-closed HuffPost Contributor platform.

Machine learning (ML) is moving fast and it’s easy to be dazzled by its capabilities; however, relatively few people understand how ML actually works at its core and are able to build working models. The question then becomes: how do we scale ML platforms and make them accessible to more executives and mid-size businesses whose staff didn’t graduate with a PhD in machine learning?

This is a question that has inspired Airbnb’s Elena Grewal since she joined the peer-to-peer online startup and was promoted to Data Science Lead. With an initial background in education (she got her PhD in Education from Stanford), it seems almost a given that she’d be drawn to expanding access to knowledge for all. Grewal also takes a common-sense business perspective, which calls for increasing output and competition in the industry. In her own words, “We’re constantly thinking about how do we build out infrastructure that enables every single team to use machine learning and build smarter products.”

Trends in open-source systems have made ML far more accessible, and the release of software from the likes of Facebook, Google, IBM and many other big- and mid-sized players has created a much more open playing field for developers. Open-source machine learning systems are typically exposed through application programming interfaces (APIs): sets of routines, protocols and tools for building software applications. An API specifies how software components should interact, so developers can call a library's models from their own code without reimplementing them.

In its own open-source effort, Airbnb has released Aerosolve, a machine learning package it originally built for internal use, so that anyone with enough interest can use it to build products. Grewal says that her team is always thinking, “How can we do this in a unique and novel way so that there aren’t necessarily special teams that are working on machine learning, but every team is working with machine learning…all one needs then is an understanding of a problem to be solved, a proper way to frame that problem, and the open-sourced technology to solve it.”

“Aerosolve: Machine Learning for Humans” was first published in June 2015 by Hector Yee and Bar Ifrach, a software engineer and a data scientist at Airbnb, respectively. Airbnb’s objective in designing the platform was to create a predictive pricing model for hosts, and part of that aim was building an ML model that was easy to interpret.

As illustrated in the graph below, Aerosolve works more or less as follows: humans can form and encode a hypothesis about a correlation in the data before looking at the actual data (red slope); the black slope is the belief of the model after learning from both the human hypothesis and billions of real data points. The model is the reconciler, correcting any misinformed assumptions against market data, while continuing to accept human beliefs about individual variables, which feeds into an iterative cycle. The resulting pricing model has hundreds of thousands of parameters that are tuned over time.
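
To make the idea of reconciling a human hypothesis with data more concrete, here is a minimal Python sketch. It is not Aerosolve's actual algorithm or API; it swaps in a simple ridge-style estimate for a single slope, and every name and number in it (reconcile_slope, prior_strength, the slopes and noise level) is a hypothetical chosen for illustration.

```python
import random


def reconcile_slope(xs, ys, prior_slope, prior_strength):
    """Estimate the slope of y ~ slope * x, pulled toward a prior belief.

    Minimizes sum((y - slope * x)^2) + prior_strength * (slope - prior_slope)^2.
    prior_strength = 0 ignores the human hypothesis; larger values trust it more.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if prior_strength + sxx == 0:
        raise ValueError("need data or a positive prior_strength")
    return (prior_strength * prior_slope + sxy) / (prior_strength + sxx)


def demo():
    rng = random.Random(0)       # fixed seed so runs are repeatable
    true_slope = -0.5            # hypothetical "market" relationship
    human_guess = -2.0           # hypothetical prior belief (the "red slope")
    for n in (0, 5, 50, 5000):
        # Synthetic data standing in for real observations (an assumption).
        xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0.0, 0.1) for x in xs]
        learned = reconcile_slope(xs, ys, human_guess, prior_strength=10.0)
        print(f"{n:5d} data points -> learned slope {learned:+.3f}")


if __name__ == "__main__":
    demo()
```

With no data the estimate equals the human guess; as more data points arrive, the data term outweighs the prior and the learned slope moves toward the relationship in the data, which is the reconciling behavior the paragraph above describes.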


Airbnb also provides a demos page that includes a time-lapse GIF of an algorithm learning to paint in a pointillism style.


Since its initial launch, Aerosolve has been used by Airbnb for a variety of other algorithms - from automatically generating maps of local neighborhoods, to image analysis (professional photographers, for example, prefer brighter and lighter house images, while guests tend to prefer warmer, cozier photographs), to assessing demand based on a wide variety of factors (including special events, seasonal trends, number of reviews, etc.). While originally created for Airbnb’s teams, the technology is now available to anyone, and Grewal comments that we should stay tuned for more accessible, open-source technologies from the Airbnb data science team.

We might call the proliferation of an open-source mindset and culture an effort in the democratization of machine learning. True, some companies don’t have enough data or have corrupt or inaccurate data sets that need to be cleaned, but that’s not so much an issue of lack of access as of the need to track and collect data consistently and uniformly.

Merriam-Webster gives two definitions of “democratize,” and the second applies in this case: “to make it possible for all people to understand (something).” Access to machine learning platforms like Aerosolve, machine learning toolkits (such as the immensely popular scikit-learn for Python), and machine learning MOOCs through sites like Udacity, Coursera and others is beginning to level the playing field by making the information and tools accessible to anyone who wants to understand how ML systems work and use them to build smart products and data analysis tools.

Open-source has its limits, however, and some companies want more robust platforms. There’s a need for companies that create and sell these next-level services, with Microsoft’s Cortana Intelligence Suite, IBM’s Watson, and Algorithmia being a few key players in the industry. Yet open-source machine learning platforms are too often underrated for their potential capabilities. C-level executives, marketing and creative teams, and developers not looking to ‘reinvent the wheel’ are likely to find a viable and affordable ML solution amongst the growing number of ML platforms available on the web.
