Wikipedia Pages Predict Movie Success, Hungarian Scientists Claim

Movie execs, take note. Next time you find yourselves wringing your hands about how a big-budget movie will fare at the box office, just check Wikipedia. Researchers might just have found a way to use the super-popular online encyclopedia to predict whether a film will go big or bust.

A trio of Hungarian social scientists say they’ve built a model that should be able to predict a movie’s success up to a month in advance using only publicly accessible data from Wikipedia and BoxOfficeMojo. The researchers used data on 312 already-released films from both sites to reverse-engineer their algorithm.

The Hungarian researchers claim that, with algorithm in hand, they need only five publicly accessible points of data to make a pretty good financial hit-or-miss prediction for any film. From BoxOfficeMojo, they need the number of theaters the movie will be released in, an obvious factor in calculating a movie’s overall box office revenue. From the film’s Wikipedia page, they need four pieces of data provided by the Wikimedia Foundation:

1) The number of users who edited the page pre-release.

2) The number of edits made to the page.

3) The number of page views recorded on the page.

4) The “collaborative rigor” of the page, i.e. the number of edits made to the page when multiple subsequent edits by the same contributor are counted as a single edit.
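To make the fourth measure concrete, here is a minimal sketch of how it could be computed from a page's edit history. It follows only the definition given above (a run of back-to-back edits by one contributor counts as a single edit). The function name, the list-of-usernames input, and the sample history are assumptions made up for illustration; this is not the researchers' code.

```python
from itertools import groupby


def collaborative_rigor(editors):
    """Count edits, treating each run of consecutive edits
    by the same contributor as a single edit.

    `editors` is a chronological list of contributor names,
    one entry per edit (a hypothetical input format).
    """
    # groupby() yields one group per run of identical adjacent values,
    # so counting the groups counts the collapsed edits.
    return sum(1 for _ in groupby(editors))


# Made-up edit history, oldest edit first.
history = ["alice", "alice", "bob", "alice", "carol", "carol", "carol"]

print("edits:", len(history))                            # measure 2
print("editors:", len(set(history)))                     # measure 1
print("collaborative rigor:", collaborative_rigor(history))  # measure 4
```

For this sample history, the seven raw edits by three editors collapse to four, since the two opening edits by "alice" and the three closing edits by "carol" each count once.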

The results, published on Nov. 5, show the team’s model is pretty successful at predicting cinematic hits. Where the team’s model fails, though, is at predicting the relative success of non-hit movies; the Hungarian model couldn’t tell the difference between moderately successful films and complete flops.

There's an interesting factor related to this. One feature of the box office data is that it is bimodal: it has two peaks. So lots of movies are successful and lots of movies are moderately successful. In between there is a trough. None of the Wikipedia activity measures shows this kind of twin-peaked behaviour. So it's not really a surprise that they correlate with only some of the box office data.

You can check out the correlation in the team's graph of Wikipedia-enabled box office predictions versus actual results below:

[Graph: Wikipedia-based box office predictions versus actual box office results]

Similar predictive success with films has been found elsewhere on the Internet. In 2010, researchers at HP’s Social Computing Lab published a study claiming that Twitter activity on the eve of a movie’s release could foreshadow the film’s subsequent box office failure or success.

It's all part of the so-called "big data" movement in tech, the trendy term used to refer to “the internet-fueled explosion of enormous data sets that can be analyzed for trends and correlations.”

Despite the promise of the Wikipedia model, however, many questions remain. The Hungarian predictive algorithm thus far has only been tested on movies released before 2011; the researchers have yet to show that they can make accurate predictions for films that have not yet succeeded or failed. And, ominously for the Hungarian team, the very similar Twitter study they say inspired them was debunked in March 2012 by scientists at Princeton.

Still, if the team’s model can truly predict movie hits and misses up to a month in advance, it could prove a goldmine for the film industry, which currently bases its pre-release decisions on predictions that are largely unquantified and rarely exact.


