Tuesday, December 2, 2008

Netflix Prize

From the New York Times Magazine comes this very interesting article about the "Netflix Prize", an open competition put on by Netflix. It's a lot like the "X Prize", in that the company has declared an open competition and is prepared to richly reward whoever accomplishes a set task. Except instead of building something boring like a spaceship, Netflix wants to slightly improve a series of mathematical algorithms, which is way sexier and more likely to get you laid.

The goal is to improve the accuracy of Netflix's recommendation software by 10%, and whoever accomplishes this will win the grand prize of $1 million. At first glance 10% doesn't seem like it should be so hard, but you have to realize that this is a 10% improvement over the point at which Netflix's own programmers stopped refining their model. The marginal difficulty of even this small gain is very high, and despite the large prize and competitive teams working around the globe, the closest anyone has come is an improvement of about 9.4%. Even that seems pretty impressive, considering the daunting task of creating new algorithms that beat all the ones already in use.
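For the curious, the "accuracy" being scored is root mean squared error (RMSE) between predicted and actual star ratings, and the 10% is a reduction in that error relative to Netflix's own Cinematch system. Here's a minimal sketch of the arithmetic; the function names are my own invention, and the 0.9514 baseline is the quiz-set RMSE Netflix published for Cinematch.

```python
import math

def rmse(predictions, actuals):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

def target_rmse(baseline_rmse, improvement_pct=10.0):
    """RMSE a competitor must reach to beat the baseline by improvement_pct."""
    return baseline_rmse * (1 - improvement_pct / 100.0)

print(round(target_rmse(0.9514), 4))  # 0.8563
```

In other words, shaving the average error from roughly 0.95 stars to roughly 0.86 stars is worth a million bucks.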

The biggest hurdle in reaching the goal appears to come from hard-to-rate independent films, which tend to be very polarizing and are harder to predict from a viewer's other movie preferences. Singled out as the most confounding DVD is the indie flick "Napoleon Dynamite", which by some estimates accounts for up to 15% of the uncertainty left in the models. "Dynamite" is a movie most viewers either love or hate, and its glut of 1- and 5-star ratings, which seem to bear little relation to a rater's other movie preferences, wreaks havoc on the prediction algorithms. It would be easy if the world split neatly into "Dynamite" lovers and haters, but there is no apparent relationship between a Vote for Pedro and a rater's taste in other films.
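To see why a love-it-or-hate-it movie is so nasty, imagine you know nothing about a viewer except that they watched the film. The best single guess (in the squared-error sense) is the film's average rating. The sketch below uses made-up toy ratings, not real Netflix data, to compare a polarizing film against one viewers broadly agree on.

```python
import math
from statistics import mean

def constant_guess_rmse(ratings):
    # With no other information about a viewer, the prediction that
    # minimizes squared error for a film is simply its mean rating.
    guess = mean(ratings)
    return math.sqrt(mean((r - guess) ** 2 for r in ratings))

# Hypothetical toy ratings for illustration only.
polarizing = [1] * 50 + [5] * 50            # half hate it, half love it
consensus = [3] * 25 + [4] * 50 + [5] * 25  # most people think it's decent

print(constant_guess_rmse(polarizing))  # 2.0
print(constant_guess_rmse(consensus))   # about 0.707
```

For the polarizing film the "safe" guess of 3 stars is off by two full stars for every single viewer, and unless something in a rater's other preferences tips you off to which camp they're in, there's nothing to pull that error down.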

Several other indie films that received a wide release are also a problem for the programmers, which is probably fitting if you think about it. All these films thrive on being odd, unpredictable and "quirky" for quirky's sake, so it's no surprise that they have to go and screw with perfectly good, logical algorithms. On the other side of the coin, it's not too surprising that studio films fit so nicely into such algorithms, since I'm convinced that most of them are conceived, written and green-lit by a computer.
