March Madness: Predictive Analytics Style
Every year, during the second week of March, more than 40 million Americans, including the President, fill out more than 70 million brackets, choosing a winner for each game in the NCAA Men’s Basketball Tournament.
According to Nate Silver of FiveThirtyEight, the odds of predicting a perfect March Madness bracket this year are 1 in 1,610,543,269. In other words, it is more likely that any given person will get struck by lightning, score a hole-in-one, have the same birthday as his or her spouse, or win a gold medal at the Olympics than choose every March Madness game outcome correctly.
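To put that number in perspective, here is a rough back-of-the-envelope sketch. It assumes a standard 63-game bracket (play-in games ignored) and treats every game as independent and equally hard to call, which real tournaments are not. Under those assumptions it compares pure coin-flipping with the per-game hit rate that the quoted 1-in-1,610,543,269 figure would imply. The function names are made up for illustration.

```python
GAMES = 63  # assumption: standard 64-team bracket, play-in games ignored
QUOTED_ODDS = 1_610_543_269  # the "1 in N" figure quoted above


def coin_flip_odds(games: int = GAMES) -> int:
    """Return N in '1 in N' for a perfect bracket picked by coin flips."""
    return 2 ** games


def implied_per_game_accuracy(odds: int, games: int = GAMES) -> float:
    """Per-game hit rate p such that p ** games == 1 / odds.

    Simplification: assumes every game is equally hard and independent.
    """
    return (1 / odds) ** (1 / games)


print(f"Coin flips: 1 in {coin_flip_odds():,}")
print(f"Implied per-game accuracy: {implied_per_game_accuracy(QUOTED_ODDS):.1%}")
```

In other words, the quoted odds are roughly what you would get by calling about 71 percent of games correctly, game after game, which is far better than a coin flip and still nowhere near enough for a perfect bracket.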
While I can’t speak for President Obama’s methodology this year, I can confirm that March Madness is a lot more fun with data involved. Here at 6sense, we’re a bunch of data scientists, engineers, and predictive analytics advocates. So we decided to ask each other, and you: how can you beat a data scientist at March Madness?
In my search for the answer, I uncovered a few overarching themes and pulled out key takeaways for data geeks, and data geek wannabes, everywhere. This year’s March Madness is already underway, which is what made the following observations possible, so store these tips for next year or apply them right now to your marketing and sales strategy.
Leverage available data
Expert analyst Nate Silver has an impressive record: he correctly predicted the tournament winner in two of the last three years, and his forecasts are extremely “well-calibrated, meaning the teams [he’s] listed as (for instance) 70 percent favorites have in fact won about 70 percent of the time.” He begins by collecting data from the 5 best algorithm-based sources available to him, such as ESPN’s Basketball Power Index, and the 2 best human rankings lists, including preseason rankings from the Associated Press and the coaches poll.
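If you want to check whether your own picks are well-calibrated in the sense Silver describes, one simple approach is to bucket your forecasts by predicted probability and compare each bucket with how often those teams actually won. The sketch below is only an illustration: `calibration_table` is a made-up helper, and the sample forecasts are invented, not real tournament results.

```python
from collections import defaultdict


def calibration_table(forecasts, n_bins=10):
    """Bucket (predicted_probability, won) pairs and summarize each bucket.

    Returns {bucket_lower_bound_percent: (mean_predicted, win_rate, count)}.
    """
    buckets = defaultdict(list)
    for prob, won in forecasts:
        # Clamp so that a probability of exactly 1.0 lands in the top bucket.
        idx = min(int(prob * n_bins), n_bins - 1)
        buckets[idx].append((prob, won))

    table = {}
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_predicted = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(1 for _, won in rows if won) / len(rows)
        table[idx * 100 // n_bins] = (mean_predicted, win_rate, len(rows))
    return table


# Invented example: ten favorites forecast at 72-74 percent, seven of which won.
sample = [(0.72, True)] * 7 + [(0.74, False)] * 3
for lower, (predicted, actual, n) in calibration_table(sample).items():
    print(f"{lower}%+ bucket: predicted {predicted:.0%}, won {actual:.0%} (n={n})")
```

A well-calibrated forecaster shows predicted and actual percentages that stay close to each other in every bucket, given enough games in each one.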
Many types of data are available for analysis. Drawing on several sources tends to improve the accuracy of predictions, because different algorithms weigh different factors when assigning rankings.
After gathering data, Silver then lists, for each of the 68 teams in the tournament, the rankings from each of the 7 sources and adjusts them based upon external factors like player injuries and travel time to the game.
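As a rough illustration of that "combine, then adjust" step, the sketch below averages each team's ratings across seven sources and then applies a manual adjustment. This is not Silver's actual formula. It assumes every source has already been converted to a common rating scale where higher is better, and the team names, ratings, and adjustment values are all invented.

```python
from statistics import mean


def composite_rating(source_ratings, adjustments=None):
    """Average each team's ratings across sources, then add a per-team adjustment."""
    adjustments = adjustments or {}
    return {
        team: mean(ratings) + adjustments.get(team, 0.0)
        for team, ratings in source_ratings.items()
    }


# Invented ratings from seven sources: five computer systems, then two human polls.
ratings = {
    "Team A": [92.0, 90.5, 91.0, 93.0, 89.5, 90.0, 91.0],
    "Team B": [88.0, 89.5, 87.0, 88.5, 90.0, 86.0, 87.0],
}
# Invented adjustment: Team A loses points for a key injury and a long trip.
adjustments = {"Team A": -4.0}

composite = composite_rating(ratings, adjustments)
ranked = sorted(composite, key=composite.get, reverse=True)
print(composite)
print(ranked)
```

In this made-up case Team A leads on the raw average, but the adjustment drops it just below Team B, which is exactly the kind of shift an injury report can cause.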
Incorporating external context into a data set helps account for factors that an algorithm might have overlooked, or that have changed since the algorithm was created. Pairing a diverse pipeline of data with external analysis will bring you to the top of your bracket pool faster than you could even predict (get it?).
The final product of Nate Silver’s analysis is this interactive graphic that changes during the course of the tournament. As you can see, the predictions become increasingly certain as each round passes. Picking the champion in round one, when 64 teams are still alive and each team’s chances are slim, is much more difficult than picking it in the Final Four, when only four teams remain.
The typical sports fan can’t change a bracket midway through the tournament, but updating predictions is allowed, and encouraged, in most other situations. In fact, this methodology is analogous to how 6sense software picks up buyer intent signals incrementally and re-scores leads and contacts based on the latest information.
Every day, new data may come out that helps you make a more accurate prediction. It might be the fact that a team lost and was eliminated, making it more likely that another team moves forward, or it might be information about actions a potential customer has taken that affect her measured intent.
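Here is a small sketch of what re-scoring on new information can look like, using a four-team bracket. It computes each team's chance of winning the title from head-to-head win probabilities, then recomputes once one semifinal result is known. The teams and probabilities are invented, and `title_odds` is a hypothetical helper, not anything from FiveThirtyEight or 6sense.

```python
def title_odds(win_prob, semis, known_winners=None):
    """Return each team's chance of winning a four-team bracket.

    win_prob[(a, b)] is the chance that a beats b.
    semis holds the two semifinal pairings.
    known_winners maps a semifinal index to the team that already won it.
    """
    known_winners = known_winners or {}

    def p(a, b):
        # Look up the matchup in either direction.
        return win_prob[(a, b)] if (a, b) in win_prob else 1 - win_prob[(b, a)]

    # Chance that each team reaches the final.
    reach = []
    for i, (a, b) in enumerate(semis):
        if i in known_winners:
            winner = known_winners[i]
            reach.append({a: float(a == winner), b: float(b == winner)})
        else:
            reach.append({a: p(a, b), b: p(b, a)})

    # Chance of reaching the final times chance of beating whoever shows up.
    odds = {}
    for side, other in ((reach[0], reach[1]), (reach[1], reach[0])):
        for team, p_reach in side.items():
            odds[team] = p_reach * sum(
                p_opp * p(team, opp) for opp, p_opp in other.items()
            )
    return odds


# Invented head-to-head probabilities for four teams.
win_prob = {
    ("A", "D"): 0.8, ("B", "C"): 0.6, ("A", "B"): 0.6,
    ("A", "C"): 0.7, ("D", "B"): 0.3, ("D", "C"): 0.4,
}
semis = [("A", "D"), ("B", "C")]

before = title_odds(win_prob, semis)
after = title_odds(win_prob, semis, known_winners={0: "D"})  # D upsets A
print(before)
print(after)
```

Before any games are played, A is the clear favorite. Once D pulls off the upset, A drops to zero and every remaining team's chances go up, which is the same re-scoring idea applied to a bracket instead of a buyer.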
Whether you’re predicting NCAA basketball, lightning strikes, or how likely buyers are to purchase your product, these strategies should be ever-present.
So, what do you think? Will your bracket beat our data scientists (and Nate Silver) this year?