DataHack 2015: 65 coders, 14 teams and a lot of data
Last weekend, 6sense hosted DataHack 2015, our inaugural hackathon. The 6sense tech team invited 65 data scientists and developers for a 24-hour coding and modeling session that was mostly powered by pizza and Red Bull. We use big data and machine learning in our predictive intelligence platform and we wanted DataHack to be a micro-version of the work we do every day.
There were 14 teams, and each was tasked with picking a public data set, selecting one of the provided data modeling/analysis tools and mining the data for insights. The teams were asked to develop machine-learning scenarios, code and data visualizations, and they had 24 hours to do it.
The hackathon ended at 8pm on Saturday, and the 14 teams were ready to present their work to an esteemed panel of 6sense judges. Here are the winners of DataHack 2015:
1st Place – FoodFate
Using Yelp’s dataset of 1.6M reviews and 500k tips by 366k users for 61k businesses, team FoodFate examined the restaurant scene in Phoenix, Arizona. Their goal was to correlate restaurant characteristics, such as location, cuisine, hours, ambience, noise level and Wi-Fi accessibility, with each restaurant’s score.
Why they won:
- They used multiple data sets
- They cleaned and normalized the data
- They built predictive models
- And they used those models in production
All in just 24 hours.
2nd Place – 5Sense
5Sense (no relation to 6sense) used Reddit’s comment data set for their project. 5Sense aimed to analyze the most commonly used words in Reddit comments and map how trending phrases correlated with current events at the time of the post. Of the two most commonly used words in Reddit comments, we can only share one: “love.”
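For readers curious what counting the most common words looks like in practice, here is a minimal sketch. It is not 5Sense's actual code: the sample comments and the `top_words` helper are made up for illustration, and a real analysis would likely also filter out stop words such as "the" and "this".

```python
import re
from collections import Counter

# Hypothetical sample comments; the real project used Reddit's comment data set.
comments = [
    "I love this subreddit",
    "Love the new update, love it",
    "This update is great",
]

def top_words(texts, n=2):
    """Return the n most common lowercase words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

print(top_words(comments, 1))
```

Lowercasing before counting means "Love" and "love" are tallied as the same word.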
Why they won:
- They had a strong presentation
- They developed a good toolset for their project
- They dug deeply into their dataset
3rd Place – Celebrity Facial Recognition
Third place went to a one-man team. Paul used Google’s image search to build a machine-learning model that could run facial recognition for celebrities like Katy Perry and Justin Bieber. The operative idea was that the model worked well only if thousands of images of a person’s face were available online. The model then attempted to predict whether a photo presented to it showed a celebrity.
Why he won:
- He used image data
- He developed a neural-network learning model
- He built a future-oriented model that improved with time and inputs
Judges’ Favorite – Fran
The judges’ favorite, Fran, was the work of another lone rider, Rainier. Rainier parsed 18,000 of his own text messages to find the associations between his different contacts based on mentions. He then created a data visualization to show his network.
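As a rough illustration of linking contacts by mentions, the sketch below counts how often a conversation with one contact names another. It is an assumption about the general approach, not Rainier's implementation: the messages, the contact names and the `mention_edges` helper are all hypothetical, and the plain substring match is deliberately naive.

```python
from collections import Counter

# Hypothetical messages as (contact, text) pairs; names and texts are made up.
messages = [
    ("Alice", "Are you coming to Bob's party?"),
    ("Bob", "Alice said she is bringing Carol"),
    ("Carol", "See you at the party"),
]

def mention_edges(msgs):
    """Count how often a conversation with one contact mentions another contact."""
    contacts = {contact for contact, _ in msgs}
    edges = Counter()
    for contact, text in msgs:
        lowered = text.lower()
        for other in contacts - {contact}:
            # Naive substring match; a real parser would match whole names.
            if other.lower() in lowered:
                # Sort each pair so A-B and B-A are counted as one edge.
                edges[tuple(sorted((contact, other)))] += 1
    return edges

print(dict(mention_edges(messages)))
```

The resulting edge counts could then feed a graph-drawing tool, with heavier lines for contacts who come up together more often.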
Why he won:
- He took data privacy into account
- He built a great visualization of the data
- He ran a sentiment analysis API
We Loved It
The 6sense dev and data-science teams were on hand late into the night and early the next morning to mentor attendees, set up extra routers and give quick tutorials on data analytics tools. Some intrepid souls even stayed overnight! It was wonderful to meet so many members and aspiring members of the big data community, and we can’t wait to do it again. Gear up for DataHack 2016!