DataTau
1 point by kiyoto 3306 days ago

This blog article illustrates that there is more to data science than just machine learning. As the post shows, the bulk of the time went into two phases that too little data science literature covers:

1. Exploratory data analysis to formulate hypotheses: in Airbnb's case, the hypothesis came from observing that their product is a two-sided market and asking whether there is any pattern in how hosts accept guests. It is also a great case of statistics meeting empathy ("What would I prioritize if I were a host?").

2. Feature engineering: I bet they spent a LOT of time smoothing the data to tame the noise they describe. All of this is essentially data preprocessing done with SQL and various scripts.

3. Then, finally, machine learning! Here they settled on L2-regularized logistic regression, a tried-and-true method with a modern twist.
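To make steps 2 and 3 concrete, here is a minimal Python sketch. It assumes a hypothetical setup in which each training example is a guest request with numeric features (for example, a host's smoothed acceptance rate for similar requests) and a 0/1 label for whether the host accepted. None of this is taken from the blog post. `smoothed_rate`, its `prior_rate`/`prior_weight` parameters, and the plain gradient-descent trainer are illustrative stand-ins. In practice you would more likely use a library such as scikit-learn's `LogisticRegression`, which applies an L2 penalty by default.

```python
import math
from typing import List, Sequence


def smoothed_rate(accepts: int, requests: int,
                  prior_rate: float, prior_weight: float) -> float:
    """Hypothetical feature: shrink a raw acceptance rate toward a prior.

    Hosts with few requests are pulled strongly toward prior_rate;
    hosts with many requests keep roughly their raw rate.
    """
    return (accepts + prior_rate * prior_weight) / (requests + prior_weight)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_l2_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.1, lr: float = 0.5,
                    epochs: int = 2000) -> List[float]:
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    Returns d feature weights followed by a bias term; the bias is
    not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # w[d] is the bias
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[d]
            err = sigmoid(z) - yi  # derivative of log-loss w.r.t. z
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + lam * w[j])  # L2 term on weights only
        w[d] -= lr * grad[d] / n
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # Probability of acceptance under the trained weights.
    d = len(x)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[d])
```

A larger `lam` shrinks the feature weights toward zero, and a larger `prior_weight` pulls low-volume hosts harder toward the prior rate. Both are knobs for taming noise, not recommended values.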

Thanks for a very illuminating write-up!



