DataTau
How Airbnb uses machine learning to detect host preferences (airbnb.com)
15 points by bifrach 3298 days ago | 3 comments


1 point by kiyoto 3296 days ago

This blog article illustrates that there is more to data science than just machine learning. If you read the post, the bulk of the time was spent on two phases that too little of the data science literature covers:

1. Exploratory data analysis to formulate hypotheses: in Airbnb's case, the hypothesis came from observing that their product is a two-sided market and asking whether there is any pattern in how hosts accept guests. A great case of statistics meeting empathy, too ("What would I prioritize if I were a host?").

2. Feature engineering: I bet they spent a LOT of time trying to smooth out the data to tame the noise that they describe. All of this is essentially data pre-processing involving SQL and various scripts.

3. Then, finally, machine learning! Here, they ended up going with L2-regularized logistic regression, a tried-and-true method with a modern twist.
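The feature-engineering and modeling steps above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the pipeline from the Airbnb post: it assumes a single feature (a host's past acceptance rate), tames the noise for hosts with few requests by additive (pseudo-count) smoothing toward a global prior, and fits an L2-regularized logistic regression by batch gradient descent. The names `smoothed_acceptance_rate`, `fit_l2_logistic`, `prior_strength`, and `lam`, and the toy data, are made up for this sketch.

```python
import math


def smoothed_acceptance_rate(accepted, requests, prior_rate, prior_strength=10.0):
    """Additive (pseudo-count) smoothing of a host's acceptance rate.

    Acts as if the host had `prior_strength` extra requests, a fraction
    `prior_rate` of which were accepted. A host with 1 accept out of 1
    request is pulled toward the prior; hosts with many requests keep
    roughly their own observed rate.
    """
    return (accepted + prior_strength * prior_rate) / (requests + prior_strength)


def sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(accept | x) under a logistic model with weights w and bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_l2_logistic(X, y, lam=1.0, lr=0.5, epochs=2000):
    """Batch gradient descent on L2-regularized logistic regression.

    Minimizes (1/n) * [sum_i log_loss(y_i, p_i) + (lam / 2) * ||w||^2];
    the bias b is left unpenalized, as is conventional.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 term adds lam * w_j to each weight's gradient (shrinkage).
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Made-up toy data: (past accepts, past requests, accepted this request?)
    history = [(0, 1, 0), (1, 2, 0), (2, 10, 0), (8, 10, 1), (40, 45, 1), (3, 3, 1)]
    prior = 0.5  # hypothetical global acceptance rate
    X = [[smoothed_acceptance_rate(a, r, prior)] for a, r, _ in history]
    y = [label for _, _, label in history]
    w, b = fit_l2_logistic(X, y, lam=1.0)
    for x in X:
        print(f"smoothed rate={x[0]:.3f}  P(accept)={predict_proba(w, b, x):.3f}")
```

In practice you would reach for scikit-learn's `LogisticRegression`, whose default penalty is L2; its `C` parameter is an inverse regularization strength, so `lam` here plays roughly the role of `1 / C`. The hand-rolled version only makes the shrinkage term visible.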

Thanks for a very illuminating write-up!

-----

1 point by wei 3298 days ago

Did you guys write a Hive UDF in Python, import the scikit-learn library inside the UDF, and call its APIs? It would be cool if Hive UDFs in Python could use scikit-learn off the shelf.

-----

1 point by bifrach 3296 days ago

Yes, we imported scikit-learn in the Hive UDF and ran it on the cluster.
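For readers wondering how Python runs inside Hive at all: one common route (an assumption here; the thread does not say which mechanism Airbnb used) is a streaming script called through Hive's `TRANSFORM ... USING` clause, which pipes rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The sketch below scores each row with a pickled scikit-learn classifier; the script name `score_requests.py`, the pickle `model.pkl`, and the table and column names are hypothetical.

```python
"""Hypothetical Hive streaming scorer (score_requests.py).

Assumed HiveQL, with made-up table and column names:

    ADD FILE score_requests.py;
    ADD FILE model.pkl;
    SELECT TRANSFORM (request_id, lead_days, nights)
      USING 'python score_requests.py model.pkl'
      AS (request_id, p_accept)
    FROM booking_requests;
"""
import pickle
import sys


def score_stream(lines, model):
    """Yield request_id<TAB>p_accept for each row request_id<TAB>f1<TAB>f2...

    Hive splits each output line back into the columns named in AS.
    Assumes `model` is a fitted scikit-learn classifier trained on labels
    {0, 1}, so column 1 of predict_proba is P(label == 1).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        request_id = fields[0]
        if "\\N" in fields[1:]:
            # Hive serializes NULL as \N; this sketch simply skips such rows.
            continue
        features = [float(v) for v in fields[1:]]
        # predict_proba expects 2-D input: here, a list holding one feature row.
        p_accept = model.predict_proba([features])[0][1]
        yield f"{request_id}\t{p_accept}\n"


def main(model_path):
    # The pickle is assumed to be produced offline with scikit-learn and
    # shipped to every worker node with ADD FILE.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    sys.stdout.writelines(score_stream(sys.stdin, model))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hive passes the model file name as an argument in the USING clause.
    main(sys.argv[1])
```

Calling `predict_proba` once per row keeps the sketch short but is slow; buffering a few thousand rows and scoring them in one call is the usual refinement. Every worker node also needs a Python interpreter with scikit-learn installed.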

-----



