DataTau
2 points by nofreehunch 3336 days ago | link | parent

I think it depends a lot on whether you want insights or the best possible model. If you want insights, go for a simple model that allows for full inspection. If you want the best performance, build a more complex model, but use the simple model to explain what the complex one is doing.

I was talking to a data scientist who had the problem of identifying good customers for a loan. A good customer gave them $50 in lifetime value. Misclassifying a bad customer as a good one cost them $200. No matter how smart or simple the model is, it _has_ to beat those odds before it makes any sense to start using it. That's also a case where each percentage point of accuracy translates directly into profit.
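To put rough numbers on "those odds", here is a minimal sketch of the break-even arithmetic. It assumes a rejected customer is worth $0 and that the $50 and $200 figures apply to every approval; the function names are made up for illustration and are not from the data scientist's actual setup.

```python
# Figures from the loan example: +$50 per good customer approved,
# -$200 per bad customer approved.
# Assumption: a rejected customer contributes $0 either way.
GAIN_GOOD = 50.0
LOSS_BAD = 200.0


def break_even_precision(gain=GAIN_GOOD, loss=LOSS_BAD):
    """Share of approved customers that must be good for profit to be zero.

    Solves p * gain - (1 - p) * loss = 0 for p.
    """
    return loss / (gain + loss)


def expected_profit(good_approved, bad_approved, gain=GAIN_GOOD, loss=LOSS_BAD):
    """Total profit for a batch of approvals (hypothetical helper)."""
    return good_approved * gain - bad_approved * loss


print(break_even_precision())   # 0.8
print(expected_profit(80, 20))  # 0.0
print(expected_profit(85, 15))  # 1250.0
print(expected_profit(86, 14))  # 1500.0
```

Under these assumptions the model only pays off if more than 80% of the customers it approves are actually good, and turning one bad approval per hundred into a good one is worth $250. Note that this is precision on the approved group, which is not quite the same thing as overall accuracy.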

Getting the data out of data warehouses, cleaning it, and shaping it into a dataset is more data engineering than data science to me. One team could present on the challenges of getting the data, another on the challenges of building the best model for it.



