fwiw I'd never ask a "Data Scientist" to do this in an interview. This is data engineering work. If you get asked to do this in an interview for a "Data Scientist" position, just run away. Also, the code presented is far from production grade. I'd recommend running pylint over this code and correcting the warnings before considering this ready for anything.
That's a good point. You will end up with histograms that have very similar mathematical properties.
One reason to go with a custom implementation is user expectation.
You want your bins to start and end at human-readable locations, so that the data can be interpreted more easily.
Passing log(x) to a regular histogram is not going to give you that, because the bin edges come out evenly spaced in log units instead of at round values of x.
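To make that concrete, here is a rough sketch of what I mean by readable bins. It is not the article's code: the helper name `readable_log_edges` and the 1-2-5 edge series are my own choices for illustration, and it assumes strictly positive data and plain NumPy.

```python
import numpy as np


def readable_log_edges(data):
    """Return log-spaced bin edges on a 1-2-5 series that cover the data.

    Hypothetical helper for illustration; assumes strictly positive values.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    if lo <= 0:
        raise ValueError("log-scale bins need strictly positive data")

    # Decades that bracket the data, e.g. 3..700 -> 10**0 .. 10**3.
    first_decade = int(np.floor(np.log10(lo)))
    last_decade = int(np.ceil(np.log10(hi)))

    # Candidate edges: 1, 2, 5 times each power of ten.
    edges = np.array(
        [m * 10.0 ** d
         for d in range(first_decade, last_decade + 1)
         for m in (1, 2, 5)]
    )

    # Trim to the tightest edges that still enclose [lo, hi].
    start = np.searchsorted(edges, lo, side="right") - 1
    stop = np.searchsorted(edges, hi, side="left")
    stop = max(stop, start + 1)  # keep at least one bin
    return edges[start:stop + 1]


x = np.array([3, 40, 700])

# Naive approach: edges are evenly spaced in log units, not round numbers.
_, naive_edges = np.histogram(np.log(x), bins=4)

# Custom approach: edges land on 2, 5, 10, 20, ... in the original units.
edges = readable_log_edges(x)
counts, _ = np.histogram(x, bins=edges)
```

The naive edges have to be exponentiated before anyone can read them, and even then they land on arbitrary values. The custom edges can go straight onto an axis label.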
The article mentioned decision trees. More specifically, boosted trees.
If you can't add 1 + 1 together...
Also, GLM is the generalized linear model. It's just an abstract framework for regression. Are you saying the article should cover more than just logit and OLS regression? I too can throw out silly terminology and pretend to be smart.
While I don't really fully agree with the article, you're not helping.
umm. nope. you don’t really need to know most of them “to be a data scientist”, and some that you do need to know are not there. come on… convo-nets? you HAVE to know them to work as a data scientist for an insurance company? really?.. but not GLMs or xgboost?..