fwiw I'd never ask a "Data Scientist" to do this in an interview. This is data engineering work. If you get asked to do this in an interview for a "Data Scientist" position, just run away. Also, the code presented is far from production grade. I'd recommend running pylint over this code and correcting the warnings before considering this ready for anything.
1 point by AI-Store 9 days ago | on: A Marketplace for AI

You can contact me through the Indiegogo campaign's "Ask a question" option beside the campaign's founder picture.

I'm the author of the video series, and I'm happy to answer questions about it (or about pandas in general). Thanks!

Happy to take feedback if there is any.

I'm here to hear your views on data science. Drop your questions/comments/feedback

Thanks for sharing this list!

great insights!

That's a good point. You will end up with histograms that have very similar mathematical properties.

One reason to go with a custom implementation is user expectation. You want your bins to start and end at human-readable locations, so that the data can be interpreted more easily. Feeding log(x) into a regular histogram is not going to give you that:

    log_10(x) = 9.0 => x = 10 ** 9.0 = 1.00 e9
    log_10(x) = 9.1 => x = 10 ** 9.1 = 1.26 e9
    log_10(x) = 9.2 => x = 10 ** 9.2 = 1.58 e9
With log-linear histograms you get buckets at:

    1.0 e9
    1.1 e9
    1.2 e9
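
To make the difference concrete, here is a minimal Python sketch. It assumes the log-linear buckets keep two significant decimal digits (a mantissa of 10 to 99 times a power of ten), which is inferred from the 1.0 / 1.1 / 1.2 e9 example above rather than taken from any particular implementation. The function names `log_bin_edges` and `log_linear_bucket` are made up for illustration.

```python
from decimal import Decimal


def log_bin_edges(start_exp, step, count):
    """Bin edges of a regular histogram filled with log10(x)."""
    return [10 ** (start_exp + i * step) for i in range(count)]


def log_linear_bucket(x):
    """Return (mantissa, exponent) of the bucket holding a positive x.

    Assumption: two significant decimal digits per bucket, so the
    bucket starts at mantissa / 10 * 10**exponent.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    d = Decimal(str(x))                       # decimal digits, avoids float edge cases
    exponent = d.adjusted()                   # power of ten of the leading digit
    mantissa = int(d.scaleb(-exponent) * 10)  # always in 10..99
    return mantissa, exponent


# Edges you get from a linear histogram over log10(x)
for edge in log_bin_edges(9.0, 0.1, 3):
    print(f"log bin edge: {edge:.2e}")

# Buckets you get from the assumed log-linear scheme
for value in (1.0e9, 1.17e9, 1.25e9):
    m, e = log_linear_bucket(value)
    print(f"{value:.2e} falls in the bucket starting at {m / 10:.1f} e{e}")
```

The first loop prints edges that are evenly spaced in log space but awkward to read, while the second maps each value to a bucket whose lower edge is a round two-digit number.
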
Disclaimer: I am working for Circonus.

Happy to discuss if there are any questions.

seems good. Any offers/good discount?

I hope more helpful comments will come soon.

thank you for the information!

I'm not familiar with root; is this what you are referring to? https://root.cern.ch/doc/master/classTH1.html

Taking the log of the data would add an extra computational step prior to recording the value, I would think. Will need to think about that one a bit more.


Huh, I guess just taking the log of your data and then doing a normal linear histogram is not cool enough?

If you want truly battle-tested histogramming, go with ROOT.


Gives me a place to start. Ty

I came across this and it interests me. Thanks!

A simple tool for searching multiple sites from one place. 100% free. Let your inner Data Scientist come out and shine!

> really?.. but not GLMs or xgboost?..

The article mentioned decision trees. More specifically, boosted trees.

If you can't add 1 + 1 together...

Also, GLM stands for generalized linear model. It's just an abstract framework for regression. Are you saying the article should cover more than just logit and OLS regression? I too can throw around fancy terminology and pretend to be smart.

While I don't really agree with the article, you're not helping.


This might just be a guide. Not really a should-learn thing. At least it gives an idea where to start.

Honestly, I am a newcomer but I read it anyway.

I'm reading it. It gave me some insights as a newbie.

Please give it a try. Any feedback is welcome!

Thank you


umm. nope. you don’t really need to know most of them “to be a data scientist”, and some that you do need to know are not there. come on… convo-nets? you HAVE to know them to work as a data scientist for an insurance company? really?.. but not GLMs or xgboost?..