DataTau

1 point by qazwsx 9 days ago | link | parent | on: Python Lists in Depth

The most straightforward content on Python lists I've come across.

Looking for any feedback you might have on next steps or ways to improve this analysis. Thanks for reading!
1 point by monikagab 22 days ago | link | parent | on: A Scikit-learn pipeline in Wallaroo

A space widget is used to create blank spaces on the website for flexible positioning and distancing between items on the page. The space widget plays an important role allowing you to add space between the items.
1 point by jam 27 days ago | link | parent | on: Hackers Guide to Healthcare Data

Having worked with healthcare data for the past 15 years, I can say this article is really not going to give you more than a superficial understanding of a few pieces of that world. In the first few paragraphs they state "There is also a complete lack of standard protocols for data exchanges between software systems." which is an extremely inaccurate statement. While not perfect or 100% agreed upon, HL7 and CCDA are commonly used across different systems and vendors. There is some good information in here, but I don't think the author fully understands this space, and the article won't provide the reader with much valuable information.

Jupyter Notebooks allow data scientists to quickly iterate as we explore data sets, try different models, visualize trends, and perform many other tasks. But how can they be integrated with production workflows? Tim Kopp, a data scientist from Pivotal, shows how.
1 point by spark 32 days ago | link | parent | on: 17 best python libraries !=
1 point by jai1294 37 days ago | link | parent | on: Know what OTP and 2FA is!


This post documents how I updated my code from scikit-learn 0.16 & Python 2.7 to scikit-learn 0.19.1 & Python 3.6.

If you just want a copy of the updated code, you can download the Jupyter notebooks from GitHub:


I just tried the link and it worked, so maybe just a slight hiccup?

Link is broken (or the site's down)

The Pivotal Data Science team has been working with Spark in a variety of environments for a variety of use cases for close to two years. Here's what we like and what's been difficult. This post gives a broad overview, and in subsequent posts we will dive deeper into the pros, cons, and how to make the most of Spark. We hope you enjoy!

Numpy: nope;

Pandas: ok, sure, but I wish there was a data.table for Python;

Matplotlib: meh. I mean, if you have to, but there is also much better stuff;

Scikit-Learn: sure;

Scipy: nope.

Nope. Legality might be an issue for this (i.e. song lyrics), since I assume you want to use modern songs. You can find similar projects and code (mostly for rap for some reason);

fwiw I'd never ask a "Data Scientist" to do this in an interview. This is data engineering work. If you get asked to do this in an interview for a "Data Scientist" position, just run away. Also, the code presented is far from production grade. I'd recommend running pylint over this code and correcting the warnings before considering this ready for anything.
1 point by AI-Store 64 days ago | link | parent | on: A Marketplace for AI

You can contact me through the Indiegogo campaign's "Ask a question" option beside the campaign's founder picture.

I'm the author of the video series, and I'm happy to answer questions about it (or about pandas in general). Thanks!

Happy to take feedback if there is any.





I'm here to hear your views on data science. Drop your questions/comments/feedback

Thanks for sharing this list!

great insights!




That's a good point. You will end up with histograms that have very similar mathematical properties.

One reason to go with a custom implementation is user expectation. You want your bins to start and end at human-readable locations, so that the data can be interpreted more easily. Inserting log(x) into a regular histogram is not going to give you that:

    log_10(x) = 9.0 => x = 10 ** 9.0 = 1.00 e9
    log_10(x) = 9.1 => x = 10 ** 9.1 = 1.26 e9
    log_10(x) = 9.2 => x = 10 ** 9.2 = 1.58 e9
With log-linear histograms you get buckets at:

    1.0 e9
    1.1 e9
    1.2 e9
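
To make the difference concrete, here is a minimal sketch of both binning rules in Python. The function names `log_bin_edge` and `log_linear_bin_edge` are made up for this illustration, and this is not the Circonus implementation; it only assumes that each power of ten is cut into linear steps of one tenth of that power, which is what the bucket list above suggests.

```python
import math


def log_bin_edge(x, step=0.1):
    """Lower edge of x's bin when log10(x) is binned with a fixed step."""
    k = math.floor(math.log10(x) / step)
    return 10 ** (k * step)


def log_linear_bin_edge(x):
    """Lower edge of x's bin when each power of ten is split linearly.

    Assumption for this sketch: the bin width is one tenth of the
    power of ten just below x, so edges have two significant digits.
    """
    exponent = math.floor(math.log10(x))
    width = 10 ** (exponent - 1)
    return math.floor(x / width) * width


samples = [1.04e9, 1.17e9, 1.29e9]

# Edges from binning log10(x): powers of ten with fractional exponents.
print([log_bin_edge(v) for v in samples])

# Edges from log-linear binning: round, human-readable numbers.
print([log_linear_bin_edge(v) for v in samples])
```

The second list lands on 1.0 e9, 1.1 e9 and 1.2 e9, while the first one lands on fractional powers of ten.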
Disclaimer: I am working for Circonus.
