
Cool, looking forward!

Hi, one of the curators here. We started Deep Learning Weekly in August of last year and currently have over 2000 subscribers. Apart from 'newsy' items, we keep an eye out for interesting open source projects and research papers. Any feedback is very much appreciated.
1 point by noah_ 7 days ago | link | parent | on: Datatau RSS feed not working?

I read the feed with Outlook 2013 and I haven't noticed any problems.

Hi DataTau! I recently downloaded California's SWITRS[0] database listing all the accidents reported to the police and have started combing through it. This post is my first look trying to answer the question: "When do people crash?"

It covers day of the year and day of the week; I plan to cover time of day later (I'm especially interested in looking at time relative to sunrise/sunset, but that'll take a bit more work).
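
For anyone who wants to try the day-of-week breakdown themselves, here is a minimal Python sketch of the counting step. The dates are made-up placeholders rather than real SWITRS records, and `crashes_by_weekday` is a hypothetical helper name, not something from the post.

```python
from collections import Counter
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def crashes_by_weekday(dates):
    """Count how many collision dates fall on each day of the week."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return Counter(WEEKDAYS[d.weekday()] for d in dates)

# Hypothetical collision dates, for illustration only.
collision_dates = [
    date(2016, 1, 1),
    date(2016, 1, 2),
    date(2016, 1, 2),
    date(2016, 1, 8),
    date(2016, 1, 9),
]

counts = crashes_by_weekday(collision_dates)
for day in WEEKDAYS:
    print(day, counts[day])
```

With real data you would probably also want to normalize by how many times each weekday occurs in the period covered, so that a partial year does not skew the comparison.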

Happy to answer any questions!

[0]: http://iswitrs.chp.ca.gov/Reports/jsp/userLogin.jsp

1 point by datatej 7 days ago | link | parent | on: Online data analysis test

I've done a number of these data analysis 'tests' as part of interviews for data scientist / analyst roles. They typically vary -- usually (hopefully!) the data set & questions are highly specialized for the company + the role you are applying for. So as for ways to practice.... I'm not sure. If you're skilled in R / Python / Matlab / SQL or whatever the role requires... you should be fine.

General advice:

1. Be explicit: if you are making an assumption in your code, write why. Write out your logic & reasoning at every step.
2. Watch out for missing / 'bad' data. Write out how you would deal with messy data in real life.
3. Pay attention to the time limit. If you don't know it, contact the recruiter to ask what's expected.
4. Interpret your results! If they ask you to, say, find the R^2 between x and y, give them the number and then tell them what it might mean.
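
To illustrate point 4, here is a minimal Python sketch that computes R^2 for a simple straight-line fit and prints an interpretation next to the number. The data points are invented for the example, and `r_squared` is a hypothetical helper, not part of any test I was given.

```python
from statistics import mean

def r_squared(xs, ys):
    """R^2 of a least-squares straight-line fit of ys on xs.

    For simple linear regression this equals the squared Pearson correlation.
    Assumes xs and ys have equal length and neither is constant.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r2 = r_squared(xs, ys)
# Report the number together with what it means.
print(f"R^2 = {r2:.3f}: a straight line in x accounts for about "
      f"{r2:.1%} of the variance in y.")
```

In a real submission I would also add a sentence on what R^2 does not tell you, for example that a high value says nothing about causation.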

Don't stress. Hopefully the 'test' is fun and well thought out. I've always found them extremely helpful in understanding the problems that I would be solving if I were to end up at the company. On the other hand, I've heard that some companies don't put a lot of time into making these assignments relevant, so for me, that would be a warning sign.

Hope this helps!

1 point by BenoitParis 7 days ago | link | parent | on: Online data analysis test

I'd be very interested in your feedback, once you have completed the test.

For preparation, I'd go play with Kaggle if I were you; and read their forum.

----

What were the questions like?

Is it like Kaggle: they give you a csv, and you build a predictor? I guess they would expect some feature engineering from you as well.

Do they want a dissertation around how it could function inside their firm? How you would talk about it to the client? What are the maintenance hurdles?


I am doing this course at the moment as an introduction to both Data Science and Python. I only had some basic Python knowledge beforehand, but I am still able to do the course with some additional googling.

I recommend it highly, it's very engaging.


Hey guys! I just wrote an article on how to integrate RMarkdown with a static Pelican blog that I thought some of you might find interesting.

I've been using Pelican for a few years now to host my blog, but it's always been a bit of a pain converting any R analysis into a finalized blog post. One of my main goals for the year is building out my blog with more content, so I really wanted to streamline things, and I think this should go a long way.

I'm happy to answer any questions if anybody is considering doing something similar!


Exponentials going from nothing to human-parity over just a few years is instant by a lot of standards.

Garbage finding - there are obviously going to be latent variables related to background and socioeconomic status. This framework is much more likely to replicate some type of ugly profiling that's going on within the justice system than to determine criminality.

C'mon DataTau, you're better than this.

1 point by mindcrime 17 days ago | link | parent | on: Datatau RSS feed not working?

Seems to be working OK for me with Digg Reader.
1 point by NotAGenius 18 days ago | link | parent | on: Datatau RSS feed not working?

I roll my own RSS reader, and the feed works fine for me.
2 points by rinze 18 days ago | link | parent | on: Datatau RSS feed not working?

It works fine with a self-hosted tt-rss.
1 point by hamedonline 18 days ago | link | parent | on: Datatau RSS feed not working?

I confirm that there is some kind of weird behavior with the RSS feed. Even when it is fetched by feedly, the titles don't get marked as read (by pressing the Mark as Read button) until I click on each title separately. My rough guess is that the feed output doesn't follow common standards well.
3 points by nicolasparis 18 days ago | link | parent | on: Datatau RSS feed not working?

It still works fine with feedly

Also, here's the getting started blog post with an introduction to working with the available Dengue data: http://blog.drivendata.org/2016/12/23/dengue-benchmark/

Thanks for the note! I did mention colorblindness in the article. In case you missed it here is the quote:

"Colors can be hard to distinguish from one another. If you have colors of similar hue, it can be difficult to tell them apart. In addition, you might prevent those who are color blind from understanding what you are communicating."

There's a lot more to be said, though; here's a helpful article on the subject: https://designshack.net/articles/accessibility/tips-for-desi...


Can't believe nobody mentioned colourblind readers. I'm not into the political correctness thing at all but if you want to have as broad an audience as possible, this is the way to go.

From "The Elements of Statistical Learning":

Our first edition was unfriendly to colorblind readers; in particular, we tended to favor red/green contrasts which are particularly troublesome. We have changed the color palette in this edition to a large extent, replacing the above with an orange/blue contrast


For this post, we scraped various signals (e.g. technical maturity, popularity of the library, size of the community behind the library, social media mentions, etc.) for more than 50 open source libraries from the web.

We fed all of the above signals to a trained Machine Learning algorithm to compute a score and rank the top libraries.

2 points by pyr0 26 days ago | link | parent | on: Introduction to Data Science in Python

I just finished this course. I would say that my skill level in Python would be average, but I was able to handle this course. I thoroughly enjoyed the programming assignments (the last one took me forever to figure out, but it was super rewarding when I finished).

Highly recommended.

1 point by axelr 26 days ago | link | parent | on: Monte Carlo explained in 3d

Hi, it uses WebGL shaders, so there can be some problems with support.

What is your OS/browser?

1 point by ricardodff 26 days ago | link | parent | on: Monte Carlo explained in 3d

I can't see the animations :(
1 point by axelr 27 days ago | link | parent | on: Monte Carlo explained in 3d

I have prepared another demonstration, this time about Monte Carlo generation with Markov chains. The most interesting part (for me) is Hamiltonian MC.
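
For readers who have not seen Markov chain Monte Carlo before, here is a minimal Python sketch of the simplest variant, random-walk Metropolis. It is not the code behind the demonstration and it is not Hamiltonian MC; the function name, the step size, and the standard normal target are all assumptions chosen for illustration.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples

def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x

samples = metropolis(log_std_normal, start=0.0, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_var:.2f}")
```

Hamiltonian MC replaces the blind Gaussian proposal with one guided by the gradient of the log density, which is why it tends to explore the target with far fewer rejected steps.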
1 point by jahan 27 days ago | link | parent | on: Top 4 Recommender Systems Books

For this post, we scraped various signals (e.g. online ratings/reviews, topics covered, author social influence in the field, year of publication, social media mentions, etc.) for more than 10 Recommender Systems books from the web.

We fed all of the above signals to a trained Machine Learning algorithm to compute a score and rank the top recommender systems books.

1 point by jahan 27 days ago | link | parent | on: Top Data Mining Books

For this post, we scraped various signals (e.g. online ratings/reviews, topics covered, author social influence in the field, year of publication, social media mentions, etc.) for more than 10 Data Mining books from the web.

We fed all of the above signals to a trained Machine Learning algorithm to compute a score and rank the top books.


I went through both for my dissertation but settled on ESL. It depends on your needs: AISL is fairly mathematical, while ESL focuses more on concepts, so it is better for developing your understanding of the algorithms.

I've read ESL but not AISL, but I've had several friends ask me about AISL. Would you say there is a lot of value in reading both?

I am a Junior Data Scientist based in London with a strong background in Engineering Sciences. I have been teaching myself Machine Learning and related skills over the past year, building on my background in Maths and Engineering Sciences. Here, I share a list of machine learning books that are now on my bookshelf. Most of these books have a free version available on their website and can be ordered from Amazon. I have included links to relevant HN discussions, since that is how I found out about most of these books.

Doesn't look like it's reproducible. Code fails on lines 24 and 29.
1 point by alexRutherford 31 days ago | link | parent | on: Azure Notebooks

The Azure products for data science are _not ready_ for serious usage. They are very unstable, and lots of fundamental features are still to be figured out, such as how to install a package without having to install it again the next time you log in. There is no roadmap for improvements.