DataTau


Recently found Egil Martinsson's WTTE-RNN models and tried to simplify all of his great work into a bare-bones example implemented in Keras. Hope somebody finds this helpful. It appears to be a very promising model for making predictions in a time-to-event context (engineering failures, customer churn).

Thanks for sharing.

Nice blog.

Here is a tutorial on decision tree learning in general (in German)!

Not all correlations approach 1.0, e.g. KL distance.

A practical introductory notebook I just shared, with various examples of DataFrame manipulation and plotting using Seaborn. Feedback is welcome.

What's up with the correlations of 1.0? That shouldn't happen.

And if they are not correlations, what are they?

It seems like a lousy job. When you put up code samples, make sure the code works. I found at least 2 instances where a variable was missing just by reading the article. Also, a shared IPython notebook on GitHub/Jupyter would do better. Thanks for the article.
2 points by splike 7 days ago | link | parent | on: Introduction to Anomaly Detection

> Looks like our anomaly detector is doing a decent job.

I don't agree. Calling ~5% of your dataset anomalies purely by construction always seemed like a crappy way of doing things to me. Look at those stars towards the end of the curve; they don't look like anomalies to me.
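To make the "by construction" point concrete, here is a minimal sketch (not the article's code; the synthetic data and the helper name `flag_top_fraction` are assumptions for illustration). A fixed-percentile threshold flags the same share of points whether or not the data contains any real anomalies:

```python
import numpy as np

rng = np.random.default_rng(0)

def flag_top_fraction(scores, fraction=0.05):
    """Flag the highest-scoring `fraction` of points as anomalies."""
    threshold = np.quantile(scores, 1.0 - fraction)
    return scores > threshold

# Clean data: every point comes from the same normal distribution.
clean = rng.normal(size=10_000)
clean_flags = flag_top_fraction(np.abs(clean))

# Contaminated data: the first 100 points (1%) are shifted far away.
contaminated = clean.copy()
contaminated[:100] += 20.0
contaminated_flags = flag_top_fraction(np.abs(contaminated))

# Both rates are about 5%, regardless of how many true anomalies exist.
print(clean_flags.mean(), contaminated_flags.mean())
```

In the clean case every flagged point is an ordinary draw from the tail; in the contaminated case the 100 shifted points are caught, but so are roughly 400 ordinary ones. The flag rate reflects the chosen fraction, not the data.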

Even more interesting from an exploratory approach is the tool Miller (mlr).

No offense but this review seems extremely biased. You didn't have any criticism at all?

> useful language designed by mathematicians for mathematicians.

I'm not entirely sure the author knows what he's talking about or knows statistics.

It also seems like another of those clickbait articles.

One hallmark strategy is to create a list article: top ten blah, 8 free blah, etc.

Link appears correct, but the title on datatau is misleading. If you read the conference (?) description, it says something like "Andrew Seies, of Plotly, is starting things off by talking about visualizations in Python".

This website is so awesome. More article updates, please.

Is this the correct link?

Couldn't you have computed this directly with matrix multiplication?
1 point by Trombone5 11 days ago | link | parent | on: What is a Data Scientist?

Statistics is not part of science to the exact same extent that maths is not part of science: by a technicality at most. You also have a very narrow view of what a statistician is if you think they don't spend most of their time applying methods on real data.

I would say that being a "data scientist" is quite different from being a "scientist", with "data science" being more a kind of engineering than a science. For example, the goals of the engineer and the data scientist coincide in the sense that both seek to build products for business purposes. Further, their methods are of a similar kind: in their work towards the product, they use statistical models, created in-house or by the industry at large, that exist in conceptual frameworks from academia.

4 points by probinso 12 days ago | link | parent | on: What is a Data Scientist?

A data scientist is a person who attempts to define 'Data Science' in a way such that people will pay them for it.

The title of this article gave me brain damage.
1 point by nlp123 16 days ago | link | parent | on: Clustering Similar Stories Using LDA

LSH with cosine similarity as the distance metric??
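For context, a minimal sketch of what LSH for cosine similarity usually looks like, using random hyperplanes. This is an illustration only, not the linked article's method; all function names here are hypothetical. Each bit of the signature records which side of a random hyperplane a vector falls on, and two vectors at angle θ differ on a given bit with probability θ/π:

```python
import numpy as np

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw `n_bits` random hyperplane normals in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_bits, dim))

def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return hyperplanes @ vec >= 0

def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits."""
    hamming_fraction = np.mean(sig_a != sig_b)
    return float(np.cos(np.pi * hamming_fraction))

def exact_cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = a + 0.5 * rng.normal(size=50)  # a noisy copy of a

planes = make_hyperplanes(dim=50, n_bits=4096)
est = estimated_cosine(signature(a, planes), signature(b, planes))
print(exact_cosine(a, b), est)
```

Strictly speaking, cosine similarity is a similarity rather than a distance; the quantity the hash preserves is the angle between the vectors.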
1 point by larrydag 16 days ago | link | parent | on: What is a Data Scientist?

My take... Historically there are specialists for data such as Business Intelligence, Network Architecture, ETL, etc. They know how to move, manage and scale data.

Then there are the analysts that use the data such as Statisticians, Engineers, Physicists, etc. This is more the applied side of data.

The advent of computing power and speed of networks has allowed data to be more accessible. Now there is a need to merge these two specialities together. The Data Scientist knows enough of ETL and knows enough of applied math and statistics. This may be more of a generalist role.

1 point by diwaiyer 17 days ago | link | parent | on: What is a Data Scientist?

I don't necessarily disagree with you. But there is a distinction between what you call 'regular scientists' and 'data scientists', right? Pretty much all 'regular scientists' have PhDs in their discipline and are working on cutting-edge things in their field of expertise. The term 'Data Scientist' is used much more loosely. Besides, do a lot of those regular scientists call themselves scientists? The term 'scientist' has an air to it. To some, 'data scientist' sounds more presumptuous than 'machine learning engineer' or 'senior modeler' or 'statistician'.

I can see why the term 'statistician' doesn't seem to cover everything in a data scientist's skill set. Perhaps we need an alternative moniker.

1 point by anarchochossid 17 days ago | link | parent | on: What is a Data Scientist?

> On the term “data scientist”, Silver once said that a “data scientist is a sexed up term for a statistician. Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician”

Nate Silver is wrong. Statistics is not a “branch of science”, and being a data scientist, in practice, is not quite the same as being a statistician. Data scientists use statistics as a tool (just like regular scientists, such as physicists, do), but they are not statisticians. Being a data scientist is not that different from being a regular scientist, but, instead of studying some physical phenomena for fundamental purposes (to learn more about the Universe or our bodies), a data scientist studies some very specific set of phenomena relevant to his industry and builds predictive models that have business value. Statistics is but one (actually small) part of it.

3 points by gwern 21 days ago | link | parent | on: Deep Learning for Chess

"I’m encouraged by this. I think it’s really cool that

    It’s possible to learn an evaluation function directly from raw data, with no preprocessing
    A fairly slow evaluation function (several orders of magnitude slower) can still play well if it’s more accurate
I’m pretty curious to see if this could fare well for Go or other games where AI’s still don’t play well"

I think AlphaGo has kinda demonstrated that the answer is... yes?

What are your thoughts about the most important trends for this year?

Was this written by a Markov chain? This article provides absolutely no information about anything at all.

Edit: And the article header was stolen from elsewhere and hue-shifted:

This is garbage!

1 point by aikramer2 23 days ago | link | parent | on: Introduction to Correlation

This is a really good breakdown, bookmarking this!

Great product...but has nothing to do with this website...

Great product!
