DataTau | new | comments | leaders | submit | login
3 points by fhadley 3526 days ago | link | parent

So I've been lurking around here for a while, mostly checking in when r/MachineLearning is a bit dead or having its biweekly computer vision party, but I haven't felt the urge to comment on a topic until seeing this. That probably reveals a note of hypocrisy in my following comments, but I thought I'd be transparent.

felipeclopes was entirely right in noting that the real difference between DataTau and HN lies in the strength of the community, and in pointing out the positive effects HN's community has on content quality. izyda rightly took that observation a step further, pointing out the glut of (excuse my brashness) garbage articles. At the risk of beating a dead horse, I find it absolutely ridiculous that the front page currently has three links to the same website from the same user that are either tabloid fodder, junk science, or the kind of big data hype one would expect from a more mainstream news outlet. There's even one on XLMINER (those who've fought the good fight of incorporating Excel documents into a pipeline by stitching together pandas+xlrd+lxml along with healthy amounts of bubble gum and chicken wire will understand my sentiments). XLMINER.

Ranting aside, there are some excellent academic articles detailing new modeling methods. I'm equally appreciative of the less formal content focused on data science in practice (deeplearning4j and knitr being examples that come immediately to mind) rather than the bleeding edge of machine learning research.

Thus the question to me isn't "What should be improved?" (it's quite clear, to me at least, that ending the vicious cycle of low-quality content and lack of participation is the answer) but rather "How do we improve it?" Obviously, that's a bit more complicated. I'm sure it'll prove a controversial suggestion, and it's certainly not a panacea, but I think DataTau would benefit from a penalty system similar to the one employed by HN (here's a good reddit discussion: http://www.reddit.com/r/programming/comments/1qwnuh/how_hack...). Instead of relying on downvotes and time decay alone, junk articles (how to be a data scientist, big data hype, anything from datasciencecentral) could be automatically penalized at submission time, which would hopefully lead to their eventual disappearance from the site.
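To make the suggestion concrete, here's a minimal sketch of what such a submission-time penalty could look like. The gravity exponent follows the commonly cited HN ranking formula; the domain names, phrases, and weights are purely illustrative assumptions on my part, not anything DataTau or HN actually uses:

```python
# Hypothetical penalty weights. These names and numbers are
# illustrative assumptions, not real site configuration.
DOMAIN_PENALTIES = {
    "datasciencecentral.com": 0.2,  # heavily penalized at submission
}

TITLE_PENALTIES = {
    "big data": 0.5,
    "how to be a data scientist": 0.3,
}

def rank_score(points, age_hours, url_domain, title, gravity=1.8):
    """HN-style rank: votes decayed by age, then scaled by penalties."""
    base = (points - 1) / (age_hours + 2) ** gravity
    penalty = DOMAIN_PENALTIES.get(url_domain, 1.0)
    lowered = title.lower()
    for phrase, factor in TITLE_PENALTIES.items():
        if phrase in lowered:
            penalty *= factor
    return base * penalty
```

With equal votes and age, a penalized domain or a hype-laden title simply ranks lower, so junk sinks off the front page faster without any moderator action.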

Just my .02, and sorry for the rant above.



2 points by joe 3524 days ago | link

Well said. Community obviously is a big part of it. Implementing a penalty rating system may help.

But the real issue is the combination of a weak community and weak expectations.

On HN, the weight of the community likely outweighs any negative (not intellectually stimulating) submissions. Interesting articles are upvoted. Link-bait is downvoted (or removed). With fewer people here at DT, we can only give so much weight to the better articles.

The bigger issue, though, is the weak expectations. HN has published guidelines about what to submit (though they are fairly general). I haven't been at HN since the beginning, but I feel it's fairly easy to get a sense of what makes a good submission after following the site for a week. That is not true at all here. A new user (or even me, and I've been following this site for a while) has NO IDEA what a good submission is here. Thus they may submit anything even tangentially relevant to data science, and when the result is bad, we lack the community weight described above to bury it.

How to solve these issues? I'm not quite sure, but here are a couple of wacky ideas:

1. Only 10-15 articles on the front page. This will give weight to the "better" articles.

2. Approval process for new user submissions; needs to be approved by 5-10 older users.

There is no glue in the community right now. It is too weak and disconnected, which results in a lack of signaling to new users. We need to change that.

-----



