How I hacked Hacker News: An analysis of 1.3 million HN stories
10 points by mcrowe 2179 days ago | 10 comments

4 points by thisaintnogame 2178 days ago | link

Cool post. I really like your analysis of being discovered. I feel like a lot of these posts just plot the average votes per hour and then call it a day. It's nice to see someone actually look at the effects of competition.

At the risk of being too self-promotional, I wrote a paper about popularity on Reddit and Hacker News that you might be interested in:

Not sure how the community feels about this, so please let me know if I'm out of line and I'm happy to take it down.


1 point by mcrowe 2177 days ago | link

Thanks for posting that article. Interesting to look at article quality!


2 points by mcrowe 2179 days ago | link

I looked at the data from 1.3 million Hacker News stories and found that the time a story is submitted makes a big difference (up to 172%). This article shows the analysis and results.

I used the official Hacker News API to get the stories using Python, and used R and ggplot2 to do the exploratory data analysis and plots.
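For readers who want to reproduce the download step, here is a minimal sketch of pulling stories from the public HN API in Python. It is not the script from the article: the helper names (item_url, fetch_json, story_row, fetch_stories) are made up for illustration, and the choice of which fields to keep is an assumption about what a timing analysis would need.

```python
import json
import urllib.request

# Base URL of the public Hacker News API (Firebase-hosted).
HN_API = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL for a single item (story, comment, job, ...)."""
    return f"{HN_API}/item/{item_id}.json"


def fetch_json(url, timeout=10):
    """Download one URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def story_row(item):
    """Keep only what a timing analysis needs; return None for non-stories."""
    # The API returns null for missing ids, and deleted items lack most fields.
    if not item or item.get("type") != "story" or item.get("deleted"):
        return None
    return {
        "id": item["id"],
        "time": item["time"],           # Unix timestamp of submission (UTC)
        "score": item.get("score", 0),  # points at the time of the download
        "title": item.get("title", ""),
    }


def fetch_stories(item_ids, fetch=fetch_json):
    """Yield one row per story; `fetch` is injectable so this runs offline too."""
    for item_id in item_ids:
        row = story_row(fetch(item_url(item_id)))
        if row is not None:
            yield row
```

Something like list(fetch_stories(range(1, 1001))) would collect the stories among the first thousand item ids. The API serves one item per request, so a full crawl up to the id returned by /v0/maxitem.json takes a while.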


1 point by alexleavitt 2178 days ago | link

Why did you not put all these facets of the analysis into a regression to see how they relate to each other?


1 point by mcrowe 2178 days ago | link

I suppose because I only saw this as a single feature (time). I could have done a regression on day-of-week and hour-of-day, though. It would be interesting to look at more features (title length, etc.) and do a regression.
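To make the regression idea concrete, here is a rough sketch of a logistic regression of "discovered" on two timing features. Everything in it is an assumption for illustration: the binary features is_weekend and is_morning are a simplification of day-of-week and hour-of-day, the counts are made up and are not the article's data, and the hand-rolled gradient descent is only there to keep the example dependency-free. In practice something like R's glm would be the natural tool.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Predicted probability of discovery; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.5, steps=5000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x, start=1):
                grad[j] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Made-up counts, NOT the article's data:
# (is_weekend, is_morning) -> (stories discovered, stories submitted)
cells = {(0, 0): (1, 10), (0, 1): (3, 10), (1, 0): (3, 10), (1, 1): (6, 10)}

rows, labels = [], []
for features, (hits, total) in cells.items():
    for i in range(total):
        rows.append(features)
        labels.append(1 if i < hits else 0)

weights = fit_logistic(rows, labels)
intercept, b_weekend, b_morning = weights
print(f"intercept={intercept:.2f} weekend={b_weekend:.2f} morning={b_morning:.2f}")
```

A positive coefficient means the feature is associated with higher odds of discovery while the other feature is held fixed, which is the "how do they relate to each other" question. More features (title length, etc.) would just be extra columns in rows.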


1 point by Nadav 2179 days ago | link

You might want to add to the caveats a note about correlation-causation, and tone down the conclusions a bit...


2 points by debrouwere 2178 days ago | link

For correlation to not imply causation, there have to be confounding factors. It's usually fairly easy to come up with a couple for almost every observational study, but for the life of me I cannot think of a confounder that might be present in this study. The author suggests that, who knows, perhaps stories submitted on weekends are simply better than those submitted during the week, but you have to admit even that is a fairly implausible candidate.


1 point by kiyoto 2178 days ago | link

Yeah, as a marketer myself, I like the post in spirit. But also as a recovering statistics person, I must point out that the post needs a lot of qualifications.

For example, quoting the OP:

>So is it better to submit on the weekend? I defined a story as “discovered” if it received more than 10 votes (that’s enough to get on front page, after that it’s up to the readers),

This reasoning needs further examination. If the total activity on HN is low, I assume that it requires fewer upvotes to get to the front page. Is it really 10 upvotes to be on the front page for both weekends and weekdays?

This matters since if weekends require fewer votes on average to be "discovered", then the OP's "discover rate" argument is not particularly informative: because HN has lower activity levels on weekends but you have the threshold fixed, the "discovery rate" is likely to go up.
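One cheap way to probe this is to recompute the discovery rate under several cutoffs and see whether the weekend/weekday gap survives. The sketch below assumes stories are available as (unix_time, score) pairs; the sample data is made up, the function names are hypothetical, and weekends are judged in UTC for simplicity. It does not say how many votes the front page really takes on a given day, only how sensitive the comparison is to the chosen cutoff.

```python
from datetime import datetime, timezone


def is_weekend(unix_time):
    """True for Saturday/Sunday, judged in UTC (a simplification)."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc).weekday() >= 5


def discovery_rates(stories, threshold):
    """Share of stories with MORE than `threshold` points, by weekday/weekend."""
    counts = {"weekday": [0, 0], "weekend": [0, 0]}  # [discovered, total]
    for unix_time, score in stories:
        bucket = counts["weekend" if is_weekend(unix_time) else "weekday"]
        bucket[0] += score > threshold
        bucket[1] += 1
    return {
        name: (hits / total if total else float("nan"))
        for name, (hits, total) in counts.items()
    }


# Made-up (submission time, score) pairs, for illustration only.
SAT = 1420243200  # 2015-01-03 00:00 UTC, a Saturday
MON = 1420416000  # 2015-01-05 00:00 UTC, a Monday
stories = [
    (SAT, 12), (SAT, 4), (SAT, 30), (SAT, 2),
    (MON, 12), (MON, 1), (MON, 3), (MON, 6), (MON, 40), (MON, 2),
]

for threshold in (5, 10, 20):
    print(threshold, discovery_rates(stories, threshold))
```

If the weekend advantage shrinks or flips as the cutoff moves, the "more than 10 votes" definition is doing a lot of the work.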

As for the "10am PST to submit" argument, here is one potential confounding factor: intentional article placements by startups and marketers.

HN is still one of the most effective places to get developer content seeded, or so it is believed among startup/developer-focused product marketers. Also, it is considered to be a good practice to place content around 10am PST because it covers both West Coast and East Coast (right at the beginning of the day/right after lunch respectively).

Combining all these facts, it's possible that for the submissions made around 10am PST, the submitters and their accomplices are working very, very hard to promote the submission ("hey, can you upvote my link on HN. thx").

Again, all of this is speculation, and I certainly do not want to sound like a statistical curmudgeon. It was an interesting read. Huge props to the OP =)


1 point by mcrowe 2178 days ago | link

Thanks for the comments and for reading, Kiyoto!

I agree that "more than 10 votes" is more of a line in the sand than a hard science. It was more important to me to have a very clear and consistent success metric than to come up with the "best" success metric. My statement that "that's enough to get on the front page" was very hand-wavy and might have been better left out.

Maybe you're right that the "pros" tend to post their content on weekday mornings, hence the higher discovery rate then. Interesting that the weekends would still show more success, though...

In any case, it's clear that I need to make my caveat about correlation-vs-causation more prominent.



1 point by mcrowe 2179 days ago | link

Hi Nadav. Thanks for the feedback. I wrote about that as caveat #1. Am I missing something?


