1 point by jcbozonier 3644 days ago

I got by just writing scripts to process files for quite a while. If you're careful, you can put off learning this stuff until you have quite a bit of data (read: terabytes). If you're sloppy, you might need it after tens of gigabytes.

The biggest draw is that these frameworks allow a certain amount of "sloppiness" and require less pre-planning. I just know that all of this text is getting dumped to S3, and that I can find a way to sift through it all later using Hadoop-ish tools, then pour it into Redshift once I've got a specific view I want to query ad hoc.
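For concreteness, here's a minimal Python sketch of the two ends of that pipeline: dump raw text to S3, then issue a Redshift COPY when you actually want it queryable. The bucket, key, table, cluster, and IAM role names are all made-up placeholders, not anything from my actual setup.

    import boto3
    import psycopg2

    # Dump raw event text into S3 (bucket and key are hypothetical).
    s3 = boto3.client("s3")
    s3.upload_file("events.json", "my-event-bucket", "raw/2014/06/events.json")

    # Later, when there's a specific view worth querying ad hoc, load that
    # slice into Redshift with a COPY. Connection details are placeholders.
    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="loader",
        password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY events
            FROM 's3://my-event-bucket/raw/2014/06/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS JSON 'auto';
        """)

The "Hadoop-ish" sifting step (EMR, Hive, Spark, whatever you like) would sit between those two calls; this just shows the ends of the pipe.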

It's not that you can't do some of this in other ways. It's just that, for me at least, working this way keeps me pretty nimble. That's all.



