Your post struck a chord with me. Acquiring data is a huge part of one's job as a data analyst/scientist, but there are hardly any tools or resources on "practical data collection."
For example, I recently became curious about Product Hunt and its growth. What I ended up spending most of my time on (before plotting pretty charts with ggplot) was reverse-engineering Product Hunt's API to download a bunch of data. Stuff like this is never explicitly taught but hugely valuable if you want to use data as your decision-informing tool.
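For anyone curious, the pattern is usually: open the browser dev tools, watch the XHR requests the site makes, then replay them with a pagination loop. A minimal sketch of that loop, with everything hypothetical (the endpoint URL and `page`/`per_page` parameters are placeholders; the real Product Hunt API also requires an OAuth token):

```python
import json
import urllib.parse
import urllib.request


def page_url(base, page, per_page=50):
    """Build a paginated URL for a hypothetical JSON endpoint."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{base}?{query}"


def fetch_all(base, max_pages=10):
    """Walk pages until the endpoint returns an empty batch."""
    results = []
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(page_url(base, page)) as resp:
            batch = json.load(resp)
        if not batch:
            break
        results.extend(batch)
    return results


# The URLs this would hit (no live request made here):
# page_url("https://example.com/api/posts", 2)
# → "https://example.com/api/posts?page=2&per_page=50"
```

The tedious part in practice is figuring out auth headers and rate limits, which is exactly the undocumented work the parent comment is describing.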
Agreed. Data aggregation is very hard. Some common problems include: those who have the data don't want to share or sell it; data is open but only in undocumented/hard-to-use formats; data is available but messy and heterogeneous; and data is clean but there are lots of different sources that need to be unified (and unification is very hard).
A lot of data scientists spend far more time cleaning, joining, and deduping data than building analyses and models. It's frustrating. Fortunately, there are more and more tools, like Trifacta and OpenRefine, that make those tasks easier.
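To make the deduping point concrete, here's a tiny sketch of the normalize-then-key pass that eats those hours. The field names and cleanup rules are made up for illustration; real records are messier and usually need fuzzy matching on top of this:

```python
import re


def normalize(name):
    """Crude canonical key: lowercase, strip punctuation, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def dedupe(records):
    """Keep the first record seen for each normalized name."""
    seen = {}
    for rec in records:
        key = normalize(rec["name"])
        if key not in seen:
            seen[key] = rec
    return list(seen.values())


rows = [
    {"name": "Acme, Inc."},
    {"name": "acme inc"},
    {"name": "Widget Co"},
]
print(dedupe(rows))  # two records survive: "Acme, Inc." and "Widget Co"
```

Tools like OpenRefine essentially package up interactive versions of this kind of key-collision clustering so you don't have to hand-roll it every time.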
There's definitely room for more tooling! Re: OpenRefine -- I'm not sure it's still evolving much, but the last time I used it, it still saved me a bunch of time.