Agreed. Data aggregation is very hard. Some common problems: the people who hold the data don't want to share or sell it; the data is open but only in undocumented or hard-to-use formats; the data is available but messy and heterogeneous; or the data is clean but spread across lots of different sources that need to be unified (and unification is hard in its own right).
A lot of data scientists spend way more time cleaning, joining, and deduping data than they spend building analyses and models. It's frustrating. Fortunately, there are more and more tools, like Trifacta and OpenRefine, that make those tasks easier.
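To make the cleaning/deduping grind concrete, here's a rough sketch of the kind of glue code people end up writing by hand. It is only an illustration: the `crm` and `billing` sources, the `name`/`email` fields, and the choice of email as the matching key are all made up for the example, and this is not how Trifacta or OpenRefine work internally.

```python
import re

def normalize(record):
    """Trim, lowercase, and collapse whitespace so near-identical rows compare equal."""
    name = re.sub(r"\s+", " ", record["name"]).strip().lower()
    email = record["email"].strip().lower()
    return {"name": name, "email": email}

def unify(*sources):
    """Merge several record lists, keeping the first row seen per email (assumed key)."""
    seen = {}
    for source in sources:
        for record in source:
            clean = normalize(record)
            seen.setdefault(clean["email"], clean)
    return list(seen.values())

# Hypothetical sources that describe overlapping people in slightly different ways.
crm = [{"name": "Ada  Lovelace ", "email": "ADA@example.com"}]
billing = [
    {"name": "ada lovelace", "email": "ada@example.com "},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

merged = unify(crm, billing)
print(merged)
```

Even this toy version only works because both sources happen to share a reliable key. Real data usually doesn't, which is where the time goes.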
There's definitely room for more tooling! Re: OpenRefine -- I'm not sure if it's still evolving much, but the last time I used it, it saved me a bunch of time.