2 points by kisamoto 3580 days ago | link | parent

I suppose it really depends on what you're looking for.

I'm actually using a combination of MongoDB and PostgreSQL for my data storage and analytics. The data is uploaded and instantly stored in MongoDB. Because MongoDB is schemaless, it's really easy for me to add a new attribute or dimension to my data without worrying about the API failing because a column is missing in PostgreSQL.
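To make the "column not present" failure concrete, here is a minimal sketch. It is not my actual setup: the standard-library sqlite3 module stands in for PostgreSQL, a plain Python list stands in for a MongoDB collection, and the table name, field names and the insert_row helper are all made up for illustration.

```python
import sqlite3


def insert_row(conn, table, record):
    """Insert a dict as one row; every key must match an existing column."""
    cols = ", ".join(record)
    marks = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({marks})",
        tuple(record.values()),
    )


# Fixed schema: only id, lat and lon are known columns (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, lat REAL, lon REAL)")

documents = []  # stand-in for a schemaless document collection
rejected = []   # ids the fixed-schema table refused

records = [
    {"id": 1, "lat": 47.37, "lon": 8.54},
    {"id": 2, "lat": 46.95, "lon": 7.45, "speed": 12.5},  # new attribute
]

for record in records:
    documents.append(record)  # the document store accepts any shape
    try:
        insert_row(conn, "events", record)
    except sqlite3.OperationalError:
        # The table has no "speed" column, so the insert fails.
        rejected.append(record["id"])
```

The document list ends up holding both records, while the table only takes the first one until the schema is migrated.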

MongoDB has reasonable horizontal scalability as your data grows and you want to store as much as possible, but it does have its limitations (particularly in the geo field, where I perform a lot of my analytics, but also in the date/time area[1]).

The next step is data extraction and preparation for data science, and this is where PostgreSQL comes in. I can read my data in chunks from MongoDB into PostgreSQL schemas and perform complex geo queries and analysis on them, often storing or adding the results back into a new mongo collection.
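The chunked copy could look roughly like the sketch below. It only shows the batching and the flattening of documents into fixed rows; the function names, the column list and the chunk size are assumptions for illustration, and the database calls are passed in rather than hard-coded so the logic runs on its own.

```python
from itertools import islice


def chunked(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_row(doc, columns):
    """Flatten a document into a tuple; missing attributes become None (NULL)."""
    return tuple(doc.get(col) for col in columns)


def copy_in_chunks(docs, write_rows, columns, size=1000):
    """Read documents chunk by chunk and hand each batch of rows to `write_rows`.

    `docs` is any iterable of dicts and `write_rows` is any callable that
    takes a list of tuples. Returns the number of rows written.
    """
    total = 0
    for batch in chunked(docs, size):
        rows = [to_row(doc, columns) for doc in batch]
        write_rows(rows)
        total += len(rows)
    return total
```

With pymongo and psycopg2, `docs` could be the cursor returned by `collection.find()` and `write_rows` could wrap `cursor.executemany(...)` with an INSERT statement for the target table. Because `to_row` uses `doc.get`, documents that lack a newer attribute simply land as NULL in that column.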

A really useful comparison is a blog post on aggregating NBA data in both datastores[2]. Thanks to its age and maturity, PostgreSQL offers a powerful query language with clear syntax and database-level power.

TL;DR - MongoDB = Schemaless, scalable datastore for evolving data. PostgreSQL = Powerful analytics through the SQL query language and mature features.

[1] - http://stackoverflow.com/questions/17834596/mongodb-querying...

[2] - http://tapoueh.org/blog/2014/02/17-aggregating-nba-data-Post...



2 points by kisamoto 3579 days ago | link

There's also a good answer on Data Science Stack Exchange about the use of NoSQL in data science in general:

http://datascience.stackexchange.com/questions/793/uses-of-n...

-----



