Ask DataTau: What is your setup?
18 points by biqo 3765 days ago | 28 comments
What is your setup for day to day data wrangling?


5 points by rd108 3765 days ago | link

IPython, numpy/scipy/matplotlib, and any other Python libraries that might help with the task at hand! I also use dual monitors, with zsh and the Sublime Text editor on one and the IPython console on the other. I try to remember to check in bigger bits of interactive analysis as I go (version control using git) so that I have a shareable record of what I did and don't repeat myself.

-----

1 point by kghose 3764 days ago | link

Just add pandas and PyCharm for my setup

-----

4 points by gjreda 3764 days ago | link

Early 2011 13" MacBook Pro w/ 2.3 GHz i5, 8GB RAM + Thunderbolt Display

    * IPython + pandas/numpy/matplotlib/boto + Flask (if needed)
    * Amazon EC2, S3 (+s3cmd), Redshift, Elastic MapReduce (Hive, Impala)
    * MySQL (hate it, wasn't my choice)
    * Terminal (unix tools FTW)
    * Sublime Text 2
    * Trello (task management)
    * Git & SVN (depends on which internal project)

-----

3 points by jcbozonier 3765 days ago | link

Python, IPython, Pandas (especially the GA module), MRJob lately, S3, MixPanel, GA, Sublime Text 2, Omnifocus, Tableau, NLTK a bit, Git, and some R, though less so every day.

-----

1 point by barcel 3765 days ago | link

GA module?

-----

2 points by don 3764 days ago | link

Guessing "Google Analytics"; pandas has an interface to read data from it.

-----

1 point by jcbozonier 3764 days ago | link

Yep.

-----

1 point by thauck 3764 days ago | link

I assume you're working in marketing analytics, but any more hints? Just curious, as this is very similar to my tool set.

-----

1 point by jcbozonier 3764 days ago | link

I'm a product optimization specialist, which means I'm focused largely on split testing and understanding the conversion rates of our website. I also dig through our data looking for insights that can inform our future testing direction.

It's a role that's odd in that it sits somewhat between marketing, product, and technology groups.
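
For readers unfamiliar with split testing, comparing two conversion rates can be sketched roughly as below. This is a minimal illustration and not the setup described in this comment: the function name and the visitor/conversion counts are made up, and a pooled two-proportion z-test is only one of several ways to analyse a split test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts and n_a / n_b are visitor counts
    for the two variants (hypothetical inputs for illustration).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: variant A converts 200 of 5000 visitors, variant B 250 of 5000.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be noise, under the usual assumptions of independent visitors and reasonably large samples.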

EDIT: typo

-----

2 points by jsogarro 3763 days ago | link

I'm new to Data Science but my setup while I'm learning is as follows:

Language: Python

Libraries: NumPy, SciPy, Matplotlib, Pandas, Scikit-learn

Editors: Sublime Text 3 and Vim (depends on how I feel)

Version control: Git

Other tools: IPython/IPython Notebook

Machine: 15 Inch MBP w/ Retina display

-----

1 point by BioCore 3720 days ago | link

Hello jsogarro,

How do you like the 15" MBP w RD for data analysis (I understand that you are learning)? Are you using any external displays at the moment?

-----

2 points by canuc 3764 days ago | link

Training/Exploration:

  * MRJob on the company's dedicated cluster
  * Local instance of PostgreSQL to rapidly rebuild training sets
  * Python, mostly for combining the data and for transformations that are a pain in MRJob/Postgres
  * R for most model building, unless I want to try something off the beaten path, in which case I'll use SciPy + PyCUDA

Production:

  * Hive
  * MSSQL
  * Python, which calls R models and functions through rpy2
  * Tableau

I mostly use vim as my text editor and git for version control, on a 2012 MacBook Pro.

-----

1 point by Bschuster3434 3762 days ago | link

On the job, I use SQL Server Management Studio to interact with my company's database, and then I either export the data to Excel or, more often, to Tableau for data visualization and dashboard creation.

After taking an online data analysis course from Johns Hopkins University, I am now trying to bring some statistics and machine learning to the company via IPython.

-----

1 point by econometri-san 3763 days ago | link

Econometrician: mostly R, and some Scientific Python too. I also have my eye on Julia.

-----

1 point by larrydag 3763 days ago | link

I'm a data scientist/statistician first then a programmer second.

environment: R

archive: PostgreSQL, MySQL

-----

1 point by robdoherty2 3763 days ago | link

pandas, numpy, scipy, sk-learn, rpy2, ggplot2

emacs for editing and running ipython

qtile window manager on ubuntu

-----

1 point by ultimatehurl 3764 days ago | link

iPython, numpy, sklearn, pandas (Anaconda basically). Have been looking into Clojure so have set up lein, but so far I haven't found a good project to try and learn it for.

Away from data science specifics, I work a lot with plain text files for task management and use LaunchBar and Taskpaper on OS X to manage them. I have been switching between vim and Sublime, but honestly I think I'm just going to stick to Sublime; it's what's comfortable and similar to everything else I use.

-----

1 point by thauck 3764 days ago | link

  * pydata stack
  * Amazon EC2, S3, Redshift, OpsWorks, etc, etc
  * R 
  * Tableau
  * Terminal - vim, git, zsh
  * MrJob if I need hadoop
  * A bit of clojure / scala (still figuring out which I like more)
  * Asana for task management

-----

1 point by radikal 3764 days ago | link

ipython nb/numpy/scipy/matplotlib/pandas/pymc/cython

Trying to get into pylearn2/theano more but generally failing...

A lot of C/C++ for speed sensitive stuff.

HTML5/CSS/JS/d3.js/3.js/Data Tables/Tornado for interactive vis type stuff.

Sublime for editor, git for vc, zsh for shell, evernote for todos/notes.

-----

1 point by NickC 3764 days ago | link

IPython, numpy/scipy/matplotlib, Pandas, Git, Bash, Sublime Text 2, Vim, RStudio, R, ggplot2. I spend a lot of time working on remote computer clusters (SGE and LS) from a 13-inch MacBook Pro. Spectacle.app and Alfred.app are invaluable with screen space at a premium. I save my work/pipelines in Evernote and write documentation in Markdown with Marked.app.

http://spectacleapp.com/

-----

1 point by quantisan 3764 days ago | link

Languages: Clojure, Java, Python, Bash, R

Editor: Vim, Sublime Text

Platforms: VirtualBox/Vagrant, Elastic MapReduce

Libraries: Cascalog, ggplot

-----

1 point by manos_p 3765 days ago | link

iPython and numpy/scipy/matplotlib/pandas/rpy2

-----

1 point by zmjones 3765 days ago | link

R, Python, Git and Emacs (w/ ESS, Magit, and Org)

-----

1 point by barcel 3765 days ago | link

When do you use R vs Python? Or does it just depend on the specific problem?

-----

2 points by NickC 3764 days ago | link

R for plotting (ggplot2 > matplotlib) and specialized analyses not readily available in Python (e.g., ANOVAs and GLMs). Python for data cleanup and preliminary analysis.
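
As a rough illustration of the kind of analysis meant here, the F statistic of a one-way ANOVA can be computed with only the Python standard library. This is a sketch with made-up data and a hypothetical helper name, not a replacement for R's modelling functions or a statistics package; it returns only the F statistic, with no p-value or diagnostics.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the F statistic of a one-way ANOVA across the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up measurements for three groups.
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group spread would suggest; turning it into a p-value requires the F distribution, which the standard library does not provide.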

-----

2 points by Tomrod 3762 days ago | link

Have you used Rpy?

-----

1 point by zmjones 3764 days ago | link

Depends on the problem, but mostly R. I am a social scientist and most of the data I work with is not that big. I can be really productive in R.

-----

1 point by ultimatehurl 3764 days ago | link

I'm really interested in taking first steps in R. People I've talked to have had an 'R to explore, Python for production' mentality, and I've mostly used Python. Do you have any recommendations as to where to start?

-----



