IPython, numpy/scipy/matplotlib, and any other Python libraries that might help with the task at hand! I also use dual monitors, with zsh and the Sublime Text editor on one and the IPython console on the other. I'm trying to remember to check in bigger bits of interactive analysis as I go (version control using git) so that I have a shareable record of what I did and don't repeat myself.
Python, IPython, Pandas (especially the GA module), MRJob lately, S3, MixPanel, GA, Sublime Text 2, OmniFocus, Tableau, NLTK a bit, Git, and, less so every day, some R.
I'm a product optimization specialist, which means I'm focused largely on split testing and understanding the conversion rates of our website. I also dig through our data looking for insights that can inform our future testing direction.
It's an odd role in that it sits somewhere between the marketing, product, and technology groups.
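For readers who haven't done split testing, here is a minimal sketch of how two conversion rates might be compared. It uses a pooled two-proportion z-test, which is only one common choice and not necessarily what this team uses. The function name and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b: number of conversions in variants A and B (hypothetical inputs)
    n_a / n_b: number of visitors shown variants A and B
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up example: 100/1000 conversions on A versus 150/1000 on B
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, though a real test plan also needs a sample size fixed in advance.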
Training/Exploration:
- MRJob on the company's dedicated cluster
- Local instance of PostgreSQL to rapidly rebuild training sets
- Python, mostly for combining the data and for transformations that are a pain in MRJob/Postgres
- R for most model building, unless I want to try something off the beaten path, in which case I'll use SciPy + PyCUDA
Production:
- Hive
- MSSQL
- Python, which calls R models and functions through rpy2
- Tableau
I mostly use vim as my text editor and git for version control, on a 2012 MacBook Pro.
On the job, I use SQL Server Management Studio to interact with my company's database, and then I export the data either to Excel or, more often, to Tableau for data visualization and dashboard creation.
After taking an online Data Analysis course from Johns Hopkins University, I am now trying to bring some statistics and machine learning to the company via IPython.
IPython, numpy, sklearn, pandas (Anaconda, basically). I have been looking into Clojure and have set up lein, but so far I haven't found a good project to learn it with.
Away from data science specifics, I work a lot with plain text files for task management and use LaunchBar and TaskPaper on OS X to manage them. I have been switching between vim and Sublime, but honestly I think I'm just going to stick with Sublime; it's comfortable and similar to everything else I use.
* pydata stack
* Amazon EC2, S3, Redshift, OpsWorks, etc.
* R
* Tableau
* Terminal - vim, git, zsh
* MRJob if I need Hadoop
* A bit of Clojure / Scala (still figuring out which I like more)
* Asana for task management
IPython, numpy/scipy/matplotlib, Pandas, Git, Bash, Sublime Text 2, Vim, RStudio, R, ggplot2. I spend a lot of time working on remote computer clusters (SGE and LS) from a 13-inch MacBook Pro. Spectacle.app and Alfred.app are invaluable with screen space at a premium. I save my work/pipelines in Evernote and write documentation in Markdown with Marked.app.
R for plotting (ggplot2 > matplotlib) and for specialized analyses not readily available in Python (e.g., ANOVAs, GLMs). Python for data cleanup and preliminary analysis.
I'm really interested in taking my first steps in R. People I've talked to have an 'R to explore, Python for production' mentality, but I've mostly used Python. Do you have any recommendations on where to start?