1 point by adamlaiacano 3851 days ago

The short answer, and the one that you seem to be looking for, is that you don't have to learn these things if you don't need to use them. You also don't need to learn frameworks like Rails or Tornado, or languages like C or Fortran, even though they are all instrumental everyday tools for a minority of people who work in "big data."

If the data you're working with can fit in memory, you're far better off sticking to Python/R/Julia/MATLAB/Stata/whatever. Your code will run much faster because Hadoop is built for I/O-bound workloads rather than CPU-bound ones, those tools are far easier to set up (especially if you aren't familiar with the JVM), and there are WAY more libraries for doing machine learning.

THAT SAID, if you ever plan on scaling your work, you're going to have to get into the Hadoop world. Scalding has become my go-to for normal data munging/manipulation and some simple classification stuff, even on my local machine. I've found its syntax for the "split/apply/combine" paradigm far more intuitive than plyr's or pandas', and it's nice to know that I can submit the exact same code to a 100-node cluster if I have to. However, if I want to run any iterative algorithm like SVM or even k-means, I know it's going to be extremely slow because Hadoop does not handle iteration well: each pass over the data is a separate MapReduce job that reads its input from disk and writes its output back.
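
To make the iteration point concrete, here's a rough sketch in plain Python (not Scalding or Hadoop code). The names `split_apply_combine` and `kmeans_1d` are made up for illustration, and the 1-D data is a toy. The idea is just that a single split/apply/combine pass maps nicely onto one job, while k-means needs one full pass over the data per iteration.

```python
from collections import defaultdict


def split_apply_combine(records, key, apply):
    """Group records by key(record), then run apply() on each group."""
    groups = defaultdict(list)
    for rec in records:                      # split
        groups[key(rec)].append(rec)
    # apply to each group, combine results into one dict
    return {k: apply(group) for k, group in groups.items()}


def kmeans_1d(points, centroids, max_iters=20):
    """Toy 1-D k-means: every iteration is one full pass over the data.

    On a MapReduce-style system each of these passes would be its own job,
    which is where the per-iteration overhead comes from.
    """
    passes = 0
    for _ in range(max_iters):
        passes += 1
        # Assign each point to its nearest centroid, then average each cluster.
        means = split_apply_combine(
            points,
            key=lambda p: min(range(len(centroids)),
                              key=lambda i: abs(p - centroids[i])),
            apply=lambda group: sum(group) / len(group),
        )
        # Keep the old centroid if a cluster ended up empty.
        updated = [means.get(i, c) for i, c in enumerate(centroids)]
        if updated == centroids:             # converged
            break
        centroids = updated
    return centroids, passes


if __name__ == "__main__":
    # A single split/apply/combine: total value per key.
    rows = [("a", 1), ("b", 2), ("a", 3)]
    print(split_apply_combine(rows, key=lambda r: r[0],
                              apply=lambda g: sum(v for _, v in g)))

    # The iterative case: note how many full passes it takes.
    print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0]))
```

In memory those extra passes cost next to nothing. The pain only shows up when every pass means launching a job and going back to disk.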



1 point by tfturing 3851 days ago

I guess I "want" that answer to be true in the same sense Al Gore "wants" global warming to be true. I'm starting to feel that "big data" frameworks get more attention than they deserve, since that attention clashes with the "Data Science for the Masses" mantra. My biggest fear is that people spend time on those frameworks instead of, or feel that they are more important than, gaining an adequate background in computer programming and statistics. Also, I'm not sure why it warrants its own course on Udacity. Even if nobody uses your CS 101 website or search engine, you feel a sort of accomplishment when you complete it. You set out to build something, you make it, and it doesn't cost anything more than the computer you already bought. I doubt you get that feeling of accomplishment when you type some MapReduce code you likely won't ever use to its full potential.

-----



