2 points by larrydag 3853 days ago | link | parent

I'm a Data Scientist first and a programmer second. I have not yet found a use case for these frameworks in my work or side projects. I've found that I can do almost any Data Science task with R and a relational database (PostgreSQL, MySQL, etc.). What are the typical use cases for these frameworks, especially in Data Science?


1 point by jcbozonier 3848 days ago | link

I got by on just coding scripts to process files for quite a while. If you're careful, you can put off learning this stuff until you have quite a bit (read: terabytes) of data. If you're sloppy, you might need it after tens of GB.
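For a rough idea of what "careful" file-processing scripts can look like, here is a minimal Python sketch that streams CSV files one row at a time, so memory use stays roughly constant regardless of file size. The function names, the column name, and the file layout are all hypothetical, chosen only for illustration; this is one possible approach, not a description of any particular pipeline.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Tally the values of one CSV column, reading one row at a time.

    `lines` is any iterable of text lines (an open file works), so the
    whole dataset never has to fit in memory. Hypothetical helper.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


def count_across_files(paths, column):
    """Combine per-file tallies for a list of CSV file paths."""
    total = Counter()
    for path in paths:
        # newline="" is what the csv module recommends for file objects.
        with open(path, newline="") as handle:
            total.update(count_by_column(handle, column))
    return total
```

Calling something like count_across_files(["day1.csv", "day2.csv"], "event") (file names assumed) only ever holds the running tallies in memory, which is part of why this style of script can stretch a long way before a cluster framework becomes necessary.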

The biggest use case is that these frameworks allow a certain amount of "sloppiness" and less pre-planning. I just know that all of this text is getting dumped to S3, and that I can find a way to sift through it all using Hadoop-ish tools. Then I pour it into Redshift when I've got a specific view I want to be able to query ad hoc.

It's not that you can't do some of this in other ways (for myself at least). It's that I can be pretty nimble doing it this way personally. That's all.

-----

1 point by roycoding 3852 days ago | link

I'll echo this. I've been consulting as a data scientist for 1.5+ years and almost none of the work I've done has required large scale frameworks, even when clients assumed it was necessary.

That being said, I think it's important to understand the basics and have some grasp of these large frameworks so that you'll be more likely to know when they're appropriate.

-----

1 point by ironchef 3852 days ago | link

The issue, imo, typically arises once you reach either high-volume datasets (200+ GB) or high-velocity datasets (real-time or near-real-time). At that point one often needs to resort to some of these frameworks. Higher variety doesn't seem to require them offhand (although they can make things easier), and changes in veracity don't seem to necessitate them either.

-----



