RStudio is THE standard for exploratory data analysis on large data sets
Overall Satisfaction with RStudio
RStudio is used as an R development environment for cleaning, manipulating, and analyzing large data sets. It is used in conjunction with Python for data science tasks. RStudio is used across the entire organization as a complement to other technologies and to support data science and analysis projects. In my role, I gather large data sets (more than 500,000 or even a million rows) from different platforms, and rely on RStudio to prepare the data for further analysis. It's an excellent platform for preliminary / exploratory data analysis: it helps you understand the trends and behaviors exhibited by the data set and guides later analytic decisions.
Pros
- Create and manipulate data frames: syntax is intuitive, terminal lets you see results / behaviors immediately.
- Visualization (especially using shiny or other visualization packages): so many different kinds of graphs and viz available.
- Sharing results and community documentation: extensive information is available on use and applications of different packages, making RStudio (and R) very versatile for a variety of analysis projects.
Cons
- R has a fairly steep learning curve and can be intimidating for new users. The swirl package is useful as an introductory tutorial on R's use and capabilities, but it is limited.
- RStudio sometimes has stability problems when working with very large data sets. This is because R holds the data it is processing in the computer's memory (RAM). A quick calculation can be used to determine whether the data set's size exceeds the computer's memory capacity, though.
- Quickly analyze data to determine validity, and if further exploration is needed (basically as a triage to assess data trends/behavior/usefulness).
- Code can be re-used and redeployed to save time and improve organizational efficiency.
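The quick memory calculation mentioned under Cons can be sketched in R roughly as follows. This is only a back-of-the-envelope estimate: it assumes every column is stored as a double (8 bytes per value), and the function name `estimate_df_gib`, the example dimensions, the 16 GiB machine, and the 3x headroom factor are all illustrative assumptions rather than anything from the review.

```r
# Rough estimate of the memory an all-numeric data frame needs in R.
# Assumption: each value is a double (8 bytes). Integer and logical
# columns use 4 bytes per value; character columns vary with content.
estimate_df_gib <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}

# Hypothetical data set: 5 million rows, 40 numeric columns.
needed_gib <- estimate_df_gib(5e6, 40)

# Hypothetical machine with 16 GiB of RAM. The 3x headroom factor is a
# rule-of-thumb assumption, since R may copy data during manipulation.
available_gib <- 16
fits_comfortably <- needed_gib * 3 < available_gib

print(round(needed_gib, 2))
print(fits_comfortably)
```

Once a sample of the data is loaded, `object.size()` reports how much memory that sample actually uses, which is a useful check on the estimate. If the estimate comes close to the available RAM, reading only the needed columns or working on a sample of rows is a common workaround.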
RStudio works similarly to PyCharm (and PyCharm can support R code) insofar as it's a development environment meant to improve the coding experience and provide easy access to commonly used resources (packages). Both provide a navigable dev environment with some learning curve. RStudio is more bare-bones, though: it has fewer bells and whistles (like night mode, extensive additional language support, etc.). I usually select RStudio if I'm just doing a basic internal analysis on data, because it's what I'm most familiar with and is usually the easiest to re-deploy for analyzing other sets of data.
Using RStudio
Pros | Cons |
---|---|
Well integrated | Unnecessarily complex |
Consistent | Difficult to use |
Convenient | Slow to learn |
Feel confident using | Lots to learn |
- Ingesting data from common file types (CSV, XLSX).
- Performing basic visualization or analysis.
- swirl - can't recommend the built-in tutorials enough!