In this podcast Werner Schuster talks to Martin Hadley, data scientist at the University of Oxford. They discuss the state of the R language, the rich R ecosystem that covers development (RStudio), notebooks for publication (R Notebooks, RPubs), writing web apps (Shiny), and the pros/cons of the different data frame implementations.
Key Takeaways
- R is the tool for working with rectangular data
- Modern data frame implementations are Tibble and data.table (for large amounts of data)
- R Markdown and R Notebooks let you explore data and then publish the results and (interactive) visualizations
- Use shinyapps.io to publish server-side R applications
- Tidyverse is the place to look for modern R packages
Show Notes
R - the language
- 1:14 - Many new users perceive R not as a language but as a piece of software.
- 1:44 - R is great for rectangular data (from DBs or spreadsheets or flat files), less so for numerical data. Data frames are the technology for rectangular data, similar to data frames from pandas.
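As a rough illustration of what "rectangular data" means in practice, here is a minimal sketch using only base R. The `measurements` data and its column names are made up for this example and do not come from the podcast.

```r
# Hypothetical rectangular data: one row per observation, one column per variable.
measurements <- data.frame(
  site  = c("A", "A", "B", "B"),
  year  = c(2016, 2017, 2016, 2017),
  count = c(12, 15, 7, 9),
  stringsAsFactors = FALSE
)

# Keep only the rows that match a logical condition.
recent <- measurements[measurements$year == 2017, ]

# Aggregate: total count per site.
totals <- aggregate(count ~ site, data = measurements, FUN = sum)
```

The same shape of data could equally come from a database query or a CSV file read with `read.csv()`.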
Data frames in R
- 2:44 - Different implementations of data frames. Base R data frames are included with R, but are a bit dated. Tibble comes from the Tidyverse, new in 2015. For huge amounts of data use the data.table package.
- 3:59 - Interfaces for Base R data frames and Tibble are interchangeable; converting them to data.table data frames takes some work.
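The following sketch shows one way the three implementations relate, assuming the `tibble` and `data.table` packages are installed. The example data is hypothetical. The conversion calls themselves are short; the part that tends to take work is rewriting existing code for data.table's own `dt[i, j, by]` query syntax.

```r
library(tibble)
library(data.table)

# Hypothetical example data as a base R data frame.
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# A tibble is still a data.frame, so base-style column access keeps working.
tb <- as_tibble(df)
is_df <- inherits(tb, "data.frame")
avg_tb <- mean(tb$value)

# data.table also inherits from data.frame, but queries use its own syntax:
# dt[rows to keep, columns to compute].
dt <- as.data.table(df)
avg_large <- dt[value > 3, .(mean_value = mean(value))]

# Convert back to a plain data frame when other code expects one.
back <- as.data.frame(dt)
```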
Which R implementation to choose
- 4:39 - Popular R implementations: the standard R distributed via CRAN by the R Foundation (CRAN is also the package repository for R). It’s not that efficient, no MKL support for matrix operations.
Microsoft’s R implementation (used to be Revolution Analytics).
ValidR by Mango, uses MKL libraries etc, went through popular libraries from CRAN and validated they do what they promise to do.
- 7:29 - CRAN is a repository for R packages, users can submit their packages to CRAN.
Tidyverse by RStudio is a more tightly controlled collection of packages focussed on newer technologies.
- 9:49 - Typical R users are mostly data scientists and academic researchers.
RStudio
- 11:11 - RStudio is the standard programming interface, provides notebooks with R Markdown which can export to formats like HTML, PDF, and MS Word documents. Notebooks contain text, code, and output. Can include HTML Widgets.
R Notebooks with R Markdown can be published for free to RPubs.com.
Published notebooks contain the data/code and visualizations, interactive visualizations are shipped with the data to allow interaction.
Deploying R code
- 16:27 - Different deployment options for R code. The server version of RStudio lets you run R code on a server.
Use Shiny for interactive elements in R. RStudio has a hosted solution, shinyapps.io, with free and paid (support) versions.
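To give a feel for what a Shiny app looks like, here is a minimal sketch, assuming the `shiny` package is installed. The app itself (a histogram of R's built-in `faithful` dataset with a slider for the number of bins) is an illustrative choice, not something discussed in the podcast.

```r
library(shiny)

# UI: a slider for the number of bins and a placeholder for the plot.
ui <- fluidPage(
  titlePanel("Hypothetical histogram explorer"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         main = "Old Faithful eruption durations", xlab = "Minutes")
  })
}

# Build the app object; assigning it does not start a server.
app <- shinyApp(ui = ui, server = server)

# To run it locally, uncomment the next line:
# runApp(app)
```

Publishing to shinyapps.io is typically done with the `rsconnect` package's `deployApp()` after configuring an account; that step is left out here because it needs credentials.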
What not to do with R
- 17:53 - R is not too well suited for numerical simulation, better to go with Python or others. R is GPL, as are many of the packages on CRAN.
- 19:20 - R can be tricky to learn at first for people who are programmers. R has some libraries that can provide functional or other concepts.
R language development and community
- 21:41 - R language is quite stable over time.
“R for Data Science” by Hadley Wickham, Garrett Grolemund.
Upcoming changes are new ways for non-standard evaluation in Tidy Eval, which gives lazy evaluation and other features.
- 23:45 - Resources for R: Twitter for keeping up with news on the #rstats and #tidyverse hashtags. Following @hadleywickham for new developments. RStudio blog for the IDE and Tidyverse. Martin’s courses on Lynda.