I hate R with a passion! With the arrival of IPython + Statsmodels + Pandas, or even Julia these days, it is becoming painfully obvious that R is the worst option of them all. The syntax sucks, there are 10 ways to do the same thing, and it isn’t scalable. Unfortunately, if you work in data science, R is still unavoidable: a lot of statistical work is still done in it.

I don't think you need to learn the language fluently, but here are some commands that will definitely help you get by:

Install A Package

install.packages("ggplot2")
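Scripts that other people run often need a package installed only if it is missing. Here is a minimal sketch; `ensure_packages()` is a hypothetical helper written for this example (not a base R or CRAN function), and the CRAN mirror URL is an assumption you can swap for your own:

```r
# ensure_packages() is a hypothetical helper, not part of base R or CRAN.
# It installs only the packages that are missing, then loads all of them.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of erroring) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)  # load by name held in a variable
  }
  invisible(missing)  # report which packages had to be installed
}
```

Calling `ensure_packages(c("ggplot2"))` at the top of a script installs ggplot2 on the first run and simply loads it on later runs.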

Changing your working directory

setwd("path/to/working/directory")
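`setwd()` invisibly returns the directory you were in, which makes it easy to switch temporarily and then go back. A small sketch, using the session's temp directory as a stand-in for your own path:

```r
# Switch to another directory, remembering where we came from
old_wd <- setwd(tempdir())
print(getwd())   # now inside the session's temporary directory

# ... work with files here ...

setwd(old_wd)    # restore the original working directory
```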

Run an R script

source("path/to/script.R")
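To see `source()` in action without an existing script, this sketch writes a throwaway two-line script to a temporary file (the file and the `answer` variable are made up for the example) and runs it:

```r
# Write a tiny script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("answer <- 6 * 7",
             "message('script finished')"), script)

source(script)               # objects the script creates land in the global environment
print(answer)
source(script, echo = TRUE)  # also echoes each expression before evaluating it
```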

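Open a plot window from within an R script and draw a boxplot in it

dev.new()  # open a new graphics device (window)
# boxplot of log-transformed interactions, grouped by the hour of posting
bp <- boxplot(log(posts$all.count + 1) ~ posts$post.hour,
  col = "lightblue",
  xaxt = "n",  # suppress the default x axis (it can be redrawn with axis())
  pch = 19,
  xlab = "Hour Of Day [0 is 12AM]",
  ylab = "log(Likes + Shares + Comments)",
  main = "Total Number Of Post Interactions vs Hour Of Day")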
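Because `xaxt="n"` hides the x axis, the hour labels have to be drawn back on with `axis()`. The sketch below is self-contained: it assumes `posts` has the `all.count` and `post.hour` columns used above and fills them with simulated data, so the numbers mean nothing. It also passes the `data` argument so the formula can refer to the columns directly:

```r
set.seed(1)
# Hypothetical stand-in for the real posts data (simulated values)
posts <- data.frame(
  post.hour = sample(0:23, 500, replace = TRUE),
  all.count = rpois(500, lambda = 20)
)

bp <- boxplot(log(all.count + 1) ~ post.hour, data = posts,
              col = "lightblue", xaxt = "n", pch = 19,
              xlab = "Hour Of Day [0 is 12AM]",
              ylab = "log(Likes + Shares + Comments)")

# bp$names holds the group labels (the hours), one per box
axis(1, at = seq_along(bp$names), labels = bp$names, las = 2)
```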

Reading data from a CSV

posts <- read.csv( '/path/to/file.csv', header=TRUE, strip.white=TRUE)
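A quick way to check what `read.csv()` did with your file is to read it and call `str()`. This sketch writes a tiny made-up CSV with padded fields to a temp file first; `stringsAsFactors = FALSE` is already the default from R 4.0 onwards, but spelling it out stops older versions from turning text into factors:

```r
# A small CSV whose fields have stray spaces around them
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,count",
             "1, alice , 10",
             "2, bob , 7"), csv_path)

# strip.white = TRUE trims the spaces around unquoted fields
posts <- read.csv(csv_path, header = TRUE, strip.white = TRUE,
                  stringsAsFactors = FALSE)
str(posts)  # check the column names and types after import
```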

Converting a Data Frame Column From String to Date

# %y matches a two-digit year (14); use %Y for a four-digit year (2014)
df$date <- as.Date(df$date, format = "%y/%m/%d")
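The format string has to match the text exactly, and a mismatch gives `NA` rather than an error, so it is worth checking the format on a known value first. A few sample dates:

```r
d4 <- as.Date("2014/03/15", format = "%Y/%m/%d")  # four-digit year
d2 <- as.Date("14/03/15",   format = "%y/%m/%d")  # two-digit year, read as 2014
print(d4)

# Separator mismatch ("-" in the text, "/" in the format): silently NA
bad <- as.Date("2014-03-15", format = "%Y/%m/%d")
print(bad)
```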

Calculate the row-wise sum of multiple columns in a Data Frame

# sum columns 3, 4 and 5 for each row, ignoring NAs
df$total.interaction <- apply(df[, c(3, 4, 5)], 1, sum, na.rm = TRUE)
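Base R also has `rowSums()`, which does the same row-wise sum without the `apply()` call and is usually faster. One catch with `apply()`: it converts the data frame to a matrix first, so the selected columns must all be numeric. A sketch with a made-up data frame whose columns 3 to 5 stand in for likes, shares and comments (selecting them by name rather than position):

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 likes = c(10, 4, NA), shares = c(2, NA, 1),
                 comments = c(5, 0, 3))

cols <- c("likes", "shares", "comments")
df$total.interaction <- rowSums(df[, cols], na.rm = TRUE)
print(df$total.interaction)
```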

Calculate the row-wise mean of multiple columns in a Data Frame

# average columns 3, 4 and 5 for each row, ignoring NAs
df$avg.interaction <- apply(df[, c(3, 4, 5)], 1, mean, na.rm = TRUE)
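Likewise, `rowMeans()` is the built-in row-wise mean. With `na.rm = TRUE`, missing values are dropped from both the sum and the count, so a row with one `NA` is averaged over its remaining columns. A sketch with made-up numbers:

```r
df <- data.frame(likes = c(10, 4, NA), shares = c(2, NA, 1),
                 comments = c(5, 0, 3))

df$avg.interaction <- rowMeans(df[, c("likes", "shares", "comments")],
                               na.rm = TRUE)
print(df$avg.interaction)  # row 2 is (4 + 0) / 2, not (4 + 0) / 3
```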

Filtering Data Frame Rows On Upper + Lower Bounds

# keep only the rows whose value lies in [lower_bound, upper_bound]
filtered <- df[df$value >= lower_bound & df$value <= upper_bound, ]
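Watch out for missing values when filtering: a comparison with `NA` gives `NA`, and plain bracket indexing turns that into a row full of `NA`s. Wrapping the condition in `which()` or using `subset()` drops those rows instead. A small sketch (the data and bounds are invented):

```r
df <- data.frame(value = c(1, 5, 10, NA))
lower_bound <- 2
upper_bound <- 8

in_range <- df$value >= lower_bound & df$value <= upper_bound  # FALSE TRUE FALSE NA

# Plain bracket indexing keeps an all-NA row where the condition is NA
with_na <- df[in_range, , drop = FALSE]

# which() and subset() both treat NA as FALSE, so that row is dropped
filtered  <- df[which(in_range), , drop = FALSE]
filtered2 <- subset(df, value >= lower_bound & value <= upper_bound)
```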

String Concatenation

paste("Hello", "World")  # paste() inserts a space between arguments by default
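`paste()` puts a space between its arguments by default, so adding your own space gives you two. The `sep` and `collapse` arguments and `paste0()` cover the other common cases:

```r
greeting <- paste("Hello", "World")                   # sep defaults to " "
joined   <- paste0("Hello", "World")                  # no separator at all
path     <- paste("2014", "03", "15", sep = "/")      # custom separator
csv_line <- paste(c("a", "b", "c"), collapse = ", ")  # join a vector into one string
print(c(greeting, joined, path, csv_line))
```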

Dump the Results of a Linear Regression

print(summary(lm(y ~ x, data = dataset)))
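If you do not have a dataset handy, the built-in `cars` data (stopping distance vs speed) makes a quick test case. `coef()` and `summary(fit)$r.squared` pull out individual numbers when you do not want the whole printout:

```r
fit <- lm(dist ~ speed, data = cars)  # cars ships with base R
print(summary(fit))                   # coefficients, standard errors, R-squared, F-statistic

coef(fit)               # just the estimated intercept and slope
summary(fit)$r.squared  # a single number from the summary
```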

Number of Rows In A Dataframe

NROW(data)

Number of Columns In A Dataframe

NCOL(data)
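`dim()` gives both numbers at once. `nrow()`/`ncol()` are the everyday versions for data frames and matrices; the capitalised `NROW()`/`NCOL()` differ in that they also work on plain vectors. Using the built-in `mtcars` data set:

```r
dim(mtcars)   # rows, then columns
nrow(mtcars)
ncol(mtcars)

v <- 1:5
nrow(v)       # NULL: a plain vector has no dim attribute
NROW(v)       # NROW()/NCOL() treat a vector as a one-column matrix
NCOL(v)
```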

Names of dataframe columns

names(data)
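`names()` can also be assigned to, which is the base R way to rename a column. A small made-up example:

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
names(df)                               # "a" "b"
names(df)[names(df) == "b"] <- "label"  # rename one column by its current name
colnames(df)                            # same as names() for a data frame
```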

List all variables in the workspace

ls()
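`ls()` takes a `pattern` regular expression to narrow the list, and `rm()` deletes variables. The variable names here are just for illustration:

```r
counter <- 1
counts_df <- data.frame(x = 1)

ls(pattern = "^count")  # only the variables whose names start with "count"
rm(counter)             # delete a variable from the workspace
exists("counter")       # FALSE once it has been removed
```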