I hate R with a passion! With the arrival of IPython + Statsmodels + Pandas, or even Julia these days, it is becoming painfully obvious that R is the worst option of them all. The syntax sucks, there are 10 ways to do the same thing, and it isn’t scalable. Unfortunately, if you work in data science, it is still unavoidable: a lot of statistical work is still done in R.

I don't think you need to learn the language fluently, but here is a series of commands that will definitely help you get by:

Install A Package
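
The usual call here is install.packages(). A minimal sketch, assuming you want to skip the download when the package is already present; the helper name ensure_package is made up for illustration, and the repository URL is just one CRAN mirror:

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can be loaded after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# The plain one-liner would be: install.packages("ggplot2")
# "stats" ships with R, so nothing is downloaded in this example.
ensure_package("stats")
```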


Changing your working directory
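
This is most likely setwd(), with getwd() to check where you are. A small sketch, assuming tempdir() as a stand-in for your real project path:

```r
# Remember the current directory so it can be restored afterwards.
old_dir <- getwd()

# Assumption: tempdir() stands in for a real path such as "/path/to/project".
setwd(tempdir())
print(getwd())

# Go back to where we started.
setwd(old_dir)
```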


Run an R script
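
From inside an R session, source() does the job; from a shell, Rscript /path/to/script.R is the usual equivalent. A minimal sketch, where the script file and its contents are invented purely for illustration:

```r
# Assumption: a throwaway script written to a temp file for this example.
script_path <- file.path(tempdir(), "example_script.R")
writeLines("answer <- 6 * 7", script_path)

# source() evaluates the file in the current session,
# so variables it defines remain available afterwards.
source(script_path)
print(answer)
```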


Open plot window from within R script

bp <- boxplot(log(posts$all.count+1) ~ posts$post.hour,
              xlab="Hour Of Day [0 is 12AM]",
              ylab="log(Likes + Shares + Comments)",
              main="Total Number Of Post Interactions vs Hour Of Day")
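
A plotting call does not necessarily pop up a window when the script is run non-interactively. One way to ask for a device explicitly is dev.new(); the sketch below assumes a tiny made-up stand-in for the posts data frame, and which device you actually get depends on your platform and session type:

```r
# Assumption: hypothetical data standing in for the real `posts` data frame.
posts <- data.frame(all.count = c(0, 3, 10, 25),
                    post.hour = c(0, 0, 1, 1))

# Request a new graphics device. In an interactive session this is normally
# a screen window; under Rscript the default is a file device such as PDF.
dev.new()

bp <- boxplot(log(all.count + 1) ~ post.hour, data = posts,
              xlab = "Hour Of Day [0 is 12AM]",
              ylab = "log(Likes + Shares + Comments)")

# Close the device when done.
invisible(dev.off())
```

On Linux, x11() is a common alternative when you specifically want an on-screen window.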

Reading data from a CSV

posts <- read.csv( '/path/to/file.csv', header=TRUE, strip.white=TRUE)

Converting a Data Frame Column From String to Date

# %y matches a two-digit year (e.g. 14/03/25); use %Y for four-digit years
df$date <- as.Date(df$date,format="%y/%m/%d")

Calculate the sum of multiple columns in a Data Frame

df$total.interaction <- apply(df[,c(3,4,5)],1,sum,na.rm=TRUE)

Calculate the mean of multiple columns in a Data Frame

df$avg.interaction <- apply(df[,c(3,4,5)],1,mean,na.rm=TRUE)

Filtering A Data Frame Column On Upper + Lower Bounds

# keep only the rows whose value lies within [lower_bound, upper_bound]
filtered <- df[df$value<=upper_bound & df$value>=lower_bound,]

String Concatenation

# paste() joins with a single space by default; use paste0() for no separator
paste("Hello", "World")

Dump the Results of a Linear Regression

print(summary(lm(y ~ x, data=dataset)))

Number of Rows In A Dataframe
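
The standard call is nrow(). A quick sketch on a small made-up data frame:

```r
# Assumption: a tiny invented data frame for illustration.
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# Number of rows (observations).
n_rows <- nrow(df)
print(n_rows)
```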


Number of Columns In A Dataframe
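
The counterpart is ncol(), and dim() gives rows and columns in one go. A quick sketch on a small made-up data frame:

```r
# Assumption: a tiny invented data frame for illustration.
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# Number of columns (variables).
n_cols <- ncol(df)
print(n_cols)

# dim() returns c(rows, columns).
shape <- dim(df)
print(shape)
```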


Names of dataframe columns
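
Both names() and colnames() return the column names of a data frame. A quick sketch on a small made-up data frame, including a rename:

```r
# Assumption: a tiny invented data frame for illustration.
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# Column names as a character vector.
cols <- names(df)
print(cols)

# Assigning to names() renames columns in place.
names(df)[names(df) == "y"] <- "label"
print(colnames(df))
```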


List all variables in the workspace
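
The usual call is ls(). A quick sketch, where the two variables are created only so there is something to list:

```r
# Assumption: two throwaway variables so the listing is not empty.
alpha <- 1
beta <- "two"

# ls() returns the names of objects in the current environment.
vars <- ls()
print(vars)

# rm(list = ls()) would clear them all; it is left commented out on purpose.
```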