Sunday, January 23, 2011

Day one with R, Head First Data Analysis

Awesome. I installed R (r-project) about 10 minutes ago, and I just created my first scatterplot! This is a long way from my days with p-fit and n-fit.

I'm reading Head First Data Analysis, published by the fine folks at O'Reilly, and I'm enjoying it. Going in, I always expect the asides, cartoons, and irreverent, colloquial manner of a Head First book to be off-putting, but it really does flow nicely. I look forward to comparing it to my other new O'Reilly book, Data Analysis with Open Source Tools (released in Nov 2010).

On page 291 we see this "Ready Bake Code" to pull a CSV from their website, load it into R, and draw a scatter plot of a subset of the data.

employees <- read.csv( "", header=TRUE)
head( employees, n=30 )
plot ( employees$requested[employees$negotiated==TRUE], employees$received[employees$negotiated==TRUE] )

Boom, I have a scatter plot of the subset of employees where the negotiated column is TRUE, with requested on the x-axis and received on the y-axis.
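
The trick in that plot() call is logical indexing: the comparison employees$negotiated==TRUE gives one TRUE/FALSE per row, and using it inside the square brackets keeps only the TRUE rows. Here is a minimal sketch of the same idea on a tiny made-up data frame. The column names match the book's file, but the numbers and the helper name keep are my own inventions for illustration, not the book's data.

```r
# Toy stand-in for the book's data; these values are invented for illustration.
employees <- data.frame(
  requested  = c(5, 10, 8, 12),
  received   = c(4, 0, 7, 9),
  negotiated = c(TRUE, FALSE, TRUE, TRUE)
)

# A logical vector with one TRUE/FALSE entry per row.
keep <- employees$negotiated == TRUE

# Indexing a column with that vector keeps only the rows where it is TRUE.
x <- employees$requested[keep]
y <- employees$received[keep]

# Same scatter plot as above, with the axes labeled explicitly.
plot(x, y, xlab = "requested", ylab = "received")
```

Since negotiated is already a logical column, the ==TRUE part is probably redundant, and employees$requested[employees$negotiated] should select the same rows.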

I did a full install on my Ubuntu laptop by adding the official r-project apt repository, which gave me a slightly newer version than the one in the default Ubuntu 10.10 (Maverick) repositories. CRAN asks you to pick a mirror manually; I chose my local UCLA mirror.
# Create /etc/apt/sources.list.d/r.list
deb maverick/
# add key (optional, but preferred)
gpg --keyserver --recv-key E2A11821
gpg -a --export E2A11821 | sudo apt-key add -
# update aptitude
sudo aptitude update
# install r
sudo aptitude install r-base
# launch R (not 'r' -- in some shells, such as zsh, that's a built-in)
R
