Exploratory Data Analysis Coursera Review

By Lucas | August 18, 2014

The fourth course in the Johns Hopkins Data Science Specialization on Coursera is Exploratory Data Analysis. This is the second class in the sequence taught by Roger Peng, after R Programming.

This course could just as well be titled “Visualizing Data,” since nearly everything in the class emphasized methods of presenting data visually in R. The bulk of the class time was spent on the three most popular plotting systems in R: the base plotting system, lattice, and ggplot2.

Each of these plotting systems has its own syntax. While I got pretty comfortable with base plotting, I’m still gaining a comfort level with lattice and ggplot2. I’m glad we dove in with them because it’s pretty clear from poking around Stack Overflow and other forums that these packages are very widely used. After completing the course, I attended a webinar taught by R guru Hadley Wickham, where he explained the newer package ggvis. Since Hadley made it pretty clear that ggvis is going to someday replace ggplot2, I wish we had at least touched on it in the course.

Unlike most of the later classes, this one had two projects, not one, and the first was due at the end of week 1. I was not just doubled up but tripled up when taking Exploratory Data Analysis, since I was also taking Reproducible Research and Statistical Inference at the same time. That meant I really had to jump in and get going on that first quiz and project immediately.

I found the second project to be extremely challenging. Only in retrospect did I realize that I’d made a few foolish mistakes by trying to accomplish things with for loops that could have been done much more easily with apply functions. If you are signing up for the sequence now, learn from my mistakes and master those apply functions early.
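
To make that advice concrete, here is a minimal sketch of the same summary written first with a for loop and then with apply-style functions. It is not code from the course project; the data frame `measurements` and the vectors `scores` and `group` are made up purely for illustration.

```r
# Hypothetical data: three numeric columns (not from the course project)
measurements <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), d = c(7, 8, 9))

# Loop version: preallocate a result vector and fill it column by column
loop_means <- numeric(ncol(measurements))
names(loop_means) <- names(measurements)
for (i in seq_along(measurements)) {
  loop_means[i] <- mean(measurements[[i]])
}

# Apply version: sapply() calls mean() on each column and
# returns a named numeric vector in one line
apply_means <- sapply(measurements, mean)

# Grouped summary: tapply() splits one vector by a grouping
# variable and applies a function to each piece
scores <- c(10, 20, 30, 40)
group <- c("x", "y", "x", "y")
group_totals <- tapply(scores, group, sum)
```

Both versions give the same column means, but the sapply() call has no index bookkeeping to get wrong, and tapply() replaces what would otherwise be a loop over groups with manual subsetting.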

In week 3, significant time was devoted to hierarchical clustering, dendrograms, k-means clustering, and heatmaps. However, these topics weren’t assessed in either project, so I don’t feel like I mastered them as well as I wish I had. That is a bit of a weakness of each of these courses being only a month long. Some topics are going to have to be for exposure rather than mastery.
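
For anyone who wants a quick refresher on those week 3 topics, the following is a small sketch using only functions that ship with base R. The simulated points in `pts` are an assumption of mine, chosen so the two groups are obviously separated; nothing here comes from the course materials.

```r
set.seed(42)  # kmeans() starts from random centers, so fix the seed

# Hypothetical data: two well-separated groups of ten 2-D points each
pts <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Hierarchical clustering: compute pairwise distances, build the tree,
# then cut it into two groups
tree <- hclust(dist(pts))
hier_groups <- cutree(tree, k = 2)

# k-means clustering with two centers and several random restarts
km <- kmeans(pts, centers = 2, nstart = 10)

# Cross-tabulate the two sets of cluster labels
# (the label numbers themselves are arbitrary and may differ)
agreement <- table(hier_groups, km$cluster)

# plot(tree) would draw the dendrogram and heatmap(pts) a heatmap
```

On data this cleanly separated, both methods should put the first ten points in one cluster and the last ten in the other. Real data is rarely this tidy, which is why the dendrogram and heatmap are worth looking at before settling on a number of clusters.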

Ultimately, this was another course that taught absolutely critical skills in the Data Science Specialization. I can’t imagine moving forward without having learned these visualization techniques in R.
