The seventh course in the Johns Hopkins Data Science Specialization on Coursera is Regression Models. This is the second course in the sequence taught by Brian Caffo, after Statistical Inference. Much like that course, the emphasis here is on mathematics, and people who have been out of the mathematical loop for a while will probably find this class to be a struggle.

In fact, after breezing through most of Statistical Inference, I found significant portions of this class to be more challenging. Past the first week of Regression Models, I had little prior knowledge to rely on, which made the material harder going.

The basics of regression, such as a line of best fit, least squares, residuals and the like, were all familiar enough. Multiple regression was something I’d played around with a bit too, but we went much deeper than I’d gone in the past. I also learned a great deal about ANOVA testing, variance inflation, and hat values, all of which were completely new to me. In week 4, we covered generalized linear models, and a week after the class ended, I’m still trying to wrap my head around the most advanced aspects of GLMs.
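To make a few of those terms concrete, here is a minimal sketch in plain Python of a one-predictor least-squares fit, its residuals, and its hat values. It is not taken from the course materials (the course itself works in R); the function name `fit_line` and the toy numbers are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor.

    Returns (slope, intercept, residuals, hat_values).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sum of squares of x about its mean, and the x-y cross products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual: observed y minus the fitted value on the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Hat value (leverage) for simple linear regression:
    # h_i = 1/n + (x_i - x_bar)^2 / Sxx
    hat_values = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    return slope, intercept, residuals, hat_values


# Made-up toy data: the last point sits far from the other x values.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 20.5]

slope, intercept, residuals, hat_values = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("hat values:", hat_values)
```

In this toy data the last point has by far the largest hat value, because its x value is far from the mean of the others. With an intercept and one slope, the hat values always add up to 2, the number of fitted coefficients.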

Dr. Caffo does spend a significant amount of time on proofs in the class, even though the proofs aren’t really assessed at all. As a high school mathematics teacher, I can appreciate that, because you don’t want to shortchange your students by giving them a lot of formulas and examples without the rationale and theory behind the problems. That said, given time limitations and the fact that they weren’t being assessed, I did skip some of the proofs in the lectures since I was more interested in application. Perhaps I shortchanged myself, but to stay on schedule, sometimes trade-offs have to be made. Much like with the Statistical Inference course, Caffo makes heavy use of the manipulate package, and that’s a good thing since it helps to visualize the concepts under discussion.

When I took the class in July, the grade was made up of 4 quizzes and a project, due at the end of week 3. The project, an analysis of the mtcars data set, felt very open ended for a statistics course, and I found myself stressing quite a bit over it, particularly since I had to justify my analysis within a very tight length limit (2 pages). It seemed impossible to apply the concepts I had learned in Reproducible Research in so short a paper.

I found this to be a more challenging course than Statistical Inference because the material went beyond the typical beginning statistics course. You may want to look at the supplemental resources I suggested in my review of Statistical Inference because they are relevant here as well.