Reproducible Research Coursera Review

The fifth course in the Johns Hopkins Data Science Specialization on Coursera is Reproducible Research. This is the third and final course in the sequence taught by Roger Peng.

Of the first five courses in the specialization, Reproducible Research is the one (apart from The Data Scientist’s Toolbox) where I spent the least time learning new R code. Instead, the emphasis of this course was more philosophical: writing up your research findings so that others can reproduce them, even if the results are not necessarily replicable. For more on the definition of reproducible research, check out this post from Dr. Peng.

That’s not to say there isn’t much R coding in Reproducible Research, or even that there is less of it. Just like in the other classes in the sequence, I still spent a fair amount of time cleaning data and programming in R for data analysis. It’s just that the emphasis of the class was on communicating results so that anyone well versed in R could follow my analysis from the very first step to the very last and reproduce those results.

One of the niftiest features of RStudio that we explored in this class was how easy it makes using knitr. With knitr, we created documents that combine Markdown and R code in a single, easy-to-read file. The output of the code appears right in the document, and the code itself can be shown or hidden. The document can be output as, say, a PDF or HTML file. It’s a really handy tool.
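To give a rough idea of what that looks like, here is a minimal sketch in R. It is not taken from the course materials: the file name, the chunk label `speed-summary`, and the use of the built-in `cars` dataset are all made up for illustration, and rendering through the `rmarkdown` package (which calls knitr under the hood) is just one way to do it. The script writes a tiny R Markdown file and, if `rmarkdown` and pandoc happen to be installed, turns it into HTML.

```r
# Build a tiny R Markdown source file and (optionally) render it to HTML.
# All names below are hypothetical and chosen only for this illustration.

# Three backticks, built with strrep() so the chunk markers can be
# written from inside this script.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "The summary below is computed when the document is knit.",
  "",
  # echo=FALSE hides the code and keeps only its output in the report
  paste0(fence, "{r speed-summary, echo=FALSE}"),
  "summary(cars$speed)",
  fence
)

# Write the source file to a temporary directory
rmd_path <- file.path(tempdir(), "example_report.Rmd")
writeLines(rmd_lines, rmd_path)

# Render only when the rmarkdown package and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path,
                                 output_format = "html_document",
                                 quiet = TRUE)
  message("Rendered: ", html_path)
}
```

Changing `echo=FALSE` to `echo=TRUE` in the chunk header would show the R code alongside its output, which is the "revealed or hidden" switch mentioned above. In RStudio, the Knit button does the rendering step for you.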

Throughout the course, Dr. Peng emphasized the importance of making your research reproducible. It reminded me a bit of being back in high school and being told I needed to “show my work.” He shared several compelling examples of why reproducible research matters. Without a doubt, the most compelling was the case of the fraudulent cancer research at Duke University, which eventually made its way onto 60 Minutes.

While I do hope the Data Science Specialization leads me to a new career opportunity, I don’t suppose it’s very likely that I’ll end up as a cancer researcher. Will reproducible research be as important to me as it is to those cutting-edge medical researchers? Perhaps not, but I can certainly understand why this course was included in the sequence. Even if I only end up sharing my code with a few coworkers down the road, I’ve learned a thing or two about the proper way to share my results with them.

Lucas Allen

For more than a decade, Lucas Allen was a high school math teacher and math team coach in Illinois. His 2012 Morton High School math team won the Illinois state championship. Recently, he made the jump from public education to the corporate world and is now working as a data scientist. He is interested in just about all forms of technology, including the TI-Nspire, Nexus devices, R, MOOCs, and more. You can follow him, and if you are nice, he will probably follow you back.
