Author Archives: Lucas Allen

An Interactive Map of ACT and SAT Results

One of the tools we used in my data science training was Shiny, which publishes interactive web applications written in R. I wanted a little more practice with Shiny, so I built an app that I thought might interest readers of my blog, whether they visit for the secondary education content or for the data topics I’ve been writing about more recently.

Below you will find an interactive map of the United States, written in R using the googleVis package and Shiny. The leftmost tab shows ACT results, the middle tab shows SAT results, and the rightmost tab compares the most closely matched ACT and SAT results (for example, “Reading” from the ACT against “Critical Reading” from the SAT).
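For readers curious how a tabbed map like this can be put together, here is a minimal sketch, not the app's actual source. It assumes the shiny and googleVis packages are installed. The data frames `act` and `sat`, their column names, and the helper `state_map` are hypothetical, and the values are made-up placeholders rather than real test results. The comparison tab is left out to keep the sketch short.

```r
library(shiny)
library(googleVis)

# Placeholder data: made-up values for illustration only, NOT real test results.
act <- data.frame(State = c("Illinois", "Iowa", "Texas"),
                  Composite = c(1, 2, 3),
                  stringsAsFactors = FALSE)
sat <- data.frame(State = c("Illinois", "Iowa", "Texas"),
                  Total = c(3, 2, 1),
                  stringsAsFactors = FALSE)

# Hypothetical helper: build a US state-level map shaded by one metric.
state_map <- function(df, metric) {
  gvisGeoChart(df, locationvar = "State", colorvar = metric,
               options = list(region = "US",
                              displayMode = "regions",
                              resolution = "provinces"))
}

# One tab per test; each tab holds the HTML that googleVis generates.
ui <- fluidPage(
  tabsetPanel(
    tabPanel("ACT", htmlOutput("act_map")),
    tabPanel("SAT", htmlOutput("sat_map"))
  )
)

server <- function(input, output) {
  output$act_map <- renderGvis(state_map(act, "Composite"))
  output$sat_map <- renderGvis(state_map(sat, "Total"))
}

# Create the app object; launch it interactively with runApp(app).
app <- shinyApp(ui, server)
```

Keeping the chart construction in one small function means a third tab, or a control for switching the displayed metric, only needs another call to the same helper.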

Try not to take the comparisons between the states too seriously. As you can easily see by clicking on “Percent Tested”, some states have very few students participating in one test and many more in the other, which significantly skews results. For example, my own state of Illinois has the highest SAT results, but only 5% of our students take the SAT, likely among our highest achievers. We rank much lower on the ACT, but 100% of students took that test due to state requirements. Despite this major caveat, it’s interesting to see how different the results are across the country.

The map seems to look best when viewed in recent versions of Internet Explorer. All results used in the analysis are from the 2013 test. You can see the source of the ACT data here and the SAT data here. My source code can be viewed on GitHub.

Responding to Good for TI…Bad for Kids and Stat

Over at R-Bloggers a few days ago, I came across a post from Norm Matloff, professor of computer science at UC Davis. The post, Good for TI, Good for Schools, Bad for Kids, Bad for Stat, had been reposted from his blog, the Mad (Data) Scientist. Throughout this post and the one that preceded it, Statistics:…

Thoughts on Completing the 9 Johns Hopkins Data Science Courses

A process that began 4 months ago, the sequence of 9 Johns Hopkins Data Science Specialization courses on Coursera, wrapped up for me late last week with my last quiz in course 9, Developing Data Products. While I haven’t truly finished the specialization yet (the first ever capstone project doesn’t launch until late October), I…

Developing Data Products Coursera Review

The ninth and final course prior to the capstone in the Johns Hopkins Data Science Specialization on Coursera is Developing Data Products. This is the third and final course in the sequence taught by Brian Caffo. After taking the lead on two statistics courses, Statistical Inference and Regression Models, this class seemed to bring out a more humorous side…

Practical Machine Learning Coursera Review

The eighth course in the Johns Hopkins Data Science Specialization on Coursera is Practical Machine Learning. This is the third and final course in the sequence taught by Jeff Leek. Probably more than any other course in the JHU series of classes, this is the one that feels like it brought the whole sequence together. Students of Practical Machine…

Regression Models Coursera Review

The seventh course in the Johns Hopkins Data Science Specialization on Coursera is Regression Models. This is the second course in the sequence taught by Brian Caffo, after Statistical Inference. Much like that course, the emphasis here is on mathematics, and people who have been out of the mathematical loop for a while will probably find this…

Statistical Inference Coursera Review

The sixth course in the Johns Hopkins Data Science Specialization on Coursera is Statistical Inference. This is the first course in the specialization taught by Brian Caffo. In my review of the R Programming course, I mentioned that there were two places in the sequence that seemed (based solely on my observations of forum comments) to be bogging…

Reproducible Research Coursera Review

The fifth course in the Johns Hopkins Data Science Specialization on Coursera is Reproducible Research. This is the third and final course in the sequence taught by Roger Peng. Among the first five courses in the specialization, Reproducible Research is the one (apart from The Data Scientist’s Toolbox) where I spent the least time learning new R code. Instead, the…

Exploratory Data Analysis Coursera Review

The fourth course in the Johns Hopkins Data Science Specialization on Coursera is Exploratory Data Analysis. This is the second class in the sequence taught by Roger Peng, after R Programming. This course could just as well be titled “Visualizing Data,” since almost everything in the class emphasized methods of presenting data visually in R. The bulk of…