Category Archives: Data

Minivan Price Comparison With R

With my family growing once again and my 13-year-old Mazda Protégé on the fritz, I recently decided it was time to go minivan shopping. A frugal shopper, some might say cheap, I quickly set my focus on the used, domestic market and found that there are only two competitors here, the Dodge Grand Caravan and the Chrysler Town and Country.

Two questions immediately came to mind:

  • Since these two minivans are, for all practical purposes, identical (manufactured at the same facility, with the same internals and just different branding), does one name carry a price premium over the other when both are compared with a similar set of features?
  • Is there any feature of the price-depreciation curve that is readily observable that one might take advantage of when shopping? Are there advantageous mileages or years to buy?

In order to try to answer these questions, I gathered data from AutoTrader.com. The Grand Caravan/Town & Country was most recently redesigned in 2012, so I collected a total of 350 prices for the model years 2012-2015.

Subject to restrictions from my wife, the minivan had to have power doors/trunk, a backup camera, and a DVD player. It turns out that the entry-level Town and Country Touring meets these requirements, but these features are not guaranteed in the Dodge Grand Caravan until the RT trim line. I therefore collected prices only for these two trim lines, to make the feature sets as similar as possible.

First, I plotted the combined set of data points with ggplot2 to try to get a sense of that depreciation curve.

[Figure: minivan prices, plotted with ggplot2]

It would appear that the straight-line depreciation I spent all those years teaching to Algebra II students is an oversimplification. The curve flattens somewhat between roughly 50,000 and 90,000 miles, after which depreciation resumes its earlier course. This might be explained in part by the 5-year, 100,000-mile powertrain warranty Chrysler offers on these vehicles, which makes them a bit more desirable until the warranty expires at 100,000 miles.

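Having made the case that this trend is not completely linear, let me nonetheless assume that it is close enough to use R to fit a linear model to this data. A little exploratory data analysis suggests that the average van owner in this data set drives about 18,000 miles a year, so my predictors are the van's age (derived from its model year), its mileage measured in units of 18,000 miles (miles18K), and its make (Chrysler or Dodge). Scaling mileage this way puts one unit of miles18K on roughly the same footing as one year of age. The response, priceK, is the price in thousands of dollars.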
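
For readers who want to reproduce the setup, here is a minimal Python sketch of how the response and the three predictors might be built from raw listings. The `Listing` record, its field names, and the `reference_year` used to turn model year into age are all hypothetical, since the post doesn't say exactly how age was computed, and the sample listing is made up for illustration.

```python
from dataclasses import dataclass

MILES_PER_YEAR = 18_000  # typical annual mileage observed in the data set


@dataclass
class Listing:
    """Hypothetical record for one used-minivan listing."""
    price: float     # asking price in dollars
    miles: float     # odometer reading
    model_year: int  # 2012 through 2015 in this data set
    make: str        # "chrysler" or "dodge"


def features(listing: Listing, reference_year: int) -> dict:
    """Build the response and predictors named in the lm() call.

    reference_year is an assumption: the year the prices were collected,
    used to turn the model year into an age in years.
    """
    return {
        "priceK": listing.price / 1000,              # price in thousands of dollars
        "age": reference_year - listing.model_year,  # age in years
        "miles18K": listing.miles / MILES_PER_YEAR,  # mileage in "typical years" of driving
        # R's factor(make) sorts levels alphabetically and uses the first
        # ("chrysler") as the baseline, so only a Dodge indicator is needed.
        "dodge": 1 if listing.make.lower() == "dodge" else 0,
    }


# Made-up listing, for illustration only.
row = features(Listing(price=25_000, miles=36_000, model_year=2014, make="Dodge"), reference_year=2016)
print(row)
```

The Dodge indicator mirrors R's default treatment coding, which is why the output below reports a single `factor(make)dodge` coefficient rather than one per make.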

lm(formula = priceK ~ age + miles18K + factor(make), data = training)

Coefficients:
   (Intercept)                age           miles18K  factor(make)dodge  
        24.813             -0.412             -1.862              1.317  
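
Read as an equation, the fit says priceK ≈ 24.813 − 0.412·age − 1.862·miles18K + 1.317·dodge. A minimal Python sketch that plugs the printed (rounded) coefficients into that equation; `predicted_price` is a hypothetical helper name, it assumes age is measured the same way as in the training data, and the two-year-old, 36,000-mile van is a made-up example.

```python
# Coefficients as printed by lm() above; the model works in thousands of dollars.
INTERCEPT = 24.813
AGE = -0.412       # per year of age
MILES18K = -1.862  # per 18,000 miles
DODGE = 1.317      # Dodge vs. the Chrysler baseline


def predicted_price(age: float, miles: float, make: str) -> float:
    """Predicted asking price in dollars from the fitted linear model.

    miles is the raw odometer reading; it is rescaled to units of
    18,000 miles to match the miles18K predictor.
    """
    is_dodge = 1.0 if make.lower() == "dodge" else 0.0
    price_k = INTERCEPT + AGE * age + MILES18K * (miles / 18_000) + DODGE * is_dodge
    return price_k * 1000


chrysler_price = predicted_price(age=2, miles=36_000, make="chrysler")
dodge_price = predicted_price(age=2, miles=36_000, make="dodge")
print(f"Chrysler: ${chrysler_price:,.0f}  Dodge: ${dodge_price:,.0f}  "
      f"gap: ${dodge_price - chrysler_price:,.0f}")
```

Because the model is additive, the Dodge-minus-Chrysler gap comes out to the same $1,317 at any age and mileage. The flat stretch between about 50,000 and 90,000 miles is not captured by a straight line, so predictions in that range are rougher.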

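As we can see, the model suggests that, at the same age and mileage, the Dodge Grand Caravan carries a price premium of approximately $1,300. A quick look at the confidence intervals suggests that the Dodge probably really does sell for more: the 95% interval for the Dodge coefficient runs from about $600 to $2,000, entirely above zero.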

                       2.5 %      97.5 %
(Intercept)       23.8111303 25.81575484
age               -0.7719586 -0.05211861
miles18K          -2.2128834 -1.51184676
factor(make)dodge  0.5989757  2.03552222
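
The step from this table to the conclusion is that, for a least-squares linear model, a 95% interval lying entirely on one side of zero corresponds to a coefficient that is significantly different from zero at the 5% level. A small Python sketch that restates the printed intervals in dollars and checks that condition; the helper names `dollars` and `excludes_zero` are hypothetical.

```python
# 95% confidence intervals as printed by confint() above, in thousands of dollars.
CONFINT = {
    "(Intercept)": (23.8111303, 25.81575484),
    "age": (-0.7719586, -0.05211861),
    "miles18K": (-2.2128834, -1.51184676),
    "factor(make)dodge": (0.5989757, 2.03552222),
}


def dollars(thousands: float) -> str:
    """Format an amount given in thousands of dollars, e.g. -0.77 -> '-$770'."""
    amount = thousands * 1000
    sign = "-" if amount < 0 else ""
    return f"{sign}${abs(amount):,.0f}"


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


for term, (lower, upper) in CONFINT.items():
    print(f"{term:>17}: {dollars(lower)} to {dollars(upper)}, "
          f"excludes zero: {excludes_zero(lower, upper)}")
```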

I am far from a minivan expert, so it’s very possible that the RT trim line carries some additional features that the Town and Country Touring doesn’t. Either way, given the restrictions I was handed, these were the two models and trim lines under consideration for my family, so any significant price difference between them was what interested me most. For the curious reader, we ended up buying a 2014 Chrysler Town and Country with about 50,000 miles on it, and yes, it did have a negative residual: its price was below what the model predicted.
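
A negative residual is a handy shopping signal, since the asking price sits below the model's prediction for a van of that age, mileage, and make. Here is a minimal Python sketch of turning that into a shortlist, assuming each listing carries its asking price and that `predict` is whatever fitted model you have; `rank_by_residual`, the dictionary listing format, and the flat $20,000 stand-in model are hypothetical. A negative residual can also reflect something the model doesn't see, such as condition or options, so it flags a van worth a closer look rather than a guaranteed deal.

```python
from typing import Any, Callable, Dict, List, Tuple

Listing = Dict[str, Any]  # hypothetical record: asking "price" plus model inputs


def rank_by_residual(
    listings: List[Listing], predict: Callable[[Listing], float]
) -> List[Tuple[float, Listing]]:
    """Pair each listing with its residual and sort, most negative first.

    residual = asking price - predicted price, so a negative residual
    means the listing is priced below what the model expects.
    """
    scored = [(row["price"] - predict(row), row) for row in listings]
    # Sort on the residual only; dicts are not orderable, so ties must not
    # fall through to comparing the listings themselves.
    return sorted(scored, key=lambda pair: pair[0])


# Usage with a stand-in model that predicts $20,000 for every van.
listings = [{"price": 21_000}, {"price": 19_000}, {"price": 20_000}]
for res, row in rank_by_residual(listings, lambda row: 20_000.0):
    print(f"asking ${row['price']:,}: residual {res:+,.0f}")
```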

University of Washington Machine Learning Classification Review

I’ve spent the last couple of months working through course three in the University of Washington’s Machine Learning Specialization on Coursera. Course two was regression (review); the topic of the third course is classification. As has been the case with previous courses, this specialization continues to be taught by Carlos Guestrin and Emily Fox. For…

Graphing Calculator Price Dashboard

These interactive plots show the prices on Amazon for popular Texas Instruments calculators such as the TI-Nspire CX (review) and TI-84 Plus CE (review) as well as non-TI models like the Casio Prizm (review) and HP Prime (review). The graphs show the last 7 days, and they update every hour, day and night, so check…

Coursera Review–Machine Learning: Regression

I’ve recently completed the second course in the University of Washington Machine Learning Specialization on Coursera, “Machine Learning: Regression.” This comes on the heels of completing course 1, Machine Learning Foundations: A Case Study Approach. This course debuted right at the end of November and wrapped up 6 weeks later (my impression is that these…

Constructing a Social Graph With Twitter and Plotly

In a couple of earlier posts, I showed an example of a social graph created from Twitter data and Plotly, a graph of relationships between educational technology enthusiasts on Twitter. Those posts were more for the educator audience that I write for, but increasingly, I’m getting feedback on my posts from other data scientists, so…

Teaching Graph Theory With Twitter

In a recent post, I displayed the social network graph that I created using the Twitter API and Plotly. There are a number of interesting applications here. Given my history with education, one that I think shouldn’t be overlooked is as a way for an innovative teacher and school to teach graph theory.…

#EdTechChat Social Network Graph

Using the Twitter API and Plotly with Python, I created a visualization of a recent #EdTechChat on Twitter, held on December 14. If you aren’t familiar with graph theory, the dots in this visualization are referred to as nodes or vertices. They represent the Twitter users that participated in the chat. The line segments connecting…