Update: this article was featured in The Boston Globe.
As I approached my 24th birthday, I realized that my driver's license was about to expire, which meant a trip to the RMV. These agencies are notorious for long wait times, so naturally I wondered when the best time to go would be. Certainly not during lunch hours, and probably not at close of business either. Perhaps right before then?
Fortunately, MassDOT posts its wait times for licensing and registration; for example, the Boston RMV location's wait times are posted here. But the page only shows a snapshot of the current wait. It doesn't tell you how the wait has changed over the course of a day or a week.
I figured this would be a fairly simple task, starting with a Python script: query every RMV location's website each minute and store the results in a database (e.g., a .csv file), host that database on Amazon EC2, run an R Shiny server there, and have an R Shiny webapp update automatically as the database grows.
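A minimal Python sketch of the collection step might look like the following. It assumes a hypothetical `fetch` callable that returns the current wait in minutes for each location; parsing the actual MassDOT pages isn't shown, and the CSV column names (`timestamp`, `location`, `wait_minutes`) are illustrative, not the layout of the published data.

```python
import csv
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical fetcher: returns {location name: current wait in minutes}.
# A real version would download and parse each RMV page.
Fetcher = Callable[[], Dict[str, float]]

FIELDS = ["timestamp", "location", "wait_minutes"]  # illustrative column names


def append_snapshot(csv_path: str, waits: Dict[str, float],
                    now: Optional[datetime] = None) -> None:
    """Append one row per location, writing the header only if the file is new or empty."""
    if now is None:
        now = datetime.now()
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        for location, minutes in sorted(waits.items()):
            writer.writerow([now.isoformat(timespec="seconds"), location, minutes])


def poll_forever(csv_path: str, fetch: Fetcher, interval_s: float = 60.0) -> None:
    """Record a snapshot roughly every interval_s seconds until interrupted."""
    while True:
        try:
            append_snapshot(csv_path, fetch())
        except Exception as exc:  # keep polling even if one round of requests fails
            print(f"fetch failed: {exc}")
        time.sleep(interval_s)
```

Appending one row per location per poll keeps the file a plain long-format table, which is easy to read back from either R or Python.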
For simplicity, I wanted to see what a "typical" week looks like. The following R Shiny app takes the entire database and, for each weekday from Monday through Friday, averages the wait time at each minute of the day. Here are the results (click on the image to see the live webapp; it takes about 30 seconds to load):
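The averaging itself is simple. As a rough Python illustration (the app does this in R), assuming each observation is an (ISO timestamp, location, wait in minutes) tuple, the sketch below groups readings by location, weekday, and minute of day, takes the mean of each group, and drops weekends:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float]  # (ISO timestamp, location, wait in minutes): assumed layout
Key = Tuple[str, int, int]    # (location, weekday 0=Mon..4=Fri, minute of day 0..1439)


def typical_week(rows: Iterable[Row]) -> Dict[Key, float]:
    """Average all observations that share a location, weekday, and minute of day."""
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for ts, location, wait in rows:
        t = datetime.fromisoformat(ts)
        if t.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
            continue
        key = (location, t.weekday(), t.hour * 60 + t.minute)
        sums[key] += wait
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

Because every Monday at 10:00 lands in the same bucket, each additional week of data smooths out one-off spikes in the curve.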
Roughly as I expected (though it varies by location), the best times to go are just before close of business and between 10 and 11 am. What I didn't expect was how much the wait varies by day: Friday, for example, is a terrible day to go, while Wednesday is pretty good. As the script keeps querying the RMV websites and the database grows, it will paint an increasingly reliable picture of when you should go to the RMV.
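To turn the averaged curves into a concrete recommendation, one option is to bucket each weekday's per-minute averages into hour-long blocks and pick the block with the lowest mean. A hypothetical Python sketch, assuming the averages for a single location are given as a dict keyed by (weekday 0-4, minute of day):

```python
from collections import defaultdict
from typing import Dict, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def best_hour_per_day(averages: Dict[Tuple[int, int], float]) -> Dict[str, int]:
    """Return, for each weekday, the starting hour of the lowest-wait hour block.

    Each hour's score is the unweighted mean of the per-minute averages inside it.
    """
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for (weekday, minute), wait in averages.items():
        key = (weekday, minute // 60)  # e.g. minute 630 (10:30) falls in hour 10
        sums[key] += wait
        counts[key] += 1

    best: Dict[str, Tuple[int, float]] = {}
    for (weekday, hour), total in sums.items():
        mean = total / counts[(weekday, hour)]
        day = DAYS[weekday]
        if day not in best or mean < best[day][1]:
            best[day] = (hour, mean)
    return {day: hour for day, (hour, _) in best.items()}
```

An hour-long block is coarser than the minute-level chart, but it is less sensitive to a single noisy minute.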
Although the practical application of this webapp is apparent, I mostly did it because it seemed like a multidisciplinary project. And it was. I got my hands dirty with R (backend code), R Shiny (webapp interface), ggplot2 (plotting), Amazon AWS (virtual server), Apache (webserver), git (version control), Python (scraping), and scheduled live data refreshes, all in a single project.
If you want to build your own app, or simply access the data, everything is available online. The data, which is updated live as the script scrapes the RMV websites every minute, can be found here. The code (Python and R) can be found here.
I also thought it'd be useful to map the RMV locations. If a location isn't too far away and has persistently lower average wait times, it probably makes sense to drive there instead:
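That trade-off can be made explicit: a farther location wins if the extra driving costs less than the wait it saves. The sketch below is an illustration under loud assumptions, not how the map was built. It estimates drive time from straight-line (haversine) distance at a hypothetical average speed of 40 km/h, counts the round trip, and picks the location with the smallest drive-plus-wait total. Location names, coordinates, and waits are placeholders.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0

# (name, latitude, longitude, average wait in minutes): hypothetical inputs
Location = Tuple[str, float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_location(home_lat: float, home_lon: float, locations: List[Location],
                  avg_speed_kmh: float = 40.0) -> Tuple[str, float]:
    """Return (name, total minutes) minimising round-trip drive time plus average wait."""
    def total_minutes(loc: Location) -> float:
        _, lat, lon, wait = loc
        # Crude estimate: straight-line distance at an assumed constant speed.
        one_way_min = haversine_km(home_lat, home_lon, lat, lon) / avg_speed_kmh * 60
        return 2 * one_way_min + wait

    best = min(locations, key=total_minutes)
    return best[0], total_minutes(best)
```

Straight-line distance underestimates real road distance, so drive times from a routing service would make for a fairer comparison.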