A very preliminary bus tardiness distribution

September 17th, 2014

Briefly following up on the previous post:

I’ve started working with Champaign-Urbana’s real-time departure API. Right now, I’m using a little Python script to send requests and store the responses in a local PostgreSQL database. Below is a probability density plot from the first 1,000 or so data points I’ve pulled down. It only covers the weekday mornings when I’ve run the script, and most of it comes from the 150 stops I queried just a moment ago.

A very preliminary tardiness density plot

But it looks like my prediction (see the earlier post) may not have been terribly far from the mark.
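For anyone wanting to make a similar plot: a common way to get a smooth density curve from raw tardiness values is a Gaussian kernel density estimate. The sketch below is not the script behind the plot above. It assumes tardiness means observed departure minus scheduled departure, in minutes (positive is late), and it picks Silverman’s rule for the bandwidth. The function names and the sample timestamps are made up for illustration, and only the standard library is used.

```python
import math
import statistics
from datetime import datetime
from typing import Callable, Optional, Sequence


def tardiness_minutes(scheduled: datetime, departed: datetime) -> float:
    """Minutes late (positive) or early (negative) relative to the schedule."""
    return (departed - scheduled).total_seconds() / 60.0


def silverman_bandwidth(values: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth; needs at least two values."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(values: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a function x -> estimated density of tardiness at x minutes."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Sum of one Gaussian bump per observation, scaled to integrate to 1.
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density


# Illustrative, made-up departures for one scheduled 8:00 trip (not real data).
scheduled = datetime(2014, 9, 17, 8, 0)
observed = [datetime(2014, 9, 17, 8, m, s)
            for m, s in [(0, 0), (0, 30), (1, 0), (2, 15), (3, 45)]]
late = [tardiness_minutes(scheduled, d) for d in observed]
f = gaussian_kde(late)
for x in range(-2, 7):
    print(f"{x:+d} min: {f(x):.3f}")
```

Evaluating `f` over a grid of minutes and plotting it gives a curve shaped like the one above; with a real sample the grid would simply span the observed range of tardiness.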

The next step is to find a programmatic way to randomly choose which arrivals to query, so that the sampling doesn’t introduce any systematic error. This is necessary because I’m limited to 1,000 API calls per day and can’t just hammer their server with requests for every scheduled arrival.
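One simple version of that random selection: take the day’s list of scheduled arrivals, draw a simple random sample of at most 1,000 of them without replacement, and query each chosen arrival around its scheduled time. This is only a sketch under a few assumptions: the `ScheduledArrival` record and `sample_arrivals` function are hypothetical (the real API’s field names will differ), each sampled arrival is assumed to cost exactly one API call, and any calls spent fetching the schedule itself would have to come out of the budget.

```python
import random
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional

DAILY_CALL_LIMIT = 1000  # the daily API limit mentioned in the post


class ScheduledArrival(NamedTuple):
    # Hypothetical record shape; real field names from the API will differ.
    stop_id: str
    route: str
    scheduled: datetime


def sample_arrivals(arrivals: List[ScheduledArrival],
                    budget: int = DAILY_CALL_LIMIT,
                    seed: Optional[int] = None) -> List[ScheduledArrival]:
    """Simple random sample (without replacement) of at most `budget`
    scheduled arrivals, returned in time order so each one can be
    queried shortly before it is due."""
    rng = random.Random(seed)  # seeded RNG makes a day's draw reproducible
    k = min(budget, len(arrivals))
    chosen = rng.sample(arrivals, k)
    return sorted(chosen, key=lambda a: a.scheduled)


# Usage with a made-up schedule: 150 stops x 20 arrivals = 3,000 candidates.
base = datetime(2014, 9, 17, 6, 0)
schedule = [ScheduledArrival(f"stop{s}", "R1", base + timedelta(minutes=5 * i))
            for s in range(150) for i in range(20)]
picked = sample_arrivals(schedule, seed=42)
print(len(picked), picked[0].scheduled, picked[-1].scheduled)
```

Because every scheduled arrival has the same chance of being picked, no particular stop, route, or time of day is favored; stratifying the draw by stop or by hour would be a reasonable alternative if some groups end up too thin.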

I’m also recording location attributes for all these records so I’ll be able to do some spatial analysis too :-)

2 responses to “A very preliminary bus tardiness distribution”

  1. Matthew says:

    I’m working on something similar for the MBTA real time data feed, which is readily available for buses.

    I’m most interested right now in buses that do not arrive at regular intervals, but rather ones for which the user must check the schedule to know when they will arrive.