Here are some preliminary results from my first attempt to archive real-time bus data, in this case from the Champaign-Urbana ‘Mass Transit’ District:
This is a look at how the distribution of bus delays varies throughout the day. It only includes off-schedule buses right now, so we aren’t seeing any change in the proportion of precisely on-time buses (if there is any such change). The little clock in the top right is the time of day the event was recorded, and right below that is the number of events used to estimate the momentary distribution. For now, this sample size is as much a reflection of when I was running my computer as it is of schedule frequency.
I assumed that the first and last percentiles of the overall distribution were outliers and clipped them off.
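For anyone curious what that clipping looks like in practice, here is a minimal sketch in Python. It is not the exact code behind the animation: the function name `clip_outliers` is made up for illustration, and I'm assuming the delays are just a flat list of numbers (say, seconds off schedule, negative for early).

```python
from statistics import quantiles

def clip_outliers(delays):
    """Drop values below the 1st or above the 99th percentile.

    `delays` is assumed to be a list of schedule deviations
    (hypothetical units: seconds, negative meaning early).
    """
    # quantiles(..., n=100) returns the 99 cut points between percentiles;
    # the first is the 1st percentile and the last is the 99th.
    cuts = quantiles(delays, n=100, method="inclusive")
    low, high = cuts[0], cuts[-1]
    # Keep only the events that fall between the two cut points.
    return [d for d in delays if low <= d <= high]

# Toy data, not real observations: evenly spaced delays from -100 to 100.
sample = list(range(-100, 101))
clipped = clip_outliers(sample)
```

The cut points are computed once over the whole archive rather than per time slice, which matches clipping the overall distribution; clipping each moment separately would be a different choice.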
I don’t actually see much going on here except random fluctuation, but I suspect this will get much more interesting with larger samples and a larger, more diverse set of agencies included. I’m actually quite eager to see what that turns up! Right now, I’m working on a tool to query APIs from the Toronto Transit Commission and Philly’s SEPTA. Unfortunately, neither Cincinnati transit agency is willing to share its data, which they’ve been collecting and using for years, so it will be impossible to include them in this sort of analysis.