I’m excited to announce that I’ll be spending part of this summer working with Daniel Schleith and Brad Thomas on a rather exciting project… Thanks to a grant from People’s Liberty, we’re going to have the opportunity to develop what I believe will be the first application to make use of SORTA’s recently released real-time data.
The goal of the project is to get real-time arrival displays into businesses along major transit lines. These will be privately owned and operated computer displays that ingest real-time data through the interwebs and display localized arrival predictions for nearby stops. We’ll be developing a display/app, and subsidizing the purchase of tablet computers which can then be mounted behind a bar, in a shop window, near the door of the coffee shop, etc.
I’m sure I’ll have lots more to say here on the topic in the near future, but for now, I leave you with hope only.
The data is structured according to the GTFS real-time specification. I was able to parse it pretty easily in Python by following the instructions on that page. The fields currently included in the feed (many are optional in the specification) are as follows.
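For anyone else poking at the feed: in GTFS-realtime, each stop-time prediction carries either a `delay` (seconds relative to the schedule, positive meaning late) or an absolute `time` (a POSIX timestamp). Here's a minimal sketch of turning one into a predicted arrival. It assumes the protobuf has already been decoded (for example with Google's `gtfs-realtime-bindings` package, `from google.transit import gtfs_realtime_pb2`) and that the two fields have been copied into a plain dataclass, so the logic stays self-contained. Preferring `time` over `delay` when both are present is my own choice here, not something taken from SORTA's feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopTimeEvent:
    """Plain stand-in for the two timing fields of a GTFS-realtime StopTimeEvent."""
    delay: Optional[int] = None  # seconds relative to schedule (+ = late)
    time: Optional[int] = None   # absolute POSIX timestamp


def predicted_arrival(scheduled_posix: int, event: StopTimeEvent) -> int:
    """Return the predicted arrival as a POSIX timestamp."""
    if event.time is not None:
        # An absolute prediction needs no schedule lookup.
        return event.time
    if event.delay is not None:
        # Otherwise shift the scheduled time by the reported delay.
        return scheduled_posix + event.delay
    raise ValueError("StopTimeEvent has neither time nor delay")


def minutes_away(now_posix: int, scheduled_posix: int, event: StopTimeEvent) -> float:
    """Minutes until the predicted arrival (negative if it should already be here)."""
    return (predicted_arrival(scheduled_posix, event) - now_posix) / 60
```

For a display, `minutes_away` is the number you'd actually show next to each route.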
The feeds update every 30 seconds, which seems a little slow, but oh well.
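Since the feed only refreshes every 30 seconds, there's no point downloading it more often than that. A minimal sketch, assuming a caller-supplied `fetch` function (hypothetical; it might wrap `urllib.request.urlopen` on the feed URL) that returns the raw feed bytes:

```python
import time
from typing import Callable, Optional


class ThrottledFeed:
    """Re-download a feed at most once per `min_interval` seconds; otherwise reuse the last copy."""

    def __init__(self, fetch: Callable[[], bytes], min_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch                # hypothetical downloader supplied by the caller
        self._min_interval = min_interval  # matches the feed's 30-second refresh
        self._clock = clock                # injectable so the throttle can be tested
        self._last_fetch: Optional[float] = None
        self._cached: Optional[bytes] = None

    def get(self) -> bytes:
        now = self._clock()
        if self._last_fetch is None or now - self._last_fetch >= self._min_interval:
            self._cached = self._fetch()
            self._last_fetch = now
        return self._cached
```

Every display can call `get()` as often as it likes; only one request per interval actually reaches SORTA's server.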
Right now, my understanding is that these feeds have been tentatively released as-is for developers only, and that SORTA is not ready yet to make a general public announcement that real-time data is available. Tim Harrington at SORTA, who shared the links with us, has politely asked to see the neat stuff that we’re able to develop with this data. I imagine that the sooner someone sends him a link to a decent, working app, the sooner they’ll give us the go-ahead and the sooner we’ll all be able to use this data in everyday situations.
So who’s gonna make an app? There must be a dozen open-source applications that are already designed to work with GTFS-realtime. We probably just need to plug this feed in and maybe make a few localization tweaks. If you or anyone you know has the skills and/or interest to make an app…then for the love of transit, let’s make this happen ASAP!
A wee nit to pick from SORTA’s recent “State of Metro” dog and pony show:
I distinctly remember one of the speakers saying something like ‘and all this without raising fares!’, and this to my feeble memory reeked of bullshit, so I found the numbers again and ran them to see if I was remembering correctly. I was indeed.
Here are the facts, as reported by SORTA to the Federal Transit Administration. Over the period for which we have data on both fare revenues and ridership (currently 2002 to 2012), SORTA has been steadily getting more money from fare revenues while moving fewer passengers. We are currently at the far end of this trend, with
More fare revenue than ever
Fewer passenger trips than ever
By the way, when SORTA calls itself a ‘most efficient’ agency (a title pinned on it by the laughably unscientific UC Economics Center), it is precisely this measure it has in mind. There is hardly a better example of doublespeak to be found. Here’s the trend:
In order to plot both agencies together, I normalized fares and passenger trips to the same range. The scale is linear.
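The post doesn't spell out which normalization was used, so treat this as an assumption: min-max rescaling maps each series linearly onto the range 0 to 1, which lets fare revenue and passenger trips share one linear axis without changing the shape of either trend.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to stretch; put every point at 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Each agency's fare-revenue series and passenger-trip series would be normalized separately before plotting.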
Now you may rightly note that the standard fare for a zone 1 trip hasn’t changed lately. But that’s not the only kind of fare that can be paid. It might not even be the most common! I don’t know for certain. I haven’t personally paid standard fare in quite a while because my transit use is partly subsidized by UC. So for example, the fare revenue variable in this data almost certainly includes UC’s cash subsidy for my fare as well as the dollar I put in myself. Multiply that by the dozens of private fare subsidies each agency probably negotiates (or drops) each year and you get a more dynamic picture. Fare could also be affected, though it probably isn’t, by people using transit cards more or less often while paying the same monthly price.
But anyway, I’ll be damned if the total price paid by riders or their agents, on a per-trip basis, doesn’t constitute a better definition of ‘fare’ than SORTA’s standard zone-1 single-segment price. And by that definition, fares have risen from $0.76 in 2002 to $1.78 in 2012 (+134%). For TANK, the change is from $0.72 in 2002 to $1.16 in 2012 (+60%). Adjusting for inflation, the changes are 84% and 26% respectively. So much for SORTA’s unchanging-fares theory. Or rather, lie.
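The arithmetic behind those per-trip figures is simple enough to sketch. Fare per trip is total fare revenue divided by passenger trips, and the inflation-adjusted change divides out a price-index ratio. The 1.276 ratio used below is an assumption (roughly what the U.S. CPI-U annual averages give for 2002 to 2012); the post doesn't say which deflator it used.

```python
def fare_per_trip(fare_revenue: float, passenger_trips: float) -> float:
    """Average amount paid (by riders or on their behalf) per passenger trip."""
    return fare_revenue / passenger_trips


def percent_change(old: float, new: float) -> float:
    """Nominal percent change from old to new."""
    return (new / old - 1) * 100


def real_percent_change(old: float, new: float, price_index_ratio: float) -> float:
    """Percent change after deflating by price_index_ratio (index in new year / index in old year)."""
    return ((new / old) / price_index_ratio - 1) * 100
```

With those inputs, `percent_change(0.76, 1.78)` comes out to about 134%, while `real_percent_change(0.76, 1.78, 1.276)` and `real_percent_change(0.72, 1.16, 1.276)` come out to about 84% and 26%, in line with the SORTA and TANK figures above.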
I’ll end with an ineffectual plea to the people at SORTA. Please understand that when you speak in lies and euphemisms, no matter how nice your breakfast spread, you turn off clever people and retain only the idiots and the cynical. People from all three of these categories vote, to be sure, but I know who I’d rather spend my time with. And I know who could build the better transit system.
I’ve started working with Champaign-Urbana’s real-time departure API. Right now, I’m using a little Python script to send requests and store the responses in a local PostgreSQL database. Below is a probability density plot from the first 1,000 or so data points I’ve pulled down. It’s only from the weekday mornings when I’ve run the script, mostly from the 150 stops I queried just a moment ago.
But it looks like my prediction (see the earlier post) may not have been terribly far from the mark.
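For anyone who wants to reproduce this kind of plot from their own pulls: a density estimate over lateness values (actual minus scheduled arrival, in minutes, positive meaning late) can be sketched with a Gaussian kernel. The bandwidth here is hand-picked and hypothetical; plotting libraries usually choose one with a rule of thumb such as Scott's or Silverman's.

```python
import math
from typing import List, Sequence


def gaussian_kde(samples: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Estimate a probability density at each grid point using a Gaussian kernel."""
    if not samples or bandwidth <= 0:
        raise ValueError("need samples and a positive bandwidth")
    n = len(samples)
    # Normalizing constant so the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density
```

Evaluating it on a grid such as every 0.1 minute from -5 to 20 and plotting grid against density gives the smooth curve; a wider bandwidth smooths away small humps, a narrower one exaggerates noise.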
The next step is to determine a programmatic way to randomly query particular arrivals to make sure I avoid any systematic error in the sampling. This is necessary because I’m limited to 1,000 API calls per day and can’t just hammer their server with requests for every scheduled arrival.
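One way to do that is a simple random sample, drawn without replacement from the day's scheduled arrivals and sized to fit the call budget, so every arrival has the same chance of being picked. This sketch assumes one API call per sampled arrival and a schedule already loaded into memory; `ScheduledArrival` and its fields are hypothetical names, not the API's.

```python
import random
from typing import List, NamedTuple, Optional, Sequence


class ScheduledArrival(NamedTuple):
    """Hypothetical record for one scheduled arrival at one stop."""
    stop_id: str
    trip_id: str
    scheduled_secs: int  # seconds after midnight, local time


def sample_arrivals(schedule: Sequence[ScheduledArrival],
                    daily_budget: int = 1000,
                    seed: Optional[int] = None) -> List[ScheduledArrival]:
    """Simple random sample (without replacement) that fits the daily API budget.

    Assumes one API call per sampled arrival. Every scheduled arrival has the
    same chance of selection, so no stop, route, or time of day is favored.
    """
    rng = random.Random(seed)
    k = min(daily_budget, len(schedule))
    chosen = rng.sample(list(schedule), k)
    # Order by time so each query can be queued to run shortly before its arrival.
    return sorted(chosen, key=lambda a: a.scheduled_secs)
```

Drawing the sample once per day, before any queries go out, keeps the choice of what to observe independent of how late the buses turn out to be.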
I’m also recording location attributes for all these records so I’ll be able to do some spatial analysis too :-)
I’m about to start digging into various real-time data feeds for American (bus) transit systems. For the most part right now I’m interested in finding a simple, average distribution of lateness/earliness across all stops, the idea being that this could help riders predict, without live real-time feeds, when the bus is most likely to show up, by looking only at a fixed schedule.
Are buses more likely to be late than early? What percentage of buses are early, anyway? If it’s already five minutes late, is it very likely it’s coming in the next minute? Or should you start walking? What’s the difference in tardiness distributions between frequent and infrequent services? Are there types of places in a city which have consistently different distributions?
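The five-minutes-late question is a conditional probability, and it can be read straight off a pile of observed lateness values: among buses that still hadn't shown up five minutes after schedule, what fraction arrived within the following minute? A minimal sketch, assuming lateness in minutes with positive meaning late (the function name and sample data are hypothetical):

```python
from typing import Sequence


def prob_arrives_within(lateness_min: Sequence[float], already_late: float, window: float) -> float:
    """Empirical P(arrives within `window` more minutes | already `already_late` minutes late)."""
    # Only buses that were still not there at `already_late` are relevant.
    still_waiting = [x for x in lateness_min if x > already_late]
    if not still_waiting:
        raise ValueError("no observations later than already_late")
    arrived_soon = [x for x in still_waiting if x <= already_late + window]
    return len(arrived_soon) / len(still_waiting)
```

For example, with observations `[-1, 0, 2, 5.5, 6, 7, 12]`, four buses were still missing at five minutes late and two of them arrived within the next minute, so `prob_arrives_within(..., 5, 1)` is 0.5. This only counts buses that eventually arrived; trips that were cancelled outright never enter the data.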
In the name of science, I’d like to make a prediction, i.e., state my hypothesis, before I’ve collected any actual data. So here it is:
I think that overall the distribution will have a strong late skew, a very short early tail, and a wide second hump around the time a second bus might start bunching up on the one in question. I’ll guess that between 10% and 20% of buses running on fixed schedules will be at least a few seconds early and that the median will be about 2 minutes late.
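Once the data is in, checking the two numbers in that guess is straightforward. This sketch assumes lateness in seconds (positive meaning late) and treats "at least a few seconds early" as more than 5 seconds early; that threshold is an arbitrary, hypothetical stand-in.

```python
import statistics
from typing import Sequence, Tuple


def summarize_lateness(lateness_secs: Sequence[float],
                       early_threshold_secs: float = 5.0) -> Tuple[float, float]:
    """Return (share of arrivals more than early_threshold_secs early, median lateness in minutes)."""
    if not lateness_secs:
        raise ValueError("need at least one observation")
    early = sum(1 for x in lateness_secs if x < -early_threshold_secs)
    share_early = early / len(lateness_secs)
    median_minutes = statistics.median(lateness_secs) / 60
    return share_early, median_minutes
```

The hypothesis holds up if the first value lands between 0.10 and 0.20 and the second is somewhere around 2. The second hump would need a look at the full distribution rather than a summary number.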
Now…anyone want to suggest a city with a real-time feed? I have my eye on Portland at the moment, but only because I’m having trouble finding decent APIs.