In case anyone is interested, I’ve begun storing data from SORTA’s GTFS-realtime feed. I started the script today and I see no reason to stop it unless I start running out of space on my server. I’ve got GTFS trip_ids, vehicle_ids, locations and timestamps for every vehicle location update from here until infinity.
Let me know in the comments if you happen to be interested and I’ll find a way to share the data!
I’m excited to announce that I’ll be spending part of this summer working with Daniel Schleith and Brad Thomas on a rather exciting project… Thanks to a grant from People’s Liberty, we’re going to have the opportunity to develop what I believe will be the first application to make use of SORTA’s recently released real-time data.
The goal of the project is to get real-time arrival displays into businesses along major transit lines. These will be privately owned and operated computer displays that ingest real-time data through the interwebs and display localized arrival predictions for nearby stops. We’ll be developing a display/app, and subsidizing the purchase of tablet computers which can then be mounted behind a bar, in a shop window, near the door of the coffee shop, etc.
I’m sure I’ll have lots more to say here on the topic in the near future, but for now, I leave you with hope only.
Thinking about it, it’s actually kind of odd that I hadn’t tried animating GTFS data before. I certainly wouldn’t be the first to have tried it.
The videos above are pretty simple. The stops are clustered into a reduced number of nodes and the system is simplified into a graph. Edges are drawn with thickness according to the number of trips scheduled for each frame. Each frame is a 15-minute span, and at 10 frames per second we traverse a three-day period in ~29 seconds. The three days cover the three distinct service patterns: weekday, Saturday and Sunday.
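For the curious, the framing arithmetic can be sketched roughly like this. The function names and the input shape are hypothetical, and assigning a trip to a frame by its departure time on an edge is an assumption made for illustration, not necessarily how the videos were built.

```python
from collections import Counter

FRAME_MINUTES = 15   # each frame covers a 15-minute span
FPS = 10             # frames per second in the video
DAYS = 3             # weekday, Saturday, Sunday

def frame_index(seconds_since_start):
    """Index of the 15-minute frame containing the given time."""
    return seconds_since_start // (FRAME_MINUTES * 60)

def trips_per_edge_per_frame(trips):
    """Count trips per (frame, edge).

    `trips` is an iterable of (edge, departure_seconds) pairs, where
    departure_seconds is measured from the start of the three-day period.
    (Assumption: a trip belongs to the frame in which it departs.)
    """
    counts = Counter()
    for edge, departure_seconds in trips:
        counts[(frame_index(departure_seconds), edge)] += 1
    return counts

total_frames = DAYS * 24 * 60 // FRAME_MINUTES   # 288 frames
video_seconds = total_frames / FPS               # 28.8 seconds
```

The count for each (frame, edge) pair is what would drive the line thickness in that frame.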
Color! I need to improve the color foremost among many things, but here the color is white where schedule padding is minimal, and saturated where maximal. Since the padding values as I’ve calculated them here have a strong positive skew, the above video uses the square root of the actual value for the coloring. The two videos below try a linear and a log2 scale in that order.
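Here’s a minimal sketch of the three scalings, assuming padding values are non-negative and that each is normalized against the maximum value in the data. The function names, the `log2(1 + x)` form (used so that zero padding maps to zero) and the red hue are all assumptions for illustration.

```python
import math

def scaled_saturation(value, max_value, method="sqrt"):
    """Map a padding value to a saturation between 0.0 and 1.0."""
    if max_value <= 0:
        return 0.0
    value = min(max(value, 0.0), max_value)  # clamp into [0, max_value]
    if method == "linear":
        transform = lambda x: x
    elif method == "sqrt":
        transform = math.sqrt
    elif method == "log2":
        # Assumption: shift by 1 so that zero padding maps to zero.
        transform = lambda x: math.log2(1 + x)
    else:
        raise ValueError(f"unknown method: {method}")
    return transform(value) / transform(max_value)

def saturation_to_rgb(saturation, hue=(255, 0, 0)):
    """Blend from white (saturation 0) to the full hue (saturation 1)."""
    return tuple(round(255 + (channel - 255) * saturation) for channel in hue)
```

With a strong positive skew, the linear version leaves most edges nearly white, while the square root and log versions spread the small values across more of the color range.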
Padding is calculated as (the difference between the fastest scheduled time for a segment and the actual scheduled time) divided by the straight-line length of the edge. This gives me something like the amount of padding added to the schedule per km, roughly a metric for anticipated congestion. It’s (currently at least) normalized by the edge length rather than actual travel distance to maintain a proportional visual emphasis for the graph representation.
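As a rough sketch, the padding calculation described above might look like the following. The function names are hypothetical, times are assumed to be in seconds, and the haversine formula with a 6371 km Earth radius is an assumed way of getting the straight-line edge length.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def padding_per_km(scheduled_seconds, all_scheduled_seconds, edge_km):
    """Seconds of padding per km of straight-line edge length.

    `all_scheduled_seconds` holds every scheduled travel time seen for
    this edge; the fastest of them is treated as the unpadded time.
    """
    fastest = min(all_scheduled_seconds)
    return (scheduled_seconds - fastest) / edge_km
```

For example, a trip scheduled at 300 seconds on an edge whose fastest scheduled time is 240 seconds, over a 2 km edge, works out to 30 seconds of padding per km.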
Dear lord this was a technical post. Here’s a fun little thing to look for though: turn the videos up to 1080p, and you can start to see what looks like peristalsis in the busier *ahem* corridors.
Also interesting to note is the absence of the subways. Because the buses make so many connections to the subways, you can clearly see when and where they are operating. Did you notice the two big lines that seem to spring up in the late hours? My guess, without looking, is that these are parallel bus substitutes for the subways after they stop running. Once the subways start running, those corridors become conspicuous by their emptiness.
I think I’ve finally perfected my method for linking real-time data with scheduled stops. This is a comparison of the average (weekly) scheduled speeds to the observed average speed for each stop->stop segment. Results that look roughly as expected are what we all hope for.
Note that each classification is broken into eight equal-sized quantiles.
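A minimal sketch of an eight-class quantile classification, using Python’s standard `statistics.quantiles` to find the cut points. The function names are hypothetical, and this is not necessarily the tool or interpolation method used for the map.

```python
import bisect
import statistics

N_CLASSES = 8

def quantile_breaks(values, n=N_CLASSES):
    """Return the n - 1 cut points that split values into n quantiles."""
    return statistics.quantiles(values, n=n)

def classify(value, breaks):
    """Return the class index (0 to len(breaks)) that a value falls in."""
    return bisect.bisect_right(breaks, value)

# Example: sixteen segment speeds, two of which land in each class.
speeds = list(range(1, 17))
breaks = quantile_breaks(speeds)
classes = [classify(speed, breaks) for speed in speeds]
```

Scheduled and observed speeds would each get their own set of breaks, so a given color means "the same octile" within each dataset rather than the same absolute speed.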
There is a lot of information in that little gif! More than I can explain here. More to come…
Higher resolution here by the way. It’s interesting to look at even if you don’t know Toronto. Also, the line widths are determined by the number of trips scheduled for each segment.
The release of SORTA’s real-time location data has been delayed again, this time until April 2014. Originally scheduled for sometime around this past December, the system upgrade that’s necessary for the public release of the data was apparently tied in with capital funding for the streetcar.
That funding was of course delayed by shenanigans.
A reader just passed this article my way and I can hardly do a better job of explaining why real-time arrival data is important for growing ridership on our transit system. This recent cold-punch-in-the-face weather has emphasized, for me at least, just how long waiting can seem to take when the bus is nowhere in sight. The release of this data should be a major priority for both agencies. Both already have the necessary systems installed on most if not all vehicles, and just need to get the appropriate back-end systems in place to handle web requests.
Here is a rather crude, though I think useful, visualization of service frequency at the stop level. Basically, I used the GTFS data from SORTA and TANK to calculate the number of times a bus stops at each stop every week. Since a week is the basic cycle period of transit (service is bad on Sunday, better on Monday), this should give us an idea of basic average frequency, with the huge caveat that there’s enormous variation within each week.
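The weekly count can be sketched roughly as below. This assumes service days come only from `calendar.txt` (it ignores `calendar_dates.txt` exceptions and date ranges), the function names are hypothetical, and the tiny inline tables are made-up sample data trimmed to the columns needed.

```python
import csv
import io
from collections import Counter

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

def read_rows(text):
    """Parse GTFS-style CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def weekly_stop_visits(calendar_rows, trip_rows, stop_time_rows):
    """Count how many times per week a vehicle is scheduled at each stop."""
    # Number of days per week each service pattern runs.
    days_per_service = {
        row["service_id"]: sum(int(row[day]) for day in WEEKDAYS)
        for row in calendar_rows
    }
    # Each trip runs once on every day its service pattern is active.
    days_per_trip = {
        row["trip_id"]: days_per_service.get(row["service_id"], 0)
        for row in trip_rows
    }
    visits = Counter()
    for row in stop_time_rows:
        visits[row["stop_id"]] += days_per_trip.get(row["trip_id"], 0)
    return visits

# Made-up sample data, trimmed to the columns used above.
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday
WK,1,1,1,1,1,0,0
SU,0,0,0,0,0,0,1
"""
TRIPS = """trip_id,service_id
t1,WK
t2,SU
"""
STOP_TIMES = """trip_id,stop_id
t1,A
t1,B
t2,A
"""

visits = weekly_stop_visits(read_rows(CALENDAR), read_rows(TRIPS), read_rows(STOP_TIMES))
```

In this sample, stop A is served by the weekday trip (five visits) and the Sunday trip (one visit), while stop B only sees the weekday trip.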
Click the image to get a bigger version. There’s lots of interesting detail in there!
You may notice that frequency can appear to vary along a single line where it doesn’t seem like it probably should:
In most cases, this is simply an artifact of the way I grouped stops that were next to each other and had exactly the same name. At least 2,000 to 3,000 of the 6,000 stops in the dataset can reasonably be thought of as pairs, with one stop serving each direction of travel.
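A rough sketch of that kind of grouping follows. The 60-meter threshold, the function names and the greedy clustering against the first stop in each cluster are all assumptions for illustration; the grouping used for the map may have worked differently.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in meters (equirectangular, fine for short hops)."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)

def group_stops(stops, max_m=60.0):
    """Group stops that share an exact name and sit close together.

    `stops` is a list of (stop_id, name, lat, lon) tuples. Returns a list
    of groups, each a list of stop_ids.
    """
    by_name = defaultdict(list)
    for stop in stops:
        by_name[stop[1]].append(stop)

    groups = []
    for same_name in by_name.values():
        clusters = []
        for stop in same_name:
            for cluster in clusters:
                anchor = cluster[0]  # compare against the cluster's first stop
                if distance_m(anchor[2], anchor[3], stop[2], stop[3]) <= max_m:
                    cluster.append(stop)
                    break
            else:
                clusters.append([stop])
        groups.extend([[s[0] for s in cluster] for cluster in clusters])
    return groups
```

Summing the weekly visits over each group then gives one frequency per grouped stop. A stop with no nearby same-named partner keeps only its own one-direction count, which is one way the apparent variation along a line can arise.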