Spotted this map while out and about yesterday (…and quickly appropriated it for the geography grad office).
People I don’t know making maps?? WHHAAAAT? How exciting! And screen-printed no less.
As everyone may or may not know, there’s no obvious and standard ontology in the world of cartography and map data, meaning that each dataset has its own way of describing and encoding the objects of the world. Its own familiar quirks. I was able to pick this out as OSM data by the too-thick blotches in the railyard, where I remembered I had failed to properly classify some of the shorter segments.
Just another way of looking at speed distributions. This shows the distribution of observed speeds across several thousand route segments by percentile of the speeds on each segment. The animation changes as we’re shown different parts of the speed distribution on each segment.
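For the curious, pulling a given percentile out of each segment's observed speeds is nearly a one-liner with Python's standard `statistics` module. The speeds below are made-up illustration data, not the real observations:

```python
from statistics import quantiles

# Hypothetical observed speeds (km/h) for two segments; the real dataset
# has several thousand segments.
segment_speeds = {
    "A->B": [12, 15, 18, 22, 25, 30, 33, 35, 40],
    "B->C": [5, 8, 9, 11, 14, 17, 20, 24, 28],
}

# quantiles(..., n=100) returns 99 cut points; cuts[p - 1] approximates
# the p-th percentile of the segment's speed distribution.
for seg, speeds in segment_speeds.items():
    cuts = quantiles(speeds, n=100)
    print(seg, "median:", cuts[49], "90th percentile:", cuts[89])
```

Animating over `p` from low to high percentiles is then just re-symbolizing each segment by `cuts[p - 1]`.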
Not totally sure if it’s useful yet, but it is kind of pretty.
I think I’ve finally perfected my method for linking real-time data with scheduled stops. This is a comparison of the average (weekly) scheduled speed to the observed average speed for each stop->stop segment. Results that look roughly as expected are exactly what we were hoping for.
Note that each classification is broken into eight equal-sized quantiles.
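Concretely, an equal-count (quantile) classification just ranks the values and maps ranks onto classes. A minimal sketch, with made-up numbers and a hypothetical function name:

```python
def quantile_class(values, n=8):
    """Assign each value to one of n equal-count classes (octiles here).

    Values are ranked, and rank r of N falls in class r * n // N, so each
    class ends up with (as near as possible) the same number of members.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n // len(values)
    return classes

# Eight segment speeds classified into quartiles, for illustration:
print(quantile_class([5, 1, 3, 7, 2, 8, 6, 4], n=4))
```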
There is a lot of information in that little gif! More than I can explain here. More to come…
Higher resolution here by the way. It’s interesting to look at even if you don’t know Toronto. Also, the line widths are determined by the number of trips scheduled for each segment.
The data is structured according to the GTFS real-time specification. I was able to parse it pretty easily in Python by following the instructions on that page. The fields currently included in the feed (many are optional in the specification) are as follows.
The feeds update every 30 seconds, which seems a little slow, but oh well.
Right now, my understanding is that these feeds have been tentatively released as-is for developers only, and that SORTA is not ready yet to make a general public announcement that real-time data is available. Tim Harrington at SORTA, who shared the links with us, has politely asked to see the neat stuff that we’re able to develop with this data. I imagine that the sooner someone sends him a link to a decent, working app, the sooner they’ll give us the go-ahead and the sooner we’ll all be able to use this data in every-day situations.
So who’s gonna make an app? There must be a dozen open-source applications that are already designed to work with GTFS-realtime. We probably just need to plug this feed in and maybe make a few localization tweaks. If you or anyone you know has the skills and/or interest to make an app…then for the love of transit, let’s make this happen ASAP!
A wee nit to pick from SORTA’s recent “State of Metro” dog and pony show:
I distinctly remember one of the speakers saying something like ‘and all this without raising fares!’, and this to my feeble memory reeked of bullshit, so I found the numbers again and ran them to see if I was remembering correctly. I was indeed.
Here are the facts, as reported by SORTA to the Federal Transit Administration. Over the period for which we have data on both fare revenues and ridership (currently 2002 to 2012), SORTA has been steadily collecting more money in fares while moving fewer passengers. We are currently at the extreme of this trend, with
More fare revenue than ever
Fewer passenger trips than ever
When SORTA says, by the way, that they are a ‘most efficient’ agency (a title pinned on them by the laughably unscientific UC Economics Center), it is precisely this measure they have in mind. There is hardly a better example of doublespeak to be found. Here’s the trend:
In order to plot both agencies together, I normalized fares and passenger trips to the same range. The scale is linear.
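The normalization is just a min-max rescale of each series onto [0, 1], so two series with very different units can share one linear axis. A sketch:

```python
def rescale(values):
    """Min-max normalize a series onto [0, 1]: the smallest value maps
    to 0, the largest to 1, everything else linearly in between."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# e.g. fare revenues in millions of dollars and trips in millions of
# rides both end up on the same 0-1 scale:
print(rescale([2, 4, 6]))  # [0.0, 0.5, 1.0]
```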
Now you may rightly note that the standard fare for a zone 1 trip hasn’t changed lately. But that’s not the only kind of fare that can be paid. It might not even be the most common; I don’t know for certain. I haven’t personally paid standard fare in quite a while because my transit use is partly subsidized by UC. So, for example, the fare revenue variable in this data almost certainly includes UC’s cash subsidy for my fare as well as the dollar I put in myself. Multiply that by the dozens of private fare subsidies each agency probably negotiates (or drops) each year and you get a more dynamic picture. Fare revenue could also be affected, though probably isn’t much, by people using transit cards more or less while paying the same monthly price.
But anyway, I’ll be damned if the total price paid by riders or their agents, on a per-trip basis, doesn’t constitute a better definition of ‘fare’ than SORTA’s standard zone-1 single-segment price. And by that definition, fares have risen from $0.76 in 2002 to $1.78 in 2012 (+134%). For TANK, the change is from $0.72 in 2002 to $1.16 in 2012 (+60%). Adjusting for inflation, the changes are 84% and 26% respectively. So much for SORTA’s unchanging-fares line.
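The arithmetic behind those percentages is simple: per-trip fare is fare revenue divided by unlinked passenger trips, taken at the two endpoint years. The CPI factor below (~1.27 for 2002-2012) is my rough assumption, not an official figure, and rounding from the unrounded per-trip values explains small differences from the percentages quoted above:

```python
def pct_change(old, new):
    """Percent change from old to new, rounded to the nearest integer."""
    return round((new - old) / old * 100)

# Nominal per-trip fare changes, from the endpoints quoted above:
sorta_nominal = pct_change(0.76, 1.78)  # about +134%
tank_nominal = pct_change(0.72, 1.16)   # about +61%

# Deflating by a rough 2002->2012 CPI factor of ~1.27 (an assumption):
CPI = 1.27
sorta_real = pct_change(0.76 * CPI, 1.78)  # about +84%
tank_real = pct_change(0.72 * CPI, 1.16)   # about +27%
```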
I’ll end with an ineffectual plea to the people at SORTA. Please, understand that when you speak in lies and euphemisms, no matter how nice your breakfast spread, you turn off clever people and retain only the idiots and the cynical. People from all three of these categories vote, to be sure, but I know who I’d rather spend my time with. And I know who could build the better transit system.
It looks like a crude system map, and it is, but it’s actually made of thousands of vehicle GPS tracks.
The vehicle tracks are oddly pretty; I keep wasting time just zooming in on different spots as new tracks are added. Here’s a transfer point at one of the subway stations:
Bus stops shown as ~20m circles
And what looks like perhaps a train station or a bus garage, below. This image also shows the relative frequency of service on different streets, something that becomes quite visible in the data when the lines are given a high degree of transparency.
Buildings for scale
And here’s an expressway carrying some limited-stop services:
As of now, after just a week of erratic development and testing, I’ve collected ~60,000 unique tracks, representing ~160 scheduled lines, derived from 3,200,000+ vehicle location records. About 50 new vehicle locations come in each second that I have my little script running.
Here’s what I’m doing so far, described step by step as implemented in a Python script:
Request updated vehicle locations from the API every five or six seconds. The API only sends the ones which have updated since my last request, which ends up being between 200 and 600 depending on the day and time of day. There seem to be between 500 and 1500 vehicles operating at any given time in Toronto, so I’m seeing maybe a 10 second update from each on average. It lets me do this for the whole agency at once, which is slightly surprising.
Keep tabs on each vehicle by its given ID number, and begin building a track for each vehicle by putting points in order of their appearance.
If a vehicle fails to update within 60 seconds, gets a new route assignment, or gets a new direction identifier on the same route, I start a new track and insert the old one into the database.
Tracks shorter than some arbitrary threshold (say 5 points, 500 m, or 2 minutes) can then be ignored or dropped.
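The steps above can be sketched in a few lines. This is not my actual script, just a minimal illustration of the bookkeeping, with hypothetical names; the 60-second staleness rule and the minimum-length filter come straight from the description:

```python
STALE_SECONDS = 60   # a gap longer than this closes the current track
MIN_POINTS = 5       # shorter tracks are dropped rather than stored


class TrackBuilder:
    """Accumulate GPS points into per-vehicle tracks. A track is flushed
    when the vehicle changes route or direction, or goes stale."""

    def __init__(self, store_track):
        self.active = {}              # vehicle_id -> in-progress track
        self.store_track = store_track  # callback, e.g. a DB insert

    def update(self, vehicle_id, route, direction, lon, lat, ts):
        track = self.active.get(vehicle_id)
        if track and (track["route"] != route
                      or track["direction"] != direction
                      or ts - track["points"][-1][2] > STALE_SECONDS):
            self.flush(vehicle_id)
            track = None
        if track is None:
            track = {"route": route, "direction": direction, "points": []}
            self.active[vehicle_id] = track
        track["points"].append((lon, lat, ts))

    def flush(self, vehicle_id):
        track = self.active.pop(vehicle_id)
        if len(track["points"]) >= MIN_POINTS:  # drop trivially short tracks
            self.store_track(track)
```

Usage would be a polling loop that calls `update()` for each record the API returns, plus a periodic sweep that flushes vehicles that have gone quiet.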
A track is a set of ordered points, each point with a position and a time. The next step is to line the tracks up with the stop segments to which they’re scheduled, and if they’re actually close and the direction matches, to calculate stop times and segment durations from the observations. That’s actually turning out to be pretty difficult, but I’m sure I’ll crack it fairly soon. One thing I’ll have to seriously consider as I’m doing this is error in the location reports.
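One piece of that matching is at least straightforward: once track points are projected to distances along the scheduled route, the time the vehicle passed a stop can be linearly interpolated between the two bracketing observations. A sketch, with hypothetical names:

```python
from bisect import bisect_left


def interpolate_time(dists, times, stop_dist):
    """Estimate when a vehicle passed stop_dist, given track points
    projected to monotonically increasing distances along the route.
    dists and times are parallel, sorted lists of equal length."""
    i = bisect_left(dists, stop_dist)
    if i == 0:
        return times[0]       # stop is before the first observation
    if i == len(dists):
        return times[-1]      # stop is past the last observation
    frac = (stop_dist - dists[i - 1]) / (dists[i] - dists[i - 1])
    return times[i - 1] + frac * (times[i] - times[i - 1])
```

Segment durations then fall out as differences between interpolated times at consecutive stops. The hard part, of course, is the projection itself, especially with noisy positions.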
As the first image and the one immediately above show, there is significant error in the data, particularly downtown where tall buildings are presumably interfering with GPS signal reception.