Here are some early results from my efforts to track transit vehicles, in this case in Toronto:
It looks like a crude system map, and it is, but it’s actually made of thousands of vehicle GPS tracks.
The vehicle tracks are oddly pretty; I keep wasting time just zooming in on different spots as new tracks are added. Here’s a transfer point at one of the subway stations:
Bus stops shown as ~20m circles
And what looks like perhaps a train station or a bus garage, below. This image also shows the relative frequency of service on different streets, something that becomes quite visible in the data when the lines are given a high degree of transparency.
Buildings for scale
And here’s an expressway carrying some limited-stop services:
As of now, after just a week of erratic development and testing, I’ve collected ~60,000 unique tracks, representing ~160 scheduled lines, derived from 3,200,000+ vehicle location records. About 50 new vehicle locations come in each second that I have my little script running.
Here’s what I’m doing so far, described algorithmically and implemented in a Python script:
- Request updated vehicle locations from the API every five or six seconds. The API only sends the ones which have updated since my last request, which ends up being between 200 and 600 depending on the day and time of day. There seem to be between 500 and 1500 vehicles operating at any given time in Toronto, so I’m seeing an update from each vehicle maybe every 10 seconds on average. The API lets me do this for the whole agency at once, which is slightly surprising.
- Keep tabs on each vehicle by its given ID number, and begin building a track for each vehicle by putting points in order of their appearance.
- If a vehicle fails to update within 60 seconds, gets a new route assignment, or gets a new direction identifier on the same route, I start a new track for it and insert the old one into the database.
- Tracks shorter than 5 points, 500m, or 2 minutes (or some other arbitrary threshold) can then be ignored or dropped.
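The steps above could be sketched roughly like this. It is a minimal illustration and not the actual script: the names `Track`, `TrackBuilder`, `add_report` and `expire` are made up for the example, a plain list stands in for the database, and each location report is assumed to arrive as a vehicle ID, route, direction, timestamp in seconds, latitude and longitude.

```python
import math
from dataclasses import dataclass, field

# Thresholds taken from the list above; all of them are arbitrary.
GAP_SECONDS = 60      # split a track if a vehicle is silent this long
MIN_POINTS = 5
MIN_METRES = 500.0
MIN_SECONDS = 120.0


@dataclass
class Track:
    vehicle_id: str
    route: str
    direction: str
    points: list = field(default_factory=list)  # (time, lat, lon) in arrival order


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def track_length_m(track):
    """Sum of the distances between consecutive points."""
    pts = track.points
    return sum(haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(pts, pts[1:]))


def is_worth_keeping(track):
    """A track is dropped if it falls short on points, duration or length."""
    if len(track.points) < MIN_POINTS:
        return False
    duration = track.points[-1][0] - track.points[0][0]
    return duration >= MIN_SECONDS and track_length_m(track) >= MIN_METRES


class TrackBuilder:
    def __init__(self):
        self.open_tracks = {}  # vehicle_id -> Track still being built
        self.finished = []     # stand-in for the database

    def add_report(self, vehicle_id, route, direction, t, lat, lon):
        current = self.open_tracks.get(vehicle_id)
        if current is not None:
            stale = t - current.points[-1][0] > GAP_SECONDS
            reassigned = (route, direction) != (current.route, current.direction)
            if stale or reassigned:
                self._close(vehicle_id)
                current = None
        if current is None:
            current = Track(vehicle_id, route, direction)
            self.open_tracks[vehicle_id] = current
        current.points.append((t, lat, lon))

    def expire(self, now):
        """Close tracks for vehicles that have stopped reporting altogether."""
        silent = [v for v, tr in self.open_tracks.items()
                  if now - tr.points[-1][0] > GAP_SECONDS]
        for vehicle_id in silent:
            self._close(vehicle_id)

    def _close(self, vehicle_id):
        track = self.open_tracks.pop(vehicle_id)
        if is_worth_keeping(track):
            self.finished.append(track)
```

In this sketch, `add_report` would be called once per vehicle in each API response, and `expire` once per polling cycle, so that vehicles which simply vanish still have their tracks closed out.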
A track is a set of ordered points, each point with a position and a time. The next step is to line the tracks up with the stop segments to which they’re scheduled, and if they’re actually close and the direction matches, to calculate stop times and segment durations from the observations. That’s actually turning out to be pretty difficult, but I’m sure I’ll crack it fairly soon. One thing I’ll have to seriously consider as I’m doing this is error in the location reports.
As the first image and the one immediately above show, there is significant error in the data, particularly downtown where tall buildings are presumably interfering with GPS signal reception.
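For the matching step, one simple starting point (not something I've settled on) is to find where a track passes closest to a given stop and interpolate the time at that point. The sketch below assumes a track is a list of `(time, lat, lon)` tuples, that the vehicle moves at constant speed between reports, and that a flat-plane projection is accurate enough at city scale. The function names are made up for illustration, and nothing here corrects for the GPS error described above.

```python
import math


def to_local_xy(lat, lon, lat0, lon0):
    """Project to metres on a flat plane centred on (lat0, lon0)."""
    r = 6371000.0
    x = math.radians(lon - lon0) * r * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * r
    return x, y


def closest_pass(points, stop_lat, stop_lon):
    """Return (time, distance_m) of the track's closest approach to a stop.

    points: ordered list of (time, lat, lon) tuples.
    Returns None if the track has fewer than two points.
    """
    # With the stop at the origin, distance to the stop is just hypot(x, y).
    xy = [(t, *to_local_xy(lat, lon, stop_lat, stop_lon)) for t, lat, lon in points]
    best = None
    for (t1, x1, y1), (t2, x2, y2) in zip(xy, xy[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            f = 0.0  # vehicle did not move between these two reports
        else:
            # Fraction along the segment nearest the origin, clamped to [0, 1].
            f = max(0.0, min(1.0, -(x1 * dx + y1 * dy) / seg_len_sq))
        dist = math.hypot(x1 + f * dx, y1 + f * dy)
        if best is None or dist < best[1]:
            best = (t1 + f * (t2 - t1), dist)
    return best
```

The returned distance gives a way to decide whether the track is "actually close" to the stop: anything beyond some threshold could be treated as not serving that stop at all.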