For the last few months I’ve been scraping SORTA’s GTFS-realtime feed for vehicle positions. I don’t have any particular plans for the data, but it’s easy to collect and I figure someone else may have a use for it down the road. It could eventually be an interesting dataset for looking at, e.g., changes in on-time performance, traffic congestion, bunching, etc.
Essentially, for each vehicle in operation at a given moment, I’ve been storing its:
- vehicle_id: vehicle ID given by the API
- trip_id: trip_id given by the API. I believe this corresponds to the trip_id in the GTFS package for the corresponding period.
- report_time: the per-vehicle timestamp field given by the API. This is stored as UTC, without a time zone, so you need to subtract four hours (EDT) or five (EST) to get local time.
- location: I’ve been storing everything in a PostGIS database, and the location datatype in this case is geometry(POINT,4326). Postgres dumps this as a hexadecimal (EWKB) string.
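If you’re working with the dump directly, both of those storage quirks are easy to handle without PostGIS. Here’s a rough sketch in plain Python (the function names and the timestamp format are my assumptions, not anything from the actual script) that converts a stored UTC timestamp to Eastern time and decodes the hex EWKB string back into longitude/latitude:

```python
import struct
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

EASTERN = ZoneInfo("America/New_York")  # SORTA (Cincinnati) runs on Eastern time

def report_time_to_local(ts: str) -> datetime:
    """Interpret a naive 'YYYY-MM-DD HH:MM:SS' timestamp as UTC
    and convert it to Eastern time."""
    naive = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc).astimezone(EASTERN)

def decode_ewkb_point(hexstr: str):
    """Decode a PostGIS hex EWKB string for a geometry(POINT,4326)
    into (srid, lon, lat)."""
    raw = bytes.fromhex(hexstr)
    fmt = "<" if raw[0] == 1 else ">"   # byte 0: 1 = little-endian
    (geom_type,) = struct.unpack_from(fmt + "I", raw, 1)
    offset = 5
    srid = None
    if geom_type & 0x20000000:          # EWKB flag: an SRID is embedded
        (srid,) = struct.unpack_from(fmt + "I", raw, offset)
        offset += 4
    lon, lat = struct.unpack_from(fmt + "dd", raw, offset)
    return srid, lon, lat
```

For example, the earliest record’s report_time of 2017-05-16 17:59:36 UTC comes out as 13:59:36 EDT.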
The API updates all vehicle locations every 30 seconds, and I’ve been requesting updates every 25 seconds and ignoring duplicates, so I should have all of the vehicle-position data that has been made publicly available. I’ve tried to keep my script running steadily, but there have inevitably been a few interruptions as the PostgreSQL server has been restarted, etc., so there may be some big gaps. Where there is any data, it should be complete; it may just drop out for a day or two. The earliest record I have is from 2017-05-16 17:59:36.
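Since the feed refreshes every 30 seconds but gets polled every 25, consecutive responses overlap, so some records will show up twice. The dedup logic can be as simple as keying on (vehicle_id, report_time); this is just a sketch of the idea, not the actual script:

```python
def new_positions(reports, seen):
    """Return only the reports not seen before, keyed on
    (vehicle_id, report_time).

    `reports` is an iterable of dicts with 'vehicle_id' and
    'report_time' keys; `seen` is a set that accumulates keys
    across successive polls.
    """
    fresh = []
    for r in reports:
        key = (r["vehicle_id"], r["report_time"])
        if key not in seen:
            seen.add(key)
            fresh.append(r)
    return fresh
```

In a long-running poller you’d want to prune old keys from `seen` (anything older than a minute or two can safely go), since the set otherwise grows without bound.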
Anyway, here is the script I’ve been using along with ancillary files:
and a compressed SQL dump of the PostGIS DB:
The script is still running, so if you find this post in a year, hit me up for some fresher data!