Discovering the Space-Time Dimensions of Schedule Padding and Delay from GTFS and Real-Time Transit Data

December 4th, 2014

My master’s thesis proposal, something I’ll be talking a lot about in the coming months:

The popular conflation of bus coach and railcar-based public transit lines with their typical relation to automotive traffic has caused much confusion in recent years. Though the superficial wheel may not much matter, the general public is right to sense a distinction in speed and reliability between transit services that operate in mixed traffic and those that are given priority over such traffic. As the public more and more aggressively demands rail-based transit services, these should be read as demands for increased speed and reliability (among many other things) and planners should respond by modifying existing services to meet these implicit demands.

Speed and reliability are in large part a function of the potential delays a line encounters along its course. Potential delay, or random delay, results from events that cannot be precisely planned for, such as automotive, pedestrian or bicycle traffic, flag-stop passenger boardings and alightings, traffic signals, and bus wheelchair boardings and securements. Scheduled delay, also known as schedule padding, is delay built into scheduled transit services that allows them to tolerate regular disruptions and unscheduled delays by conceding the average effects of such delays in advance. Agencies try to strike a balance between heavily padded (and thus slow) schedules and the disruptions of extra unscheduled delay, to create schedules that are neither too slow nor too often late. While the public often reacts negatively to significantly late vehicles, it is typically unaware of schedule padding, though both depend on the same environmental factors.

Since the politically active public, and not transit schedulers, are in control of policy direction in most cities, it becomes important to explain delay and its causes and effects to a lay audience and thereby to direct them toward potentially fruitful responses. Further, since funds for radical infrastructure interventions are difficult to find in the current political regime, attention should be focused on potential incremental improvements to the surface-running bus lines which constitute the vast majority of all transit.

Toward a Solution:
Where can the smallest new delay-avoidance technique create the biggest potential improvement in speed and reliability for existing services?
This thesis proposes a technique for exposing and visualizing the spatial and temporal locations of random delay and schedule padding. Implicitly, this should reveal the space-time locations where transit is running more slowly and less reliably than it might if it could avoid delay, and suggest times and places where delay-avoidance techniques such as designated rights of way for transit could have the biggest impact. The focus will not be on any particular line or city but will try to demonstrate the possibility for and usefulness of the technique in a variety of different circumstances.

There will be two ways of identifying delay. The first will be to analyze the agency’s General Transit Feed Specification (GTFS) schedule data. For each scheduled trip segment (stop A → stop B), the minimum scheduled time to complete that segment across all trips will be identified and treated as the baseline for that segment. Assuming this gives reasonable values, any time that another trip is scheduled to spend on the same segment beyond that minimum will be considered schedule padding.
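As a rough illustration of this first method, here is a minimal Python sketch. It assumes the rows of a feed's stop_times.txt have already been read into dictionaries and that every stop has both an arrival and a departure time (real feeds often leave these blank between timepoints). The function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def gtfs_time_to_seconds(hms):
    """Convert a GTFS 'HH:MM:SS' string to seconds; hours may exceed 23."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def segment_durations(stop_times):
    """Yield (trip_id, from_stop, to_stop, seconds) for consecutive stops."""
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for a, b in zip(rows, rows[1:]):
            run = (gtfs_time_to_seconds(b["arrival_time"])
                   - gtfs_time_to_seconds(a["departure_time"]))
            yield trip_id, a["stop_id"], b["stop_id"], run


def schedule_padding(stop_times):
    """Return {(trip_id, from_stop, to_stop): seconds above the segment minimum}."""
    durations = list(segment_durations(stop_times))
    baseline = {}
    for _, a, b, run in durations:
        baseline[(a, b)] = min(run, baseline.get((a, b), run))
    return {(trip, a, b): run - baseline[(a, b)]
            for trip, a, b, run in durations}


# Two made-up trips over the same segment: 4 minutes early, 7 minutes at peak.
rows = [
    {"trip_id": "early", "stop_sequence": "1", "stop_id": "A",
     "arrival_time": "06:00:00", "departure_time": "06:00:00"},
    {"trip_id": "early", "stop_sequence": "2", "stop_id": "B",
     "arrival_time": "06:04:00", "departure_time": "06:04:00"},
    {"trip_id": "peak", "stop_sequence": "1", "stop_id": "A",
     "arrival_time": "17:00:00", "departure_time": "17:00:00"},
    {"trip_id": "peak", "stop_sequence": "2", "stop_id": "B",
     "arrival_time": "17:07:00", "departure_time": "17:07:00"},
]
padding = schedule_padding(rows)
```

With these invented rows, the early trip sets the baseline and the peak trip carries 180 seconds of padding on the A → B segment.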

The second method will take a large sample of representative data from a set of three or four public ‘real-time’ transit APIs from the same agencies. Requests to these APIs will be made with a Python script which will process the results and store them in a PostGIS database for later analysis. These real-time data will be compared to the GTFS schedule data for the same period. The basic task in this method will be to identify temporal vehicle trajectories by following particular vehicles along each line as they become more or less displaced from their schedule. These data will be used to:

  1. Re-estimate the ideal speed that determines the extent of schedule padding in a given segment by looking at the best reasonable observed speeds of late-running-but-catching-up vehicles in each (spatial) trip segment.
  2. Identify excess random delay by identifying segments where vehicles became late or were becoming later relative to their (padded) schedules. Determine the amount by which they were delayed beyond their schedule time in these segments and look for non-random unscheduled delay.
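Both uses come down to watching how a vehicle's lateness changes from one stop to the next. The sketch below is only one possible way to express that. It assumes scheduled and observed times (in seconds) are already matched stop by stop for a single trip, and the function name is hypothetical.

```python
def lateness_changes(scheduled, observed):
    """Return (from_stop, to_stop, change_in_lateness_seconds) per segment.

    scheduled, observed: lists of (stop_id, seconds) in the same stop order.
    A positive change means the vehicle fell further behind in that segment
    (unscheduled delay); a negative change means it caught up, running
    faster than the padded schedule allows for.
    """
    late = [(stop, obs - sch)
            for (stop, sch), (_, obs) in zip(scheduled, observed)]
    return [(a, b, late_b - late_a)
            for (a, late_a), (b, late_b) in zip(late, late[1:])]


# Made-up trip: one minute late at A, two minutes at B, one minute at C.
scheduled = [("A", 0), ("B", 300), ("C", 600)]
observed = [("A", 60), ("B", 420), ("C", 660)]
changes = lateness_changes(scheduled, observed)
```

In this invented example the vehicle loses 60 seconds between A and B and recovers 60 seconds between B and C. Segments of the second kind are where the best observed speeds would be looked for.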

Since the APIs’ vehicle location and arrival estimate reporting is fairly discontinuous (updates might arrive only every 5-30 seconds), arrival times will be interpolated along the length of a trip.
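A minimal sketch of one such interpolation, assuming each vehicle report has already been reduced to a timestamp and a distance travelled along the route, and that the stop's distance along the same route is known. Linear interpolation between the two bracketing reports is an assumption made here for simplicity. The function name and the numbers are hypothetical.

```python
def interpolate_arrival(reports, stop_distance):
    """Estimate when a vehicle reached stop_distance along its route.

    reports: list of (timestamp_seconds, distance_metres), sorted by time.
    Returns the linearly interpolated timestamp, or None if the reports
    never bracket the stop.
    """
    for (t0, d0), (t1, d1) in zip(reports, reports[1:]):
        if d0 <= stop_distance <= d1:
            if d1 == d0:
                # Vehicle was stationary at exactly this point.
                return t0
            fraction = (stop_distance - d0) / (d1 - d0)
            return t0 + (t1 - t0) * fraction
    return None


# Made-up reports at 0 s, 20 s and 50 s; the stop sits 250 m along the route.
reports = [(0, 0.0), (20, 100.0), (50, 400.0)]
arrival = interpolate_arrival(reports, 250.0)
```

Here the stop lies halfway between the second and third reports, so the estimated arrival falls halfway between their timestamps.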

Results will be mapped both by line and for whole transit systems. Depending on temporal variability, these maps may or may not include a distinct temporal component. The maps must be engaging, attractive and informative to a general audience. They must invite exploration and be able to explain their complex subject without reference to the main text of the thesis.

Potential Problems:
One problem with this thesis is that it will not fully distinguish between lines that make every stop and those that stop by request only, as almost all bus lines do. This is an important point because we risk finding the maximum (theoretically unpadded) speeds for a segment at moments when there are few or no passengers boarding or alighting. It would be a mistake to assume that all or most delay is due to street traffic without trying to measure the effect of passenger boardings, particularly for bunched vehicles. Passenger boarding is not a thing to be avoided; it is, however, possible to reduce delay caused by boardings with infrastructure changes such as increased stop spacing, pre-payment fare systems, and multi-door boarding. It is therefore not totally unreasonable to consider the effects of passenger boarding on flag-stop services as a source of delay.

Basically this thesis is after a systematic and effective way of identifying, measuring and displaying the effect of choke-points in existing scheduled transit services. It does this by analyzing publicly available data to identify variance in operating speeds through space and time. It assumes that the fastest reasonable observed speeds are undelayed and that slower trips are delayed in one of several ways. It measures and displays this delay and suggests potential causes and interventions for certain types of delay scenarios.

Transit delay has been studied extensively, but this thesis is novel in its focus on spatio-temporal description, its emphasis on schedule padding as a source of avoidable delay, and its use of cartographic technique to display the results.

2 responses to “Discovering the Space-Time Dimensions of Schedule Padding and Delay from GTFS and Real-Time Transit Data”

  1. Sounds like a great project, Nate. There are only a few studies (though a growing number) analyzing GTFS data, and this comparison of GTFS schedules against real-time GPS data sounds like a real contribution. I will be working with GTFS data to analyze the transport system in Rio de Janeiro (Brazil), more focused on accessibility issues, though. I’m looking forward to hearing more about your work.

    BTW, you might like this post, despite the title being a bit misleading

  2. […] are some early results from my efforts to track transit vehicles, these ones in […]