This is a histogram showing the distance from every stop on every line in each direction to its nearest neighbor on the same line in the same direction. That’s the complicated way of saying: How far is every bus stop from its nearest neighbor?
Some basic stats:
Minimum = 56 feet
Maximum = 66,125 feet (12.52 miles)
Mean = 589 feet
Median = 476 feet
That massive 12-mile outlier is the 82X, which seems to have only one stop at its terminus in Eastgate after picking people up downtown. The next largest value is 8,484 feet for a pair of stops on the 30X.
I think it would be interesting to see how this distribution compares to TANK and some agencies in other cities…though that analysis will have to wait until after exam week. Unless anyone cares to get a head start on me! It wouldn’t be hard to do using GTFS data and the following code.
POSTGIS SQL code:
-- how far away is the nearest stop in this line and direction?
WITH stop_matrix AS (
SELECT
a.line,
a.stop_id AS s1,
b.stop_id AS s2,
a.direction,
a.the_geom <-> b.the_geom AS dist -- geometry unit is feet(EPSG:3735)
FROM
stops_table AS a,
stops_table AS b
WHERE a.line = b.line AND a.direction = b.direction ) -- no comma here: the main query follows the last CTE
SELECT
s1,
line,
direction,
MIN(dist) AS mindist
FROM stop_matrix
WHERE s1 != s2 -- or else we'll get zeros
GROUP BY s1, line, direction ; -- aggregate at the level of stops
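Once the query has produced one mindist value per stop, the basic stats above are only a few lines away. Here is a minimal Python sketch of that step, assuming the mindist column has been exported to a plain list of distances in feet. The function names and the sample values are made up for illustration and are not the SORTA data.

```python
from statistics import mean, median

FEET_PER_MILE = 5280

def summarize(mindists_ft):
    """Return min, max, mean and median for a list of distances in feet."""
    return {
        "min": min(mindists_ft),
        "max": max(mindists_ft),
        "mean": mean(mindists_ft),
        "median": median(mindists_ft),
    }

def feet_to_miles(feet):
    """Convert feet to miles."""
    return feet / FEET_PER_MILE

# Hypothetical values for illustration only, not real query output
sample = [300, 450, 500, 700, 5280]
stats = summarize(sample)
print(stats)

# The 82X outlier from the post, converted to miles
print(round(feet_to_miles(66125), 2))  # 12.52
```

One long express segment pulls the mean far above the median in the sample, which is the same pattern as in the real stats above.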
It’d be neat to see what San Francisco’s Muni looks like. I think as part of their transit effectiveness program they’ve been increasing the spacing between stops, because no one wants to ride a bus that stops every two blocks all the way across town…
Ask and ye shall receive ;-)
Absolutely amazing. It would be interesting as well to see how far stop x is from x+1 in minutes; you have the data in the feed.
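For anyone who wants to try the minutes idea, here is a rough Python sketch. It assumes a GTFS stop_times.txt in which every row has arrival_time and departure_time filled in, which is not guaranteed, since feeds may leave them blank between timepoints. The function names and the three sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

def to_seconds(hms):
    """Parse a GTFS HH:MM:SS time (hours may exceed 23) into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def minutes_between_stops(stop_times_file):
    """Return (trip_id, from_stop, to_stop, minutes) for consecutive stops."""
    trips = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        trips[row["trip_id"]].append((
            int(row["stop_sequence"]),
            row["stop_id"],
            to_seconds(row["arrival_time"]),
            to_seconds(row["departure_time"]),
        ))
    segments = []
    for trip_id, rows in trips.items():
        rows.sort()  # order each trip by stop_sequence
        for (_, from_stop, _, dep), (_, to_stop, arr, _) in zip(rows, rows[1:]):
            # travel time = arrival at the next stop minus departure from this one
            segments.append((trip_id, from_stop, to_stop, (arr - dep) / 60))
    return segments

# Hypothetical rows in stop_times.txt layout, for illustration only
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,A,1\n"
    "T1,08:02:00,08:02:00,B,2\n"
    "T1,08:03:30,08:03:30,C,3\n"
)
segments = minutes_between_stops(sample)
print(segments)
```

With a real feed you would pass open("stop_times.txt", newline="") instead of the in-memory sample.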
It would be interesting to see how well correlated time-distance is with space-distance when looking at unique segments. It could also be interesting to regress time-distance on space-distance and compare the actual values with the expected ones…the residuals might then make an interesting map…if the results were beyond the obvious. But knowing SORTA’s schedule data at least, this might not prove really interesting until real-time data is available. If I recall, most of the stops between timepoints looked like pretty rough estimates (many values rounded to whole minutes).
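As a sketch of what that regression could look like, here is a plain least-squares fit in Python of scheduled minutes on segment length in feet, with the residuals as actual minus expected. The function names and the numbers are hypothetical. This is ordinary simple linear regression, not a claim about the best model for the job.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def residuals(xs, ys):
    """Actual minus expected y for each observation."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical segments: length in feet and scheduled minutes
distances_ft = [500, 1000, 1500, 2000]
minutes = [1.0, 2.0, 3.0, 5.0]

slope, intercept = fit_line(distances_ft, minutes)
res = residuals(distances_ft, minutes)
print(slope, intercept)
print(res)
```

A positive residual marks a segment scheduled slower than its length alone would predict, and a negative one marks a faster segment. Those are the values that would go on the map.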
Perhaps I might compare the distribution of errors across agencies, giving a comparative distribution of speeds relative to some agency mean. That could give an interesting look at the ratio of local to express services/limited services.
… ah, data :-)