Stop Dispersion, Inter-agency Comparison

April 22nd, 2014


That last post was fun. Here’s some more :-)

All data is from current(April 2014) public GTFS feeds listed here. I calculated the straight-line distance, in feet, using the appropriate US state-plane projections, from each and every stop on each line to the next scheduled stop in that line. I then weighted each segment distance thus created by the number of times the agency traverses that segment each week. Weekday trips count 5 times Saturday-only trips, etc. That should give us something like the average person’s experience of the distance to the next stop, if people are evenly distributed across vehicles. The chart was plotted in R using the density() function, output to SVG, and then tweaked and reoriented in Inkscape.

Draw your own conclusions!

EDIT: Some useful SQL code using the basic table structure from GTFS; I simply imported the calendar.txt, trips.txt, stop_times.txt & stops.txt from each agency’s feed into a POSTGIS DB and ran a spatial query on that. Here’s the basic procedure for Muni:

--create the whole table, even though we only need a few things
--it's easier than editing the CSV
CREATE TABLE sfmta_calendar (
	service_id varchar,
	monday integer,
	tuesday integer,
	wednesday integer,
	thursday integer,
	friday integer,
	saturday integer,
	sunday integer,
	start_date varchar,
	end_date varchar
CREATE TABLE sfmta_trips (
	route_id varchar,
	service_id varchar,
	trip_id varchar,
	trip_headsign varchar,
	direction_id varchar,
	block_id varchar,
	shape_id varchar
CREATE TABLE sfmta_stop_times (
	trip_id varchar,
	arrival_time varchar,
	departure_time varchar,
	stop_id varchar,
	stop_sequence integer,
	headsign varchar,
	pickup_type varchar,
	drop_off_type varchar,
	dist_travelled varchar
CREATE TABLE sfmta_stops (
	stop_id varchar,
	--stop_code varchar,
	stop_name varchar,
	stop_desc varchar,
	stop_lat varchar,
	stop_lon varchar,
	zone_id varchar,
	url varchar

-- bring in the data
-- mind you get rid of the headers first
COPY sfmta_calendar FROM '/home/nate/calendar.txt' DELIMITER ',' CSV;
COPY sfmta_trips FROM '/home/nate/trips.txt' DELIMITER ',' CSV;
COPY sfmta_stop_times FROM '/home/nate/stop_times.txt' DELIMITER ',' CSV;
COPY sfmta_stops FROM '/home/nate/stops.txt' DELIMITER ',' CSV;

-- add a geometry column
ALTER TABLE sfmta_stops ADD COLUMN the_geom geometry(POINT,3494);
UPDATE sfmta_stops SET the_geom = 
		ST_GeomFromText('POINT('|| stop_lon ||' '|| stop_lat ||')', 4326)
--add columns which we'll update with the values we're actually interested in
ALTER TABLE sfmta_stop_times
ADD COLUMN weight integer,
ADD COLUMN next_stop real;

--run the spatial query
--first get each stop paired up with the next one down the line.
--this should return the total number of records in the stop_times
-- table minus the number of records in the trips table
--(each trip has one final stop)
WITH temp AS (
	FROM sfmta_trips AS t 
	JOIN sfmta_stop_times AS st ON t.trip_id = st.trip_id
	JOIN sfmta_stops AS s ON st.stop_id = s.stop_id),
--now get a table of weights
--most agencies use the same schedule for all week days and 
--we don't want to over-emphasize weekend-only services
--each day column simply has a true/false binary value, which we've treated as an integer
weights AS (
		(monday+tuesday+wednesday+thursday+friday+saturday+sunday) AS weight
	FROM sfmta_calendar
--join all that shit and calculate the distance from each stop to the next
UPDATE sfmta_stop_times SET 
	next_stop = t1.the_geom <-> t2.the_geom,
	weight = weights.weight
FROM temp AS t1
JOIN temp AS t2 ON 
	t1.stop_sequence = t2.stop_sequence + 1
	AND t1.trip_id = t2.trip_id
	AND t1.route_id = t2.route_id
JOIN weights ON weights.service_id = t1.service_id
	t1.trip_id = sfmta_stop_times.trip_id AND
	t1.stop_sequence = sfmta_stop_times.stop_sequence;

-- and export the data
	FROM sfmta_stop_times
	WHERE next_stop > 0
) TO '/home/nate/sfmta.csv' DELIMITER ',' CSV HEADER;

And then it’s on to R!

2 responses to “Stop Dispersion, Inter-agency Comparison”

  1. Tim Bender says:

    I assume “frequency” means number of occurrences rather than route headway. This is great stuff man.