That last post was fun. Here’s some more :-)
All data is from current(April 2014) public GTFS feeds listed here. I calculated the straight-line distance, in feet, using the appropriate US state-plane projections, from each and every stop on each line to the next scheduled stop in that line. I then weighted each segment distance thus created by the number of times the agency traverses that segment each week. Weekday trips count 5 times Saturday-only trips, etc. That should give us something like the average person’s experience of the distance to the next stop, if people are evenly distributed across vehicles. The chart was plotted in R using the density() function, output to SVG, and then tweaked and reoriented in Inkscape.
Draw your own conclusions!
EDIT: Some useful SQL code using the basic table structure from GTFS; I simply imported the calendar.txt, trips.txt, stop_times.txt & stops.txt from each agency’s feed into a POSTGIS DB and ran a spatial query on that. Here’s the basic procedure for Muni:
--create the whole table, even though we only need a few things --it's easier than editing the CSV CREATE TABLE sfmta_calendar ( service_id varchar, monday integer, tuesday integer, wednesday integer, thursday integer, friday integer, saturday integer, sunday integer, start_date varchar, end_date varchar ); CREATE TABLE sfmta_trips ( route_id varchar, service_id varchar, trip_id varchar, trip_headsign varchar, direction_id varchar, block_id varchar, shape_id varchar ); CREATE TABLE sfmta_stop_times ( trip_id varchar, arrival_time varchar, departure_time varchar, stop_id varchar, stop_sequence integer, headsign varchar, pickup_type varchar, drop_off_type varchar, dist_travelled varchar ); CREATE TABLE sfmta_stops ( stop_id varchar, --stop_code varchar, stop_name varchar, stop_desc varchar, stop_lat varchar, stop_lon varchar, zone_id varchar, url varchar ); -- bring in the data -- mind you get rid of the headers first COPY sfmta_calendar FROM '/home/nate/calendar.txt' DELIMITER ',' CSV; COPY sfmta_trips FROM '/home/nate/trips.txt' DELIMITER ',' CSV; COPY sfmta_stop_times FROM '/home/nate/stop_times.txt' DELIMITER ',' CSV; COPY sfmta_stops FROM '/home/nate/stops.txt' DELIMITER ',' CSV; -- add a geometry column ALTER TABLE sfmta_stops ADD COLUMN the_geom geometry(POINT,3494); UPDATE sfmta_stops SET the_geom = ST_Transform( ST_GeomFromText('POINT('|| stop_lon ||' '|| stop_lat ||')', 4326) ,3494 ); --add columns which we'll update with the values we're actually interested in ALTER TABLE sfmta_stop_times ADD COLUMN weight integer, ADD COLUMN next_stop real; --run the spatial query --first get each stop paired up with the next one down the line. --this should return the total number of records in the stop_times -- table minus the number of records in the trips table --(each trip has one final stop) WITH temp AS ( SELECT t.route_id, t.trip_id, t.service_id, st.stop_sequence, s.the_geom FROM sfmta_trips AS t JOIN sfmta_stop_times AS st ON t.trip_id = st.trip_id JOIN sfmta_stops AS s ON st.stop_id = s.stop_id), --now get a table of weights --most agencies use the same schedule for all week days and --we don't want to over-emphasize weekend-only services --each day column simply has a true/false binary value, which we've treated as an integer weights AS ( SELECT service_id, (monday+tuesday+wednesday+thursday+friday+saturday+sunday) AS weight FROM sfmta_calendar ) --join all that shit and calculate the distance from each stop to the next UPDATE sfmta_stop_times SET next_stop = t1.the_geom <-> t2.the_geom, weight = weights.weight FROM temp AS t1 JOIN temp AS t2 ON t1.stop_sequence = t2.stop_sequence + 1 AND t1.trip_id = t2.trip_id AND t1.route_id = t2.route_id JOIN weights ON weights.service_id = t1.service_id WHERE t1.trip_id = sfmta_stop_times.trip_id AND t1.stop_sequence = sfmta_stop_times.stop_sequence; -- and export the data COPY ( SELECT next_stop, weight FROM sfmta_stop_times WHERE next_stop > 0 ) TO '/home/nate/sfmta.csv' DELIMITER ',' CSV HEADER;
And then it’s on to R!
I assume “frequency” means number of occurrences rather than route headway. This is great stuff man.
Yep: Statistical ‘frequency’. Thanks!