
Updated Frequencies

Below is a visualization of weekly frequency for each of SORTA’s routes, derived from the just-released GTFS data. The area of each square is directly proportional to the number of trips a route will make each week after the August 18th, 2013 service changes. The #1 at the top will make a total of 232 weekly trips. The m+ at the bottom will make 595. The route with the most weekly trips is the 43 with 1,289.[1] I’ll let them speak for themselves now:

Visualisation of weekly frequency for SORTA's 2013 service changes

I’ve been meaning to write a really simple little graph-generating script for a while. I’ve been unhappily living with Excel and OpenOffice forever as has everyone, and their ability to make visualisations is just NOTHING compared to the simplicity of SVG combined with a language like PHP. Just a couple hours coding and a little time tweaking in Inkscape and boom! Coloured squares. Oh yeah.
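For anyone who wants to try the same trick, here is a minimal sketch of the idea in Python rather than PHP. It is not my actual script: the function names, the layout, and the hex colours are made up for illustration, and only the three trip counts come from the chart above. The one thing that matters is that each square's side is the square root of the trip count, so that the area (not the side) is proportional to trips.

```python
import math
from xml.sax.saxutils import escape


def square_side(trips, scale=1.0):
    """Side length of a square whose AREA is proportional to `trips`."""
    return scale * math.sqrt(trips)


def squares_svg(routes, scale=4.0, gap=10.0):
    """Build an SVG string with one labelled square per route.

    routes: list of (label, weekly_trips, hex_colour) tuples.
    The layout (a single vertical stack) is an arbitrary choice for this sketch.
    """
    parts = []
    y = gap
    widest = 0.0
    for label, trips, colour in routes:
        side = square_side(trips, scale)
        parts.append(
            f'<rect x="{gap:.1f}" y="{y:.1f}" width="{side:.1f}" '
            f'height="{side:.1f}" fill="#{colour}"/>'
        )
        parts.append(
            f'<text x="{gap + side + 5:.1f}" y="{y + side / 2:.1f}">'
            f'{escape(label)}: {trips}</text>'
        )
        y += side + gap
        widest = max(widest, side)
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{widest + 150:.0f}" height="{y:.0f}">'
    )
    return "\n".join([header] + parts + ["</svg>"])


# Trip counts are from the post; the colours are placeholders, not SORTA's.
SAMPLE = [("1", 232, "888888"), ("m+", 595, "cc3333"), ("43", 1289, "3366cc")]
svg = squares_svg(SAMPLE)
```

Write `svg` to a file ending in .svg and it should open in Inkscape for the tweaking stage.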

*does a little dance*

I didn’t choose the colours by the way. Those were in the data in a field marked simply ‘route_colors’. Someone at SORTA picked them. Any colour theorists in the audience want to offer a psychoanalysis?


  1. A trip here is somewhat loosely defined. In almost every case, it’s the journey from the far end of a route’s path to the other. The journey back to the starting point is another trip. I’m being a touch lazy and not looking at the actual shapes though. I know there have been instances in the past, though I can’t say for sure if there are any cases of it in this data, where for some technical reason a trip as just defined is split into two or more trips for the purpose of encoding it neatly in the data. If that’s happening here, it will distort the visualization and I won’t know any better. I simply haven’t looked. I’m tired and it’s my bed time. Plus, a quick glance over and it seems to pass the smell test. No huge outliers that I immediately notice.
Posted in: Data

SORTA releases changed schedules

For all those anxiously wondering what exactly service will be like after SORTA’s service changes take effect this month, the answer arrived last night in the form of both PDF schedules and a new GTFS feed. Woot!!

SERVICE CHANGES: COMING AUGUST 18th 2013. Don’t get caught waiting for a bus that isn’t coming. Check your schedules.

In related news, I should have some more quantitative analysis of the new service plan here soon, once I get around to it, including an updated frequency map to be available on the website. I was just waiting for the GTFS data for that one. And hopefully…some new analysis of ridership on a restructured system. Will more cross-town routes be reflected in relatively decreased boardings in downtown? I’m counting on SORTA to release that data from their automatic passenger counting system as soon as there’s enough time for the results to be statistically significant, perhaps around early to mid-September.

Comments: 7
Posted in: Data

How fast is the bus?

This fast:

Agency Line Avg Speed (MPH)
SORTA 82x 47.2
SORTA 52x 43.5
SORTA 42x 35.7
SORTA 67 35.4
SORTA 71 32.7
SORTA 75x 32.2
TANK 22X 31.5
TANK 2X 29.7
TANK 28X 29.4
TANK 29X 28.9
TANK 32X 28.2
SORTA 30x 27.4
TANK Gateway 27
SORTA 12x 25.8
SORTA 23 25.5
TANK 1X 24.9
SORTA 81x 24.6
SORTA 29x 24.2
TANK 30X 23.8
SORTA 28 23.6
SORTA 14x 23.1
SORTA 3 23.1
TANK 17X 23
SORTA 50 22.8
SORTA 74x 22.8
TANK 19X 22.4
TANK 18X 22
SORTA 2x 21.8
TANK 9 21.6
TANK 25X 21.5
TANK H5AM 21.1
SORTA 25x 20.5
TANK H2PM 20.4
SORTA 40x 19.8
SORTA 77x 18.7
SORTA 41 18.4
SORTA 20 17.4
SORTA 32 17.4
TANK H8AM 17.1
SORTA 15x 17
SORTA 24 16.9
TANK 1 16.8
TANK H9PM 16.7
TANK 33 16.7
TANK H1AM 16.5
TANK 25 16.5
TANK H1PM 16.2
TANK 11 16.2
SORTA 10 16.1
TANK ND2P 15.8
SORTA 38 15.6
TANK 3 15.4
TANK H3PM 15.3
TANK 23 15.2
SORTA 11 15.1
TANK 16 15.1
TANK NPIA 15
SORTA 78 15
SORTA 49 14.9
TANK NPI 14.8
SORTA 27 14.8
SORTA 19 14.7
SORTA 16 14.7
SORTA 85 14.5
SORTA 17 14.4
SORTA 6 14.3
SORTA 64 14.3
SORTA 39 14.2
TANK 7 14
SORTA 21 14
TANK 20 13.9
TANK H6PM 13.9
SORTA 43 13.9
TANK 5 13.9
SORTA 4 13.5
SORTA 31 13.2
SORTA 51 13.1
SORTA 33 13.1
SORTA 1 12.7
TANK 12 12.2
TANK H4PM 11.1
TANK H2AM 10.8
TANK Southbank 10.4
TANK H7PM 9.9
SORTA 46 9.6
TANK HTPM 7.9
TANK H5PM 6.6

Interesting stuff! And pretty close to a normal distribution. The “H…” lines are TANK’s school bus lines, by the way. (EDIT: See the comments for a much better account of TANK naming conventions.) And of course the speed varies by location and time of day. Everything will be slower downtown than on the highway.

I derived the numbers from the GTFS feeds from both agencies. All of the numbers look like they’re in the right ballpark, but I haven’t gone through this line by line to rule out anything weird like big scheduled layovers. I’ve also updated the “shapes from GTFS” file to include average speeds for the line. There should be a map coming soon…it will be interesting to see if I can use the data to identify chokepoints in the system and potentially how they vary throughout the day as traffic changes. Of course the data isn’t from actual observed speeds, just inferred from the length of route segments and their scheduled times throughout the week.
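To make "inferred from the length of route segments and their scheduled times" a bit more concrete, here is a rough sketch of the arithmetic for a single trip. It assumes the feed fills in the optional `shape_dist_traveled` column of stop_times.txt and that the values are in miles; neither is guaranteed by GTFS, and this is not necessarily how the table above was produced. The example trip is invented. Note that GTFS times can run past 24:00:00 for trips that cross midnight, and that any scheduled layover inside the trip drags the average down.

```python
def gtfs_seconds(hms):
    """Convert a GTFS 'HH:MM:SS' string to seconds.

    Hours may exceed 23 for trips that continue past midnight.
    """
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_speed_mph(stop_times):
    """Average scheduled speed for one trip.

    stop_times: list of dicts ordered by stop_sequence, each with
    'arrival_time', 'departure_time' and 'shape_dist_traveled'
    (ASSUMED here to be in miles; the unit is feed-specific).
    """
    first, last = stop_times[0], stop_times[-1]
    miles = last["shape_dist_traveled"] - first["shape_dist_traveled"]
    elapsed = gtfs_seconds(last["arrival_time"]) - gtfs_seconds(first["departure_time"])
    if elapsed <= 0:
        raise ValueError("trip has no scheduled running time")
    return miles / (elapsed / 3600)


# Invented trip: 10 miles scheduled over 30 minutes, crossing midnight.
EXAMPLE_TRIP = [
    {"arrival_time": "23:45:00", "departure_time": "23:45:00", "shape_dist_traveled": 0.0},
    {"arrival_time": "23:59:00", "departure_time": "24:00:00", "shape_dist_traveled": 4.5},
    {"arrival_time": "24:15:00", "departure_time": "24:15:00", "shape_dist_traveled": 10.0},
]
example_speed = average_speed_mph(EXAMPLE_TRIP)
```

Averaging that figure over every trip on a line for the week would give one number per line, which is the shape of the table above.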

Transit agencies: I know your planning departments must do interesting statistical analyses of your routes internally…it would be awesome if you shared the results with the world!

Comments: 2
Posted in: Analysis | Data | Math

A guide to local transit and other geospatial data

A list of good, local geospatial data is one of those things that a few of y’all will find immensely useful and no one else will care about at all. If I haven’t already lost you, keep reading! I’m going to try to list all of the spatial data sources I’ve found useful along with some comments from my own experience as to how relevant and usable they are for the Cincinnati area.

Here’s a guess: if you’ve read this far, you’re a planning student from DAAP. If you’re not, I’d like to meet you! You’re a rare independently motivated Cincinnati area cartographer.[1]

1. One of the surprisingly unfortunate things about going to planning school at DAAP is that you get free and unfettered access to Cincinnati Area GIS (CAGIS) data. I say unfortunate because you’ll soon find yourself without it, like a junky on the street coming down from a high. CAGIS, for those who aren’t yet familiar, is a very thoroughly developed and wide-ranging set of geospatial data for Hamilton County. It’s run by the county and has a pretty large staff devoted to little more than collecting and updating their data all day. That means it’s extremely complete and accurate. The City of Cincinnati and County planners and engineers have access to it, but that’s pretty much the end of it. A few large institutions like DAAP buy their annual access for more than you’ll ever get paid as a cartographer/planner. It has pretty much everything you can think of, from sewer lines to parks to water, parcels, building footprints, etc. But it’s strictly limited to Hamilton County, meaning there are a lot of maps made that end unnecessarily at the river…

Property value, including improvements, of Hamilton County, Ohio, in dollars per square foot

The county really wasn’t the most logical extent for a map of property values, but that’s the data I had to work with…

CAGIS’ actual, publicly funded, data sits on a big secret server somewhere that only the well funded have access to, but there’s also an online version for the masses. It’s sort of like visiting the data in prison though. There’s glass between the two of you and the best you can do is talk over the phone. This interface was made for people who needed to look up their parcel’s ID number or find their lot’s zoning, not for people interested in mapping or analysis. If you do manage to arrange a conjugal visit, there are people watching and you feel a little dirty when you try to explain why you want the data. It’s worth a try though. I’d recommend picking a phone number at random from their contact list and making a case for whatever you’re trying to do. They have shared data with me in the past, but I got it by telling someone I was still a planning student and coming by the office to pick up a CD. I’ve probably just ruined my chances for future data access.

Generally, I’ve found that CAGIS data (almost all shapefiles) is bloated with fields that don’t mean much without documentation that you probably can’t find. Government types work on these files, and their need for documentation is extremely minimal since they’re not sharing them with anyone.

2. 180 degrees away we have my favourite data source, ideologically speaking at least. OpenStreetMap (OSM) is a wiki map of the world, and the Cincinnati area is surprisingly well developed relative to other American cities. I’ll let you play around with a slippy map here to see what I’m talking about.

OSM isn’t just a web map, it’s an actual data source (and receptacle). You could download the entire world if you had the time and computing capacity.[2] I personally recommend using this site to extract the data for a smaller area. It will take a few minutes, but you get to select your own extent and can choose any of a number of formats. The most basic format is a .osm XML file. The format is completely extensible,[3] with the attributes of each object (points, lines, polygons or ‘relations’) stored in a theoretically unlimited number of “key”=“value” pairs. The amount of data the format can store is limited only by the amount of data you care to put in. A given polygon, for example, might have only one tag, “building”=“yes”, meaning of course that it represents a building, or it could have attributes telling us its use type, ownership, height, presence of a basement, name, the date it was drawn, height above sea level, original data source, facade material, and on to infinity. Also, everything is in one file, which is quite refreshing if you’re used to working with multiple shapefiles. That’s right: bus lines, streets, bike shops, forests, subdivisions, pedestrian crossings, stop lights and cell towers are all in the same place.
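If the key/value business sounds abstract, here is a small sketch of what reading those tags looks like with nothing but Python's standard library. The tiny .osm snippet is invented for the example (the IDs, coordinates and names are not real OSM objects), and the function name is mine. A real extract has the same shape, just vastly more of it.

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the .osm XML layout: nodes, ways and relations,
# each optionally carrying <tag k="..." v="..."/> children.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="39.1000" lon="-84.5100"/>
  <node id="2" lat="39.1000" lon="-84.5090"/>
  <node id="3" lat="39.1010" lon="-84.5090"/>
  <node id="4" lat="39.1020" lon="-84.5120">
    <tag k="shop" v="bicycle"/>
    <tag k="name" v="Example Bike Shop"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""


def tags_by_element(osm_xml):
    """Map (element type, id) to its tag dict, skipping untagged elements."""
    root = ET.fromstring(osm_xml)
    tagged = {}
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if tags:
            tagged[(elem.tag, elem.get("id"))] = tags
    return tagged


tagged = tags_by_element(SAMPLE_OSM)
```

Here the building and the bike shop come out of the same file with completely different sets of keys, which is the whole point. For anything bigger than a neighbourhood you would want a streaming parser or a proper import tool rather than loading the whole file into memory like this.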

So the format will let you do pretty much anything, but completeness is an issue with OSM as you might already have imagined for a wiki map. In the Cincinnati region:

Find something missing? You can add it yourself and make the data more complete for everyone! OSM data is most useful if you can import it into a PostGIS enabled database.

3. The US Geological Survey (USGS) is the go-to place for anything you could capture from a satellite. Don’t let the contour lines in CAGIS fool you; this is the original source! They’ve got aerial imagery, elevation, landcover, and a few other things you’ll find useful. In my experience, the most valuable data from the USGS are in such rasters. While they do have some vector data, like the location of schools, public buildings and streets, that’s not what I find myself going there for.

aerial photo of the queensgate yard

First you’ll use the interactive map to select the area you want, then it will prompt you to select from the available datasets. After a couple more steps, you’ll get an email with links to the data. There are a couple annoying things: first, for large rasters, they’ll break your download into pieces of ~70 MB. Just a moment ago, I tried to download an aerial image of the downtown area and it broke it into 22 pieces which I would probably want to stitch back together into one file before using it in a project.

The other thing I’ve found frustrating is that for some datasets, our region seems to be sitting on a couple of data collection boundaries. Here for example high resolution elevation data is available for most of the city, but if you’re mapping the airport, you’ll have to settle for a significantly lower resolution that may show up pixelated at a large scale.

topographic map of ohio

For anyone interested in elevations, I’ve compiled the best available data (from 2009) into only two files available on the data page of this site.

4. The most exact data we have on current transit services is in the General Transit Feed Specification (GTFS) files released by both TANK and SORTA. This data is what transit agencies provide to Google for use in their transit trip planning service. From it you could theoretically find the scheduled location of any vehicle at any time of the day or week. The data isn’t directly usable in GIS software. It consists of a number of related CSV tables that need to be joined to each other before useful data can be extracted. That means you’ll need to be passingly familiar with SQL and databases before you’ll be able to make much of it. However, once you get going with it, you can find anything that could be derived from schedule information, including speeds, headways, frequency, span of service, distance covered, route variance and more. To save the newbs a little bit of trouble, I’ve extracted all of the different lines into a more directly usable format:

map of cincinnati transit

A rendering of my derived GTFS shapes file

That file is on the data page too.
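If you'd rather do the joining yourself, here is a minimal sketch of the sort of query the GTFS tables invite, using Python's built-in sqlite3 so there's nothing to install. The table and column names follow the GTFS files routes.txt, trips.txt and calendar.txt, but the rows are invented, and the query ignores calendar_dates.txt exceptions (holidays and the like), so treat it as a starting point rather than a finished analysis. It counts scheduled trips per route per week by adding up the days each trip's service runs.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a tiny, made-up GTFS subset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE routes (route_id TEXT, route_short_name TEXT);
        CREATE TABLE trips (route_id TEXT, service_id TEXT, trip_id TEXT);
        CREATE TABLE calendar (
            service_id TEXT,
            monday INTEGER, tuesday INTEGER, wednesday INTEGER,
            thursday INTEGER, friday INTEGER, saturday INTEGER, sunday INTEGER
        );
        INSERT INTO routes VALUES ('A', '43'), ('B', '1');
        INSERT INTO calendar VALUES ('WK', 1, 1, 1, 1, 1, 0, 0);
        INSERT INTO calendar VALUES ('WE', 0, 0, 0, 0, 0, 1, 1);
        INSERT INTO trips VALUES ('A', 'WK', 'a1'), ('A', 'WK', 'a2'),
                                 ('A', 'WE', 'a3'), ('B', 'WK', 'b1');
    """)
    return conn


def weekly_trips(conn):
    """Return (route_short_name, weekly trip count) rows, busiest first."""
    query = """
        SELECT r.route_short_name,
               SUM(c.monday + c.tuesday + c.wednesday + c.thursday
                   + c.friday + c.saturday + c.sunday) AS weekly_trips
        FROM trips AS t
        JOIN routes AS r ON r.route_id = t.route_id
        JOIN calendar AS c ON c.service_id = t.service_id
        GROUP BY r.route_short_name
        ORDER BY weekly_trips DESC
    """
    return conn.execute(query).fetchall()


conn = build_sample_db()
result = weekly_trips(conn)
```

With a real feed you would load the CSV files into those tables instead of typing rows in by hand; the join itself stays the same.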

5. Everyone is probably already familiar enough with the US Census Bureau’s demographic data, but they’re also a great source for boundaries like cities, school districts, counties, states, congressional districts and more. Some people might tell you to use the shapefiles they provide for streets or railways, but for the Cincinnati area at least, I’d advise anyone to use OSM instead. Most of the OSM streets were originally derived from the Census Bureau data, but they’ve since been thoroughly validated, cleaned up and updated by OSM users. The Census Bureau excels at boundaries and demographics.

6. SORTA’s stop-level ridership data is a goldmine of information on the way people actually use the transit system. The only place it’s currently published is right here on the data page. Basically, each bus has a monitoring system that counts the number of people boarding and de-boarding at each and every stop throughout the day. Those numbers are aggregated by line and by stop for each day. SORTA collects this data and more every single day they operate but they don’t publish it. TANK seems to have recently implemented a similar monitoring system, so we should soon be able to look forward to ridership data for the whole system.

I’ve done a little analysis of SORTA’s data already, so you can check that out here and here to get an idea of how this dataset could be useful. If you care to pursue it, I would say that it’s worthwhile to ask SORTA and TANK directly for access to more data. There’s no reason they shouldn’t share live and/or historical ridership data through an API. The possibilities for interactive mapping that would let us understand how the system flows are stupendously interesting.

Finally, a note on licenses: If you’re going to be making maps for more than your personal use, particularly if you want to sell or distribute them, you’ll want to consider what you’re legally allowed to do with the data you’re using. CAGIS data is totally off limits as far as I understand. Unless you’re paying to use it, you probably can’t legally make a map from it. Federal data is all in the public domain and you can do absolutely anything with it.[4] SORTA’s GTFS feed comes with some weird legalese, but unless you use it to mislead transit users you’re probably in the clear. OpenStreetMap is subject to the Open Database License, requiring in effect only that you say where the data came from and credit OSM contributors.

Happy mapping! Please let me know in the comments if I’ve missed anything that’s particularly useful for Cincinnati area cartographers and I’ll add it to the list. There’s a lot of general information on Cincinnati transit out there, but I’m particularly interested here in data that can be manipulated in a GIS system.


  1. Seriously, send me an email. I’ll buy you a coffee :-)
  2. It’s about 370GB, or 27GB when compressed.
  3. As you perhaps already got from “XML” ;-)
  4. Make porn from it if you like. No one can stop you!
Comments: 3
Posted in: Data