A guide to local transit and other geospatial data

March 22nd, 2013

A list of good, local geospatial data is one of those things that a few of y’all will find immensely useful and no one else will care about at all. If I haven’t already lost you, keep reading! I’m going to try to list all of the spatial data sources I’ve found useful along with some comments from my own experience as to how relevant and usable they are for the Cincinnati area.

Here’s a guess: if you’ve read this far, you’re a planning student from DAAP. If you’re not, I’d like to meet you! You’re a rare independently motivated Cincinnati area cartographer.1

1. One of the surprisingly unfortunate things about going to planning school at DAAP is that you get free and unfettered access to Cincinnati Area GIS (CAGIS) data. I say unfortunate because you’ll soon find yourself without it, like a junkie on the street coming down from a high. CAGIS, for those who aren’t yet familiar, is a very thoroughly developed and wide-ranging set of geospatial data for Hamilton County. It’s run by the county and has a pretty large staff devoted to little more than collecting and updating their data all day. That means it’s extremely complete and accurate. City of Cincinnati and county planners and engineers have access to it, but that’s pretty much the end of it. A few large institutions like DAAP buy their annual access for more than you’ll ever get paid as a cartographer/planner. It has pretty much everything you can think of, from sewer lines to parks to water, parcels, building footprints, etc. But it’s strictly limited to Hamilton County, meaning there are a lot of maps made that end unnecessarily at the river…

Property value, including improvements, of Hamilton County, Ohio, in dollars per square foot

The county really wasn’t the most logical extent for a map of property values, but that’s the data I had to work with…

CAGIS’ actual, publicly funded, data sits on a big secret server somewhere that only the well funded have access to, but there’s also an online version for the masses. It’s sort of like visiting the data in prison though. There’s glass between the two of you and the best you can do is talk over the phone. This interface was made for people who needed to look up their parcel’s ID number or find their lot’s zoning, not for people interested in mapping or analysis. If you do manage to arrange a conjugal visit, there are people watching and you feel a little dirty when you try to explain why you want the data. It’s worth a try though. I’d recommend picking a phone number at random from their contact list and making a case for whatever you’re trying to do. They have shared data with me in the past, but I got it by telling someone I was still a planning student and coming by the office to pick up a CD. I’ve probably just ruined my chances for future data access.

Generally, I’ve found that CAGIS data (almost all shapefiles) is bloated with fields that don’t mean much without documentation that you probably can’t find. Government types work on these files, and their need for documentation is extremely minimal since they’re not sharing them with anyone.

2. 180 degrees away we have my favourite data source, ideologically speaking at least. OpenStreetMap (OSM) is a wiki map of the world, and the Cincinnati area is surprisingly well developed relative to other American cities. I’ll let you play around with a slippy map here to see what I’m talking about.

[osm_map lat=”39.115″ long=”-84.51″ zoom=”15″ width=”550″ height=”550″ theme=”dark” type=”AllOsm”]

OSM isn’t just a web map, it’s an actual data source (and receptacle). You could download the entire world if you had the time and computing capacity.2 I personally recommend using this site to extract the data for a smaller area. It will take a few minutes, but you get to select your own extent and can choose any of a number of formats. The most basic format is a .osm XML file. The format is completely extensible3, with the attributes of each object (points, lines, polygons or ‘relations’) stored in a theoretically unlimited number of “key”=“value” pairs. The amount of data the format can store is limited only by the amount of data you care to put in. A given polygon, for example, might have only one tag, “building”=“yes”, meaning of course that it represents a building, or it could have attributes telling us its use-type, ownership, height, presence-of-basement, name, the date it was drawn, height above sea-level, original data source, facade material, and on to infinity. Also, everything is in one file, which is quite refreshing if you’re used to working with multiple shapefiles. That’s right: bus lines, streets, bike shops, forests, subdivisions, pedestrian crossings, stop lights and cell towers are all in the same place.
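To make the key=value idea concrete, here’s a minimal sketch that reads tags out of a tiny .osm-style document using only Python’s standard library. The sample XML, the IDs, the coordinates and the helper names (`read_tags`, `summarize`) are all made up for illustration. A real extract is much bigger and you’d probably want a streaming parser such as `ET.iterparse` for it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written sample in the .osm XML layout (not real data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="39.1031" lon="-84.5120">
    <tag k="shop" v="bicycle"/>
  </node>
  <node id="2" lat="39.1040" lon="-84.5130"/>
  <node id="3" lat="39.1040" lon="-84.5125"/>
  <node id="4" lat="39.1036" lon="-84.5125"/>
  <way id="10">
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="4"/>
    <nd ref="2"/>
    <tag k="building" v="yes"/>
    <tag k="height" v="12"/>
  </way>
  <way id="11">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Example Street"/>
  </way>
</osm>
"""


def read_tags(element):
    """Return an element's key=value tags as a plain dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def summarize(osm_xml):
    """List (kind, id, tags) for every node, way or relation that has tags."""
    root = ET.fromstring(osm_xml)
    features = []
    for kind in ("node", "way", "relation"):
        for element in root.findall(kind):
            tags = read_tags(element)
            if tags:  # untagged nodes are just geometry for ways
                features.append((kind, element.get("id"), tags))
    return features


features = summarize(SAMPLE_OSM)

# Everything lives in one file, so filtering is just a matter of picking tags.
buildings = [f for f in features if f[2].get("building") == "yes"]

# How often each key is used across the sample.
key_counts = Counter(key for _, _, tags in features for key in tags)
```

In this sample, `buildings` ends up holding the single way tagged “building”=“yes”, and the bike shop and the street come out of the very same file.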

So the format will let you do pretty much anything, but completeness is an issue with OSM as you might already have imagined for a wiki map. In the Cincinnati region:

Find something missing? You can add it yourself and make the data more complete for everyone! OSM data is most useful if you can import it into a PostGIS enabled database.

3. The US Geological Survey (USGS) is the go-to place for anything you could capture from a satellite. Don’t let the contour lines in CAGIS fool you; this is the original source! They’ve got aerial imagery, elevation, landcover, and a few other things you’ll find useful. In my experience, the most valuable data from the USGS are in such rasters. While they do have some vector data like the location of schools, public buildings and streets, that’s not what I find myself going there for.

aerial photo of the queensgate yard

First you’ll use the interactive map to select the area you want, then it will prompt you to select from the available datasets. After a couple more steps, you’ll get an email with links to the data. There are a couple of annoying things: first, for large rasters, they’ll break your download into pieces of ~70 MB. Just a moment ago, I tried to download an aerial image of the downtown area and it broke it into 22 pieces, which I would probably want to stitch back together into one file before using it in a project.
The other thing I’ve found frustrating is that for some datasets, our region seems to be sitting on a couple of data collection boundaries. Here, for example, high-resolution elevation data is available for most of the city, but if you’re mapping the airport, you’ll have to settle for a significantly lower resolution that may show up pixelated at a large scale.

topographic map of ohio

For anyone interested in elevations, I’ve compiled the best available data (from 2009) into only two files available on the data page of this site.

4. The most exact data we have on current transit services is in the General Transit Feed Specification (GTFS) files released by both TANK and SORTA. This data is what transit agencies provide to Google for use in their transit trip planning service. From it you could theoretically find the scheduled location of any vehicle at any time of the day or week. The data isn’t directly usable in GIS software. It consists of a number of related CSV tables that need to be joined to each other before useful data can be extracted. That means you’ll need to be passingly familiar with SQL and databases before you’ll be able to make much of it. However, once you get going with it, you can find anything that could be derived from schedule information, including speeds, headways, frequency, span of service, distance covered, route variance and more. To save the newbs a little bit of trouble, I’ve extracted all of the different lines into a more directly usable format:

map of cincinnati transit

A rendering of my derived GTFS shapes file

That file is on the data page too.
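If you’d rather dig into the raw GTFS tables, here’s a rough sketch of the kind of join involved, using Python’s built-in `sqlite3`. The table and column names (`routes`, `trips`, `stop_times`, `route_id`, `trip_id`, `departure_time` and so on) come from the GTFS specification, but the rows below are a hypothetical three-trip feed I made up, and the helpers (`load_table`, `to_minutes`, `headways`) are just illustrative names. It’s one way to get headways at a stop, not the only one, and real feeds may leave times blank at stops that aren’t timepoints.

```python
import csv
import io
import sqlite3

# Hypothetical miniature feed; real GTFS files have these columns plus others.
ROUTES = """route_id,route_short_name
r17,17
"""
TRIPS = """route_id,service_id,trip_id
r17,weekday,t1
r17,weekday,t2
r17,weekday,t3
"""
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,s1,1
t2,08:15:00,08:15:00,s1,1
t3,08:45:00,08:45:00,s1,1
t1,08:10:00,08:10:00,s2,2
"""


def load_table(conn, name, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    conn.execute(f"CREATE TABLE {name} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", body)


def to_minutes(hhmmss):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to minutes."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 60 + minutes + seconds / 60


def headways(conn, route_short_name, stop_id, service_id):
    """Minutes between consecutive scheduled departures of a route at a stop."""
    query = """
        SELECT st.departure_time
        FROM stop_times AS st
        JOIN trips AS t ON t.trip_id = st.trip_id
        JOIN routes AS r ON r.route_id = t.route_id
        WHERE r.route_short_name = ? AND st.stop_id = ? AND t.service_id = ?
    """
    rows = conn.execute(query, (route_short_name, stop_id, service_id))
    times = sorted(to_minutes(row[0]) for row in rows)
    return [later - earlier for earlier, later in zip(times, times[1:])]


conn = sqlite3.connect(":memory:")
for name, text in (("routes", ROUTES), ("trips", TRIPS), ("stop_times", STOP_TIMES)):
    load_table(conn, name, text)

# Gaps between the three sample departures of route 17 at stop s1.
gaps = headways(conn, "17", "s1", "weekday")
```

With a real feed you would load the agency’s `routes.txt`, `trips.txt` and `stop_times.txt` instead of the inline strings, and the same three-table join is the starting point for frequency and span-of-service queries too.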

5. Everyone is probably already familiar enough with the US Census Bureau’s demographic data, but they’re also a great source for boundaries like cities, school districts, counties, states, congressional districts and more. Some people might tell you to use the shapefiles they provide for streets or railways, but for the Cincinnati area at least, I’d advise anyone to use OSM instead. Most of the OSM streets were originally derived from the Census Bureau data, but they’ve since been thoroughly validated, cleaned up and updated by OSM users. The Census Bureau excels at boundaries and demographics.

6. SORTA’s stop-level ridership data is a goldmine of information on the way people actually use the transit system. The only place it’s currently published is right here on the data page. Basically, each bus has a monitoring system that counts the number of people boarding and de-boarding at each and every stop throughout the day. Those numbers are aggregated by line and by stop for each day. SORTA collects this data and more every single day they operate but they don’t publish it. TANK seems to have recently implemented a similar monitoring system, so we should soon be able to look forward to ridership data for the whole system.

I’ve done a little analysis of SORTA’s data already, so you can check that out here and here to get an idea of how this dataset could be useful. If you care to pursue it, I would say that it’s worthwhile to ask SORTA and TANK directly for access to more data. There’s no reason they shouldn’t share live and/or historical ridership data through an API. The possibilities for interactive mapping that would let us understand how the system flows are stupendously interesting.

Finally, a note on licenses: If you’re going to be making maps for more than your personal use, particularly if you want to sell or distribute them, you’ll want to consider what you’re legally allowed to do with the data you’re using. CAGIS data is totally off limits as far as I understand. Unless you’re paying to use it, you probably can’t legally make a map from it. Federal data is all in the public domain and you can do absolutely anything with it.4 SORTA’s GTFS feed comes with some weird legalese, but unless you use it to mislead transit users you’re probably in the clear. OpenStreetMap is subject to the Open Database License, requiring in effect only that you say where the data came from and credit OSM contributors.

Happy mapping! Please let me know in the comments if I’ve missed anything that’s particularly useful for Cincinnati area cartographers and I’ll add it to the list. There’s a lot of general information on Cincinnati transit out there, but I’m particularly interested here in data that can be manipulated in a GIS.

Footnotes

  1. Seriously, send me an email. I’ll buy you a coffee :-)
  2. It’s about 370GB, or 27GB when compressed.
  3. As you perhaps already got from “XML” ;-)
  4. Make porn from it if you like. No one can stop you!

3 responses to “A guide to local transit and other geospatial data”

  1. Chris says:

    Awesome post. I can’t believe I’d never heard of OSM, that is some cool shit. Just recently found this blog a month or two ago, keep up the good work.

  2. OpenStreetMap has more detail on Cincinnati and Hamilton County than most of Manhattan, for the same reason Wikipedia has more on Pokémon than the entire animal kingdom: Cincinnati’s just that interesting! ;-) But the surrounding counties still need a lot of attention, particularly Brown and Dearborn. OSM is a constant work in progress, so if you see anything in particular that needs more work, please nag either of us about it, or better yet, try your hand at fixing it.