Meanwhile, in the world of OpenStreetMap, Minh Nguyen and I have been putting together a proposal for the import of CAGIS’s building footprint dataset, which has recently been listed under a public domain license. This would add about 295,000 building footprints to OpenStreetMap in Hamilton County, many with address and height information. To any OSM contributors who may be perusing this blog: we’re still processing the data and could use all the help we can get once it comes time to actually do the import, especially when we need to get our hands dirty with some manual conflation. For the rest of you (map users, data consumers, etc.), get ready for a more complete OSM in Cincinnati!
Postscript: For the especially keen, Minh has compiled a vast and surprisingly interesting summary of OSM completeness in the Cincinnati MSA (and Ohio).
Just last year I had a fairly long email exchange with some CAGIS folks, including eventually a City of Cincinnati lawyer, who insisted that the data which they collect with public dollars for civil uses was not a ‘public record’ per Ohio state law. They gave no justification for that claim of course, but I didn’t, as they well knew, have the time or money to take them to court. But anyway…if they want to start playing nice and sharing some of their toys, I’m willing to let bygones be bygones.
So what’s in there? There are two files, the larger of which (‘annual release’) you probably don’t need to bother with. The smaller file (‘quarterly release’) has some real goodies which I’ll describe below. I say that you don’t need to bother with the larger file because it looks like it only has data which could be derived from public domain federal sources like the Census and the USGS. Those sources cover the whole US and are readily available, so why trust CAGIS employees to mediate when you can go straight to the source?
Here’s what’s in the ‘quarterly release’ file (with some notes of my own):
Building footprints, Hamilton County (they seem to have cut out all of the data on each building and just left us with the outline. Still, for what it is, the data set looks to be very thorough with ~360,000 buildings.)
Street Centerlines, Hamilton County (I would suggest OpenStreetMap as better for most uses, though there are some other fields in here which I haven’t properly explored. I did notice that the speed limit field seems to indicate a whole lot more 15 mph streets than actually exist. The street I live on, for example, is tagged as 15 mph but physically signed as 25 mph.)
Neighborhood Boundaries, Cincinnati
‘Sidewalks’, ‘Pavements’, ‘Driveways’ & ‘Parking’, Hamilton County (These are lines only, no data or metadata. They might be useful for CAD-type architectural drawings, but for any sort of spatial analysis they’re probably not worth much.)
Zoning, Cincinnati and parts of Hamilton County
Railroad, Hamilton County (You’ll do better getting this from a federal source or from OpenStreetMap)
Parks, OKI Region (I’m not sure how CVG is considered a park, but this looks like an interesting dataset, with lots of useful fields like each park’s name, who’s responsible for it, and where you can find it)
Parcels, Hamilton County (There’s tons of juicy stuff in here, like the assessment and sale values for each and every parcel. I played with this a bit in the past, before I knew how to make maps.)
‘Subdivisions’, Hamilton County (This looks like a legal artifact more than anything. It seems to contain any parcel that’s been subdivided since GIS was invented. There are many ‘subdivisions’ downtown for example, and I can’t make any sense of other areas that I know well. 1/10: do not use.)
There are a few other datasets in there, but they’re either very obscure (survey benchmarks) or redundant with other datasets (parcel ‘pages’). As far as I can make out, each of the files is projected in the Ohio South State Plane coordinate system (EPSG:3735), though some of them appear to be missing that metadata.
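If you want to check which files are missing their projection metadata, here is a minimal sketch. It assumes the datasets arrive as shapefiles unpacked into a single folder, and it relies on the shapefile convention that the coordinate system is stored in a sidecar file with the same base name and a `.prj` extension. The function name is my own invention, and the match on `*.shp` is case-sensitive on most systems.

```python
from pathlib import Path


def shapefiles_missing_prj(folder):
    """Return the names of .shp files in `folder` that have no matching .prj file."""
    folder = Path(folder)
    missing = []
    for shp in sorted(folder.glob("*.shp")):
        # A shapefile's coordinate system lives in a sidecar file with the
        # same base name and a .prj extension.
        if not shp.with_suffix(".prj").exists():
            missing.append(shp.name)
    return missing
```

Anything this reports will need its projection assigned by hand in your GIS software before it lines up with the other layers.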
A list of good, local geospatial data is one of those things that a few of y’all will find immensely useful and no one else will care about at all. If I haven’t already lost you, keep reading! I’m going to try to list all of the spatial data sources I’ve found useful along with some comments from my own experience as to how relevant and usable they are for the Cincinnati area.
Here’s a guess: if you’ve read this far, you’re a planning student from DAAP. If you’re not, I’d like to meet you! You’re a rare independently motivated Cincinnati area cartographer.1
1. One of the surprisingly unfortunate things about going to planning school at DAAP is that you get free and unfettered access to Cincinnati Area GIS (CAGIS) data. I say unfortunate because you’ll soon find yourself without it, like a junkie on the street coming down from a high. CAGIS, for those who aren’t yet familiar, is a very thoroughly developed and wide-ranging set of geospatial data for Hamilton County. It’s run by the county and has a pretty large staff devoted to little more than collecting and updating their data all day. That means it’s extremely complete and accurate. The City of Cincinnati and County planners and engineers have access to it, but that’s pretty much the end of it. A few large institutions like DAAP buy their annual access for more than you’ll ever get paid as a cartographer/planner. It has pretty much everything you can think of, from sewer lines to parks to water, parcels, building footprints, etc… But it’s strictly limited to Hamilton County, meaning there are a lot of maps made that end unnecessarily at the river…
The county really wasn’t the most logical extent for a map of property values, but that’s the data I had to work with…
CAGIS’ actual, publicly funded, data sits on a big secret server somewhere that only the well funded have access to, but there’s also an online version for the masses. It’s sort of like visiting the data in prison though. There’s glass between the two of you and the best you can do is talk over the phone. This interface was made for people who needed to look up their parcel’s ID number or find their lot’s zoning, not for people interested in mapping or analysis. If you do manage to arrange a conjugal visit, there are people watching and you feel a little dirty when you try to explain why you want the data. It’s worth a try though. I’d recommend picking a phone number at random from their contact list and making a case for whatever you’re trying to do. They have shared data with me in the past, but I got it by telling someone I was still a planning student and coming by the office to pick up a CD. I’ve probably just ruined my chances for future data access.
Generally, I’ve found that CAGIS data (almost all shapefiles) is bloated with fields that don’t mean much without documentation that you probably can’t find. Government types work on these files and their need for documentation is extremely minimal since they’re not sharing them with anyone.
2. 180 degrees away, we have my favourite data source, ideologically speaking at least. OpenStreetMap (OSM) is a wiki map of the world, and the Cincinnati area is surprisingly well developed relative to other American cities. I’ll let you play around with a slippy map here to see what I’m talking about.
OSM isn’t just a web map, it’s an actual data source (and receptacle). You could download the entire world if you had the time and computing capacity.2 I personally recommend using this site to extract the data for a smaller area. It will take a few minutes, but you get to select your own extent and can choose any of a number of formats. The most basic format is a .osm XML file. The format is completely extensible3 with the attributes of each object (points, lines, polygons or ‘relations’) stored in a theoretically unlimited number of “key”=“value” pairs. The amount of data the format can store is limited only by the amount of data you care to put in. A given polygon, for example, might have only one tag, “building”=“yes”, meaning of course that it represents a building, or it could have attributes telling us its use type, ownership, height, presence of a basement, name, the date it was drawn, height above sea level, original data source, facade material, and on to infinity. Also, everything is in one file, which is quite refreshing if you’re used to working with multiple shapefiles. That’s right: bus lines, streets, bike shops, forests, subdivisions, pedestrian crossings, stop lights and cell towers are all in the same place.
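To make the “key”=“value” idea concrete, here is a minimal sketch that reads tags out of a .osm XML file using only Python’s standard library. The sample data is made up for illustration (the IDs, names and values don’t correspond to real objects, and real ways reference at least two nodes), and the function names are my own. It also only looks at ways, so buildings mapped as relations would be missed.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up .osm extract: one point and three ways, each carrying
# however many key/value tags its mapper cared to add.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="39.1031" lon="-84.5120">
    <tag k="amenity" v="bicycle_rental"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
  <way id="11">
    <nd ref="1"/>
    <tag k="building" v="yes"/>
    <tag k="name" v="Example Hall"/>
    <tag k="height" v="30"/>
  </way>
  <way id="12">
    <nd ref="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def tags_of(element):
    """Collect an element's <tag k="..." v="..."/> children into a dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def buildings(osm_xml):
    """Map way id -> tags for every way that has a 'building' key."""
    root = ET.fromstring(osm_xml)
    found = {}
    for way in root.findall("way"):
        tags = tags_of(way)
        if "building" in tags:
            found[way.get("id")] = tags
    return found
```

Calling `buildings(SAMPLE_OSM)` picks out ways 10 and 11 and skips the street, even though all three live in the same file. Way 10 carries a single tag while way 11 carries three, which is the extensibility described above.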
So the format will let you do pretty much anything, but completeness is an issue with OSM as you might already have imagined for a wiki map. In the Cincinnati region:
The street network is very complete and well developed. This is probably the best source for a complete street network; everything from alleys and driveways to highways and parking aisles.
Land uses and buildings are maybe 35% complete in the region, with perhaps 90% completeness in the urban basin.
Most of the major transit lines are now included since I started entering them last month, but most minor lines are still missing.
You’ll also find some historical features here that you won’t find anywhere else, like old rail rights of way or long-gone schools and parks.
3. The US Geological Survey (USGS) is the go-to place for anything you could capture from a satellite. Don’t let the contour lines in CAGIS fool you; this is the original source! They’ve got aerial imagery, elevation, landcover, and a few other things you’ll find useful. In my experience, the most valuable data from the USGS are rasters like these. While they do have some vector data like the location of schools, public buildings and streets, that’s not what I find myself going there for.
First you’ll use the interactive map to select the area you want, then it will prompt you to select from the available datasets. After a couple more steps, you’ll get an email with links to the data. There are a couple of annoying things. First, for large rasters, they’ll break your download into pieces of ~70 MB. Just a moment ago, I tried to download an aerial image of the downtown area and it came in 22 pieces, which I would probably want to stitch back together into one file before using it in a project.
The other thing I’ve found frustrating is that for some datasets, our region seems to be sitting on a couple of data collection boundaries. Here, for example, high-resolution elevation data is available for most of the city, but if you’re mapping the airport, you’ll have to settle for a significantly lower resolution that may show up pixelated at a large scale.
For anyone interested in elevations, I’ve compiled the best available data (from 2009) into only two files available on the data page of this site.
4. The most exact data we have on current transit services is in the General Transit Feed Specification (GTFS) files released by both TANK and SORTA. This data is what transit agencies provide to Google for use in their transit trip planning service. From it you could theoretically find the scheduled location of any vehicle at any time of the day or week. The data isn’t directly usable in GIS software. It consists of a number of related CSV tables that need to be joined to each other before useful data can be extracted. That means you’ll need to be passingly familiar with SQL and databases before you’ll be able to make much of it. However, once you get going with it, you can find anything that could be derived from schedule information, including speeds, headways, frequency, span of service, distance covered, route variance and more. To save the newbs a little bit of trouble, I’ve extracted all of the different lines into a more directly usable format:
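Separately, for anyone who does want to work with the raw feed, here is a minimal sketch of the kind of join involved, using Python’s built-in `csv` and `sqlite3` modules. The three tables and their column names follow the GTFS specification (`routes.txt`, `trips.txt`, `stop_times.txt`), but the rows are made up, the stop and trip IDs are hypothetical, and the helper functions are my own. It computes headways, the gaps between consecutive departures of one line at one stop.

```python
import csv
import io
import sqlite3

# Tiny made-up stand-ins for three GTFS tables (a real feed has more
# columns and many thousands of rows).
ROUTES = """route_id,route_short_name
17,17
"""
TRIPS = """route_id,service_id,trip_id
17,weekday,t1
17,weekday,t2
17,weekday,t3
"""
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,A,1
t2,08:20:00,08:20:00,A,1
t3,08:50:00,08:50:00,A,1
t1,08:10:00,08:10:00,B,2
"""


def load_table(conn, name, text):
    """Create a table named `name` from CSV text, using the header row as columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


def build_database():
    """Load the three sample tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load_table(conn, "routes", ROUTES)
    load_table(conn, "trips", TRIPS)
    load_table(conn, "stop_times", STOP_TIMES)
    return conn


def to_seconds(hhmmss):
    """Convert a GTFS time string to seconds; hours may exceed 24 in GTFS."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways_minutes(conn, route_short_name, stop_id, service_id):
    """Minutes between consecutive departures of one line at one stop."""
    rows = conn.execute(
        """
        SELECT st.departure_time
        FROM stop_times AS st
        JOIN trips AS t ON t.trip_id = st.trip_id
        JOIN routes AS r ON r.route_id = t.route_id
        WHERE r.route_short_name = ? AND st.stop_id = ? AND t.service_id = ?
        """,
        (route_short_name, stop_id, service_id),
    ).fetchall()
    # Sort numerically rather than as text so times past midnight order correctly.
    times = sorted(to_seconds(row[0]) for row in rows)
    return [(later - earlier) / 60 for earlier, later in zip(times, times[1:])]
```

With the sample rows, `headways_minutes(build_database(), "17", "A", "weekday")` gives the gaps between the three morning departures at stop A. A real feed also needs `calendar.txt` to work out which `service_id` values run on a given day, which this sketch skips.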
5. Everyone is probably already familiar enough with the US Census Bureau’s demographic data, but they’re also a great source for boundaries like cities, school districts, counties, states, congressional districts and more. Some people might tell you to use the shapefiles they provide for streets or railways, but for the Cincinnati area at least, I’d advise anyone to use OSM instead. Most of the OSM streets were originally derived from the Census Bureau data, but they’ve since been thoroughly validated, cleaned up and updated by OSM users. The Census Bureau excels at boundaries and demographics.
6. SORTA’s stop-level ridership data is a goldmine of information on the way people actually use the transit system. The only place it’s currently published is right here on the data page. Basically, each bus has a monitoring system that counts the number of people boarding and de-boarding at each and every stop throughout the day. Those numbers are aggregated by line and by stop for each day. SORTA collects this data and more every single day they operate but they don’t publish it. TANK seems to have recently implemented a similar monitoring system, so we should soon be able to look forward to ridership data for the whole system.
I’ve done a little analysis of SORTA’s data already, so you can check that out here and here to get an idea of how this dataset could be useful. If you care to pursue it, I would say that it’s worthwhile to ask SORTA and TANK directly for access to more data. There’s no reason they shouldn’t share live and/or historical ridership data through an API. The possibilities for interactive mapping that would let us understand how the system flows are stupendously interesting.
Finally, a note on licenses: If you’re going to be making maps for more than your personal use, particularly if you want to sell or distribute them, you’ll want to consider what you’re legally allowed to do with the data you’re using. CAGIS data is totally off limits as far as I understand. Unless you’re paying to use it, you probably can’t legally make a map from it. Federal data is all in the public domain and you can do absolutely anything with it.4 SORTA’s GTFS feed comes with some weird legalese, but unless you use it to mislead transit users you’re probably in the clear. OpenStreetMap is subject to the Open Database License, which in effect requires that you say where the data came from and credit OSM contributors, and that you share any database you derive from it under the same license.
Happy mapping! Please let me know in the comments if I’ve missed anything that’s particularly useful for Cincinnati area cartographers and I’ll add it to the list. There’s a lot of general information on Cincinnati transit out there, but I’m particularly interested here in data that can be manipulated in a GIS.