Below is a visualization of weekly frequency for each of SORTA’s routes, derived from the just-released GTFS data. The area of each square is directly proportional to the number of trips a route will make each week after the August 18th, 2013 service changes. The #1 at the top will make a total of 232 weekly trips. The m+ at the bottom will make 595. The route with the most weekly trips is the 43 with 1,289.[1] I’ll let them speak for themselves now:
I’ve been meaning to write a really simple little graph-generating script for a while. I’ve been unhappily living with Excel and OpenOffice forever as has everyone, and their ability to make visualisations is just NOTHING compared to the simplicity of SVG combined with a language like PHP. Just a couple hours coding and a little time tweaking in Inkscape and boom! Coloured squares. Oh yeah.
*does a little dance*
I didn’t choose the colours by the way. Those were in the data in a field marked simply ‘route_colors’. Someone at SORTA picked them. Any colour theorists in the audience want to offer a psychoanalysis?
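For anyone curious how the squares could be drawn, here is a minimal sketch in Python rather than the PHP I actually used. The only real numbers in it are the three weekly trip counts quoted above. The hex colours are placeholders (the GTFS spec keeps the real ones in the `route_color` column of routes.txt), and `SCALE`, `square_side` and `build_svg` are names made up for this example. The one idea that matters is that the side of each square is the square root of the trip count, so that area, not width, is proportional to trips.

```python
import math

# Weekly trip counts quoted in the post. The hex colours are placeholders,
# NOT the values from SORTA's feed.
routes = [
    ("1", 232, "1f77b4"),
    ("m+", 595, "d62728"),
    ("43", 1289, "2ca02c"),
]

SCALE = 4.0  # pixels per sqrt(trip); an arbitrary choice for this sketch


def square_side(trips, scale=SCALE):
    """Side length of a square whose area is proportional to `trips`."""
    return scale * math.sqrt(trips)


def build_svg(rows, gap=10):
    """Stack one square per route, top to bottom, and return SVG markup."""
    parts = []
    y = gap
    widest = 0.0
    for name, trips, colour in rows:
        side = square_side(trips)
        parts.append(
            f'<rect x="{gap}" y="{y:.1f}" width="{side:.1f}" '
            f'height="{side:.1f}" fill="#{colour}">'
            f"<title>{name}: {trips} trips/week</title></rect>"
        )
        y += side + gap
        widest = max(widest, side)
    header = (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{widest + 2 * gap:.0f}" height="{y:.0f}">'
    )
    return "\n".join([header, *parts, "</svg>"])


svg = build_svg(routes)
```

Writing `svg` to a file gives something you can open in a browser or tidy up in Inkscape.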
[1] A trip here is somewhat loosely defined. In almost every case, it’s the journey from the far end of a route’s path to the other. The journey back to the starting point is another trip. I’m being a touch lazy and not looking at the actual shapes though. I know there have been instances in the past, though I can’t say for sure if there are any cases of it in this data, where for some technical reason a trip as just defined is split into two or more trips for the purpose of encoding it neatly in the data. If that’s happening here, it will distort the visualization and I won’t know any better. I simply haven’t looked. I’m tired and it’s my bedtime. Plus, at a quick glance it seems to pass the smell test. No huge outliers that I immediately notice.
For all those anxiously wondering what exactly service will be like after SORTA’s service changes take effect this month, the answer arrived last night in the form of both PDF schedules and a new GTFS feed. Woot!!
SERVICE CHANGES: COMING AUGUST 18th 2013. Don’t get caught waiting for a bus that isn’t coming. Check your schedules.
In related news, I should have some more quantitative analysis of the new service plan here soon, once I get around to it, including an updated frequency map to be available on the website. I was just waiting for the GTFS data for that one. And hopefully…some new analysis of ridership on a restructured system. Will more cross-town routes be reflected in relatively decreased boardings in downtown? I’m counting on SORTA to release that data from their automatic passenger counting system as soon as there’s enough time for the results to be statistically significant, perhaps around early to mid-September.
Interesting stuff! And pretty close to a normal distribution. The “H…” lines are TANK’s school bus lines by the way. (EDIT: See the comments for a much better account of TANK naming conventions.) And of course the speed varies by location and time of day. Everything will be slower downtown than on the highway.
I derived the numbers from the GTFS feeds from both agencies. All of the numbers look like they’re in the right ballpark, but I haven’t gone through this line by line to rule out anything weird like big scheduled layovers. I’ve also updated the “shapes from GTFS” file to include average speeds for the line. There should be a map coming soon…it will be interesting to see if I can use the data to identify chokepoints in the system and potentially how they vary throughout the day as traffic changes. Of course the data isn’t from actual observed speeds, just inferred from the length of route segments and their scheduled times throughout the week.
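To make "inferred from the length of route segments and their scheduled times" a bit more concrete, here is a rough Python sketch of the idea. It is not the code behind the numbers above. It assumes each stop_times row carries the optional `shape_dist_traveled` field and that the feed measures it in kilometres (the spec leaves the unit up to the agency), and that every row has both times filled in. The stop IDs and times below are invented.

```python
def gtfs_seconds(hms):
    """Convert a GTFS 'HH:MM:SS' time to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def segment_speeds(stop_times):
    """Yield (from_stop, to_stop, km/h) for consecutive stops of one trip.

    `stop_times` is a list of dicts already sorted by stop_sequence.
    shape_dist_traveled is ASSUMED to be in kilometres.
    """
    for a, b in zip(stop_times, stop_times[1:]):
        elapsed = gtfs_seconds(b["arrival_time"]) - gtfs_seconds(a["departure_time"])
        km = float(b["shape_dist_traveled"]) - float(a["shape_dist_traveled"])
        if elapsed <= 0:
            continue  # stops sharing a timestamp give no usable speed
        yield a["stop_id"], b["stop_id"], km / (elapsed / 3600)


# Invented rows for a late-night trip that runs past midnight.
trip = [
    {"stop_id": "A", "arrival_time": "24:50:00", "departure_time": "24:50:00",
     "shape_dist_traveled": "0.0"},
    {"stop_id": "B", "arrival_time": "24:56:00", "departure_time": "24:56:00",
     "shape_dist_traveled": "2.0"},
    {"stop_id": "C", "arrival_time": "25:00:00", "departure_time": "25:01:00",
     "shape_dist_traveled": "4.0"},
]
speeds = list(segment_speeds(trip))
```

With these made-up rows the two segments work out to 20 km/h and 30 km/h. A scheduled layover would show up as a long gap between arrival and departure at a single stop, which is why the sketch measures from one stop's departure to the next stop's arrival.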
Transit agencies: I know your planning departments must do interesting statistical analyses of your routes internally…it would be awesome if you shared the results with the world!
A list of good, local geospatial data is one of those things that a few of y’all will find immensely useful and no one else will care about at all. If I haven’t already lost you, keep reading! I’m going to try to list all of the spatial data sources I’ve found useful along with some comments from my own experience as to how relevant and usable they are for the Cincinnati area.
Here’s a guess: if you’ve read this far, you’re a planning student from DAAP. If you’re not, I’d like to meet you! You’re a rare independently motivated Cincinnati area cartographer.[1]
1. One of the surprisingly unfortunate things about going to planning school at DAAP is that you get free and unfettered access to Cincinnati Area GIS (CAGIS) data. I say unfortunate because you’ll soon find yourself without it, like a junkie on the street coming down from a high. CAGIS, for those who aren’t yet familiar, is a very thoroughly developed and wide-ranging set of geospatial data for Hamilton County. It’s run by the county and has a pretty large staff devoted to little more than collecting and updating their data all day. That means it’s extremely complete and accurate. City of Cincinnati and county planners and engineers have access to it, but that’s pretty much the end of it. A few large institutions like DAAP buy their annual access for more than you’ll ever get paid as a cartographer/planner. It has pretty much everything you can think of, from sewer lines to parks to water, parcels, building footprints, etc. But it’s strictly limited to Hamilton County, meaning there are a lot of maps made that end unnecessarily at the river…
The county really wasn’t the most logical extent for a map of property values, but that’s the data I had to work with…
CAGIS’ actual, publicly funded, data sits on a big secret server somewhere that only the well funded have access to, but there’s also an online version for the masses. It’s sort of like visiting the data in prison though. There’s glass between the two of you and the best you can do is talk over the phone. This interface was made for people who needed to look up their parcel’s ID number or find their lot’s zoning, not for people interested in mapping or analysis. If you do manage to arrange a conjugal visit, there are people watching and you feel a little dirty when you try to explain why you want the data. It’s worth a try though. I’d recommend picking a phone number at random from their contact list and making a case for whatever you’re trying to do. They have shared data with me in the past, but I got it by telling someone I was still a planning student and coming by the office to pick up a CD. I’ve probably just ruined my chances for future data access.
Generally, I’ve found that CAGIS data (almost all shapefiles) is bloated with fields that don’t mean much without documentation that you probably can’t find. Government types work on these files and their need for documentation is extremely minimal since they’re not sharing them with anyone.
2. 180 degrees away we have my favourite data source, ideologically speaking at least. OpenStreetMap (OSM) is a wiki map of the world, and the Cincinnati area is surprisingly well developed relative to other American cities. I’ll let you play around with a slippy map here to see what I’m talking about.
OSM isn’t just a web map, it’s an actual data source (and receptacle). You could download the entire world if you had the time and computing capacity.[2] I personally recommend using this site to extract the data for a smaller area. It will take a few minutes, but you get to select your own extent and can choose any of a number of formats. The most basic format is a .osm XML file. The format is completely extensible,[3] with the attributes of each object (points, lines, polygons or ‘relations’) stored in a theoretically unlimited number of “key”=“value” pairs. The amount of data the format can store is limited only by the amount of data you care to put in. A given polygon, for example, might have only one tag, “building”=“yes”, meaning of course that it represents a building, or it could have attributes telling us its use-type, ownership, height, presence-of-basement, name, the date it was drawn, height above sea-level, original data source, facade material, and on to infinity. Also, everything is in one file, which is quite refreshing if you’re used to working with multiple shapefiles. That’s right: bus lines, streets, bike shops, forests, subdivisions, pedestrian crossings, stop lights and cell towers are all in the same place.
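If you’ve never opened a .osm file, a tiny example may help. The sketch below uses only Python’s standard library to read the key/value tags off each object and pick out the buildings. The XML snippet is a made-up three-object sample (real extracts carry more attributes and far more objects), and `tags_of` and `find_tagged` are helper names invented for this example.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal .osm document: one node and two ways, each with tags.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="39.1031" lon="-84.5120">
    <tag k="shop" v="bicycle"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <tag k="building" v="yes"/>
    <tag k="height" v="12"/>
  </way>
  <way id="11">
    <nd ref="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def tags_of(element):
    """Collect an object's <tag k=... v=.../> children into a dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def find_tagged(root, key, value=None):
    """Return (type, id, tags) for every object carrying `key` (and `value`)."""
    matches = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        tags = tags_of(element)
        if key in tags and (value is None or tags[key] == value):
            matches.append((element.tag, element.get("id"), tags))
    return matches


root = ET.fromstring(SAMPLE_OSM)
buildings = find_tagged(root, "building", "yes")
```

Here `buildings` ends up holding just way 10, along with its extra height tag. For a real extract you would swap `ET.fromstring` for `ET.parse(path).getroot()`, though a large file is better handled with a streaming parser or a dedicated OSM tool.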
So the format will let you do pretty much anything, but completeness is an issue with OSM as you might already have imagined for a wiki map. In the Cincinnati region:
The street network is very complete and well developed. This is probably the best source for a complete street network; everything from alleys and driveways to highways and parking aisles.
Land uses and buildings are maybe 35% complete in the region, with perhaps 90% completeness in the urban basin.
Most of the major transit lines are now included since I started entering them last month, but most minor lines are still missing.
You’ll also find some historical features here that you won’t find anywhere else, like old rail rights of way or long-gone schools and parks.
3. The US Geological Survey (USGS) is the go-to place for anything you could capture from a satellite. Don’t let the contour lines in CAGIS fool you; this is the original source! They’ve got aerial imagery, elevation, landcover, and a few other things you’ll find useful. In my experience, the most valuable data from the USGS are rasters like these. While they do have some vector data like the location of schools, public buildings and streets, that’s not what I find myself going there for.
First you’ll use the interactive map to select the area you want, then it will prompt you to select from the available datasets. After a couple more steps, you’ll get an email with links to the data. There are a couple of annoying things. First, for large rasters, they’ll break your download into pieces of ~70 MB. Just a moment ago, I tried to download an aerial image of the downtown area and it broke it into 22 pieces, which I would probably want to stitch back together into one file before using it in a project.
The other thing I’ve found frustrating is that for some datasets, our region seems to be sitting on a couple of data collection boundaries. Here for example high resolution elevation data is available for most of the city, but if you’re mapping the airport, you’ll have to settle for a significantly lower resolution that may show up pixelated at a large scale.
For anyone interested in elevations, I’ve compiled the best available data (from 2009) into only two files available on the data page of this site.
4. The most exact data we have on current transit services is in the General Transit Feed Specification (GTFS) files released by both TANK and SORTA. This data is what transit agencies provide to Google for use in their transit trip planning service. From it you could theoretically find the scheduled location of any vehicle at any time of the day or week. The data isn’t directly usable in GIS software. It consists of a number of related CSV tables that need to be joined to each other before useful data can be extracted. That means you’ll need to be passingly familiar with SQL and databases before you’ll be able to make much of it. However, once you get going with it, you can find anything that could be derived from schedule information, including speeds, headways, frequency, span of service, distance covered, route variance and more. To save the newbs a little bit of trouble, I’ve extracted all of the different lines into a more directly usable format:
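For those who would rather work from the raw feed, the kind of join described above might look roughly like this. It’s a sketch using Python’s built-in sqlite3 module with a few invented rows standing in for routes.txt, trips.txt and calendar.txt. The table and column names follow the GTFS spec; the route and service IDs are made up. It counts scheduled trips per route for an ordinary week and ignores the one-off exceptions in calendar_dates.txt, so treat the result as an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes   (route_id TEXT, route_short_name TEXT);
CREATE TABLE trips    (trip_id TEXT, route_id TEXT, service_id TEXT);
CREATE TABLE calendar (service_id TEXT, monday INT, tuesday INT, wednesday INT,
                       thursday INT, friday INT, saturday INT, sunday INT);
""")

# Invented sample rows; a real feed would be loaded from the CSV files.
conn.executemany("INSERT INTO routes VALUES (?, ?)",
                 [("r43", "43"), ("r1", "1")])
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", [
    ("t1", "r43", "weekday"),
    ("t2", "r43", "weekday"),
    ("t3", "r43", "saturday"),
    ("t4", "r1", "weekday"),
])
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("weekday", 1, 1, 1, 1, 1, 0, 0),
    ("saturday", 0, 0, 0, 0, 0, 1, 0),
])

# Each trip runs once on every day its service pattern is active, so summing
# the seven day flags over a route's trips gives trips per week.
WEEKLY_TRIPS = """
SELECT r.route_short_name,
       SUM(c.monday + c.tuesday + c.wednesday + c.thursday
           + c.friday + c.saturday + c.sunday) AS weekly_trips
FROM trips AS t
JOIN routes   AS r ON r.route_id   = t.route_id
JOIN calendar AS c ON c.service_id = t.service_id
GROUP BY r.route_short_name
ORDER BY weekly_trips DESC
"""
weekly = conn.execute(WEEKLY_TRIPS).fetchall()
```

With the sample rows, the made-up route 43 comes out at 11 weekly trips (two weekday trips at five days each, plus one Saturday trip) and route 1 at 5. Joining in stop_times.txt the same way gets you to headways and span of service.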
5. Everyone is probably already familiar enough with the US Census Bureau’s demographic data, but they’re also a great source for boundaries like cities, school districts, counties, states, congressional districts and more. Some people might tell you to use the shapefiles they provide for streets or railways, but for the Cincinnati area at least, I’d advise anyone to use OSM instead. Most of the OSM streets were originally derived from the Census Bureau data, but they’ve since been thoroughly validated, cleaned up and updated by OSM users. The Census Bureau excels at boundaries and demographics.
6. SORTA’s stop-level ridership data is a goldmine of information on the way people actually use the transit system. The only place it’s currently published is right here on the data page. Basically, each bus has a monitoring system that counts the number of people boarding and de-boarding at each and every stop throughout the day. Those numbers are aggregated by line and by stop for each day. SORTA collects this data and more every single day they operate but they don’t publish it. TANK seems to have recently implemented a similar monitoring system, so we should soon be able to look forward to ridership data for the whole system.
I’ve done a little analysis of SORTA’s data already, so you can check that out here and here to get an idea of how this dataset could be useful. If you care to pursue it, I would say that it’s worthwhile to ask SORTA and TANK directly for access to more data. There’s no reason they shouldn’t share live and/or historical ridership data through an API. The possibilities for interactive mapping that would let us understand how the system flows are stupendously interesting.
Finally, a note on licenses: If you’re going to be making maps for more than your personal use, particularly if you want to sell or distribute them, you’ll want to consider what you’re legally allowed to do with the data you’re using. CAGIS data is totally off limits as far as I understand. Unless you’re paying to use it, you probably can’t legally make a map from it. Federal data is all in the public domain and you can do absolutely anything with it.[4] SORTA’s GTFS feed comes with some weird legalese, but unless you use it to mislead transit users you’re probably in the clear. OpenStreetMap is subject to the Open Database License, requiring in effect only that you say where the data came from and credit OSM contributors.
Happy mapping! Please let me know in the comments if I’ve missed anything that’s particularly useful for Cincinnati area cartographers and I’ll add it to the list. There’s a lot of general information on Cincinnati transit out there, but I’m particularly interested here in data that can be manipulated in a GIS system.