Here is a rather crude, though I think useful, visualization of service frequency at the stop level. Basically, I used the GTFS data from SORTA and TANK to calculate the number of times a bus stops at each stop every week. Since a week is the basic cycle period of transit (service is bad on Sunday, better on Monday), this should give us an idea of basic average frequency, with the huge caveat that there’s enormous variation within each week.
Click the image to get a bigger version. There’s lots of interesting detail in there!
You may notice that frequency can appear to vary along a single line where it doesn’t seem like it should:
In most cases, this is simply an artifact of the way I grouped stops that were next to each other and had exactly the same name. At least 2,000 to 3,000 of the 6,000 stops in the dataset can reasonably be thought of as pairs, with one serving each direction of travel.
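For anyone curious how a weekly count like this might be put together, here is a minimal sketch rather than the exact process I used. It assumes the standard GTFS tables (calendar.txt, trips.txt, stop_times.txt, stops.txt) have already been read into lists of dicts, it groups stops purely by name (no proximity check), and it ignores calendar_dates.txt exceptions and frequencies.txt. The stop names and trips in the example are made up.

```python
from collections import defaultdict

DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")


def weekly_visits_by_stop_name(calendar, trips, stop_times, stops):
    """Count scheduled stop events per week, grouped by stop name.

    Each argument is a list of dicts, one per row of the matching GTFS
    table (the shape csv.DictReader would produce).
    """
    # Number of days per week on which each service pattern runs.
    days_per_service = {
        row["service_id"]: sum(int(row[day]) for day in DAYS)
        for row in calendar
    }
    # Every trip inherits the weekly day count of its service pattern.
    trip_days = {
        row["trip_id"]: days_per_service.get(row["service_id"], 0)
        for row in trips
    }
    stop_names = {row["stop_id"]: row["stop_name"] for row in stops}

    totals = defaultdict(int)
    for row in stop_times:
        # One stop_times row is one scheduled visit on each running day.
        totals[stop_names[row["stop_id"]]] += trip_days.get(row["trip_id"], 0)
    return dict(totals)


# Hypothetical toy feed: a weekday service and a Sunday-only service.
calendar = [
    dict(zip(("service_id",) + DAYS, ("WK", "1", "1", "1", "1", "1", "0", "0"))),
    dict(zip(("service_id",) + DAYS, ("SU", "0", "0", "0", "0", "0", "0", "1"))),
]
trips = [
    {"trip_id": "t1", "service_id": "WK"},
    {"trip_id": "t2", "service_id": "WK"},
    {"trip_id": "t3", "service_id": "SU"},
]
stops = [
    {"stop_id": "A1", "stop_name": "Main & 5th"},
    {"stop_id": "A2", "stop_name": "Main & 5th"},  # opposite side of the street
    {"stop_id": "B", "stop_name": "Vine & 12th"},
]
stop_times = [
    {"trip_id": "t1", "stop_id": "A1"},
    {"trip_id": "t2", "stop_id": "A2"},
    {"trip_id": "t3", "stop_id": "A1"},
    {"trip_id": "t1", "stop_id": "B"},
]

weekly = weekly_visits_by_stop_name(calendar, trips, stop_times, stops)
print(weekly)
```

Because the two "Main & 5th" stops are merged by name, their counts add together, which is exactly the kind of grouping that can make frequency seem to jump along a single line.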
Access is a pretty vague word. I don’t think I could succinctly define it and I suspect no one else could either as it regards discussions about transit and transportation. Still, we can imagine a transit line that makes no stops at all and say that it would provide no access whatsoever. It would be useless. Similarly, a line with infinite stops where the bus moves infinitesimally after each stop before stopping again also has an access value of 0; it would also be useless.1
Somewhere in the middle is the Goldilocks stop spacing arrangement. Where does the m+ fall on this spectrum? Where do the other lines in the system? Might there be an ideal middle ground or are both either too crowded or sparse? Is there room for … let’s call it ‘schedule diversity’ within a corridor? What effect does that have on effective frequency and average wait times at the skipped-over stops?
I’d like to hear SORTA’s and TANK’s official positions, or perhaps not positions but perspectives, on these questions as they move forward with their discussions of adding more rapid-transit-like lines to their systems. It’s not evident to me as an outsider that they’ve weighed the issue at all, at least publicly. Transit planners? Can you weigh in please? I’ve made my opinion clear in the above chart but I’m curious how SORTA and TANK would re-draw it and what they might add to it.
Though my line tapers off here without hitting 0 because the driver has some agency in stopping and doesn’t have to stop at an empty stop. Infinite stops might be more comparable to dial-a-ride or flexible schedule or no-stop services.
OpenStreetMap, the wiki map of the world, continues to delight me.
I’ve been slowly entering transit information over the last few months and while the effort is far from finished I’ve got most of the important lines and many of the major stops entered now. One result is the first and only web-map1 of transit services in Cincinnati.
Such a thing is useful because it allows users to dig deep into information-dense places like downtown (or their own neighborhoods) while allowing them to skim over sparse or irrelevant areas. The need for that ability is the reason most good printed transit maps exaggerate dense areas like downtowns and condense suburban service dramatically. A web map, however, is able to preserve topographic accuracy while filling the same need for differential detail. It’s more able to let individual users focus on what information is important to them without the designer needing to make as many assumptions.
Move around in the above map and if you see anything wrong let me know or have a go at correcting it yourself!
A list of good, local geospatial data is one of those things that a few of y’all will find immensely useful and no one else will care about at all. If I haven’t already lost you, keep reading! I’m going to try to list all of the spatial data sources I’ve found useful along with some comments from my own experience as to how relevant and usable they are for the Cincinnati area.
Here’s a guess: if you’ve read this far, you’re a planning student from DAAP. If you’re not, I’d like to meet you! You’re a rare independently motivated Cincinnati area cartographer.1
1. One of the surprisingly unfortunate things about going to planning school at DAAP is that you get free and unfettered access to Cincinnati Area GIS (CAGIS) data. I say unfortunate because you’ll soon find yourself without it, like a junkie on the street coming down from a high. CAGIS, for those who aren’t yet familiar, is a very thoroughly developed and wide-ranging set of geospatial data for Hamilton County. It’s run by the county and has a pretty large staff devoted to little more than collecting and updating their data all day. That means it’s extremely complete and accurate. The City of Cincinnati and County planners and engineers have access to it but that’s pretty much the end of it. A few large institutions like DAAP buy their annual access for more than you’ll ever get paid as a cartographer/planner. It has pretty much everything you can think of, from sewer lines to parks to water, parcels, building footprints, etc. But it’s strictly limited to Hamilton County, meaning there are a lot of maps made that end unnecessarily at the river…
The county really wasn’t the most logical extent for a map of property values, but that’s the data I had to work with…
CAGIS’ actual, publicly funded, data sits on a big secret server somewhere that only the well funded have access to, but there’s also an online version for the masses. It’s sort of like visiting the data in prison though. There’s glass between the two of you and the best you can do is talk over the phone. This interface was made for people who needed to look up their parcel’s ID number or find their lot’s zoning, not for people interested in mapping or analysis. If you do manage to arrange a conjugal visit, there are people watching and you feel a little dirty when you try to explain why you want the data. It’s worth a try though. I’d recommend picking a phone number at random from their contact list and making a case for whatever you’re trying to do. They have shared data with me in the past, but I got it by telling someone I was still a planning student and coming by the office to pick up a CD. I’ve probably just ruined my chances for future data access.
Generally, I’ve found that CAGIS data (almost all shapefiles) is bloated with fields that don’t mean much without documentation that you probably can’t find. Government types work on these files and their need for documentation is extremely minimal since they’re not sharing them with anyone.
2. 180 degrees away we have my favourite data source, ideologically speaking at least. OpenStreetMap (OSM) is a wiki map of the world, and the Cincinnati area is surprisingly well developed relative to other American cities. I’ll let you play around with a slippy map here to see what I’m talking about.
OSM isn’t just a web map, it’s an actual data source (and receptacle). You could download the entire world if you had the time and computing capacity.2 I personally recommend using this site to extract the data for a smaller area. It will take a few minutes, but you get to select your own extent and can choose any of a number of formats. The most basic format is a .osm XML file. The format is completely extensible3 with the attributes of each object (points, lines, polygons or ‘relations’) stored in a theoretically unlimited number of “key”=“value” pairs. The amount of data the format can store is limited only by the amount of data you care to put in. A given polygon, for example, might have only one tag, “building”=“yes”, meaning of course that it represents a building, or it could have attributes telling us its use-type, ownership, height, presence-of-basement, name, the date it was drawn, height above sea-level, original data source, facade material, and on to infinity. Also, everything is in one file, which is quite refreshing if you’re used to working with multiple shapefiles. That’s right: bus lines, streets, bike shops, forests, subdivisions, pedestrian crossings, stop lights and cell towers are all in the same place.
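To make the key/value idea a little more concrete, here is a small sketch using Python’s standard XML parser. The sample document is a hand-written, simplified stand-in for a real .osm extract (the ids, coordinates and the name “Example Hall” are invented), and the helper function name is my own.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up fragment in the shape of an .osm XML file.
SAMPLE_OSM = """
<osm version="0.6">
  <node id="1" lat="39.1000" lon="-84.5100"/>
  <node id="2" lat="39.1000" lon="-84.5090"/>
  <node id="3" lat="39.1010" lon="-84.5090"/>
  <node id="4" lat="39.1020" lon="-84.5120">
    <tag k="highway" v="bus_stop"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
    <tag k="building" v="yes"/>
    <tag k="name" v="Example Hall"/>
  </way>
</osm>
""".strip()


def tagged_elements(osm_xml):
    """Return (element type, id, tags dict) for every tagged object."""
    root = ET.fromstring(osm_xml)
    found = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        # Each <tag k="..." v="..."/> child is one key=value pair.
        tags = {tag.get("k"): tag.get("v") for tag in element.findall("tag")}
        if tags:
            found.append((element.tag, element.get("id"), tags))
    return found


elements = tagged_elements(SAMPLE_OSM)
buildings = [e for e in elements if e[2].get("building") == "yes"]
print(elements)
print(buildings)
```

Filtering on any other key works the same way, which is the practical upshot of everything living in one file.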
So the format will let you do pretty much anything, but completeness is an issue with OSM as you might already have imagined for a wiki map. In the Cincinnati region:
The street network is very complete and well developed. This is probably the best source for a complete street network; everything from alleys and driveways to highways and parking aisles.
Landuses and buildings are maybe 35% complete in the region with perhaps 90% completeness in the urban basin.
Most of the major transit lines are now included since I started entering them last month but most minor lines are still missing.
You’ll also find some historical features here that you won’t find anywhere else, like old rail rights of way or long-gone schools and parks.
3. The US Geological Survey (USGS) is the go-to place for anything you could capture from a satellite. Don’t let the contour lines in CAGIS fool you; this is the original source! They’ve got aerial imagery, elevation, landcover, and a few other things you’ll find useful. In my experience, the most valuable data from the USGS are in such rasters. While they do have some vector data like the location of schools, public buildings and streets, that’s not what I find myself going there for.
First you’ll use the interactive map to select the area you want, then it will prompt you to select from the available datasets. After a couple more steps, you’ll get an email with links to the data. There are a couple of annoying things: first, for large rasters, they’ll break your download into pieces of ~70 MB. Just a moment ago, I tried to download an aerial image of the downtown area and it broke it into 22 pieces, which I would probably want to stitch back together into one file before using it in a project.
The other thing I’ve found frustrating is that for some datasets, our region seems to be sitting on a couple of data collection boundaries. Here for example high resolution elevation data is available for most of the city, but if you’re mapping the airport, you’ll have to settle for a significantly lower resolution that may show up pixelated at a large scale.
For anyone interested in elevations, I’ve compiled the best available data (from 2009) into only two files available on the data page of this site.
4. The most exact data we have on current transit services is in the General Transit Feed Specification (GTFS) files released by both TANK and SORTA. This data is what transit agencies provide to Google for use in their transit trip planning service. From it you could theoretically find the scheduled location of any vehicle at any time of the day or week. The data isn’t directly usable in GIS software. It consists of a number of related CSV tables that need to be joined to each other before useful data can be extracted. That means you’ll need to be passingly familiar with SQL and databases before you’ll be able to make much of it. However, once you get going with it, you can find anything that could be derived from schedule information, including speeds, headways, frequency, span of service, distance covered, route variance and more. To save the newbs a little bit of trouble, I’ve extracted all of the different lines into a more directly usable format.
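For anyone who would rather go straight to the raw feed, here is a rough sketch of the kind of join described above, using SQLite from Python. The table and column names follow the GTFS spec (routes, trips, stop_times), but the rows are invented and only a few columns are included; a real feed would be loaded from the CSV files first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id TEXT, route_short_name TEXT);
CREATE TABLE trips (trip_id TEXT, route_id TEXT, service_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT,
                         stop_sequence INTEGER, departure_time TEXT);
""")

# Hypothetical rows standing in for a real feed.
conn.executemany("INSERT INTO routes VALUES (?, ?)",
                 [("r1", "A"), ("r2", "B")])
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [("t1", "r1", "WK"), ("t2", "r1", "WK"), ("t3", "r2", "WK")])
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?, ?)", [
    ("t1", "s1", 1, "06:00:00"), ("t1", "s2", 2, "06:10:00"),
    ("t2", "s1", 1, "07:00:00"), ("t2", "s2", 2, "07:10:00"),
    ("t3", "s3", 1, "08:30:00"),
])

# Join the three tables to get trips per route and span of service.
# GTFS times are zero-padded HH:MM:SS strings, so MIN/MAX on text works.
QUERY = """
SELECT r.route_short_name,
       COUNT(DISTINCT t.trip_id) AS trip_count,
       MIN(st.departure_time)    AS first_departure,
       MAX(st.departure_time)    AS last_departure
FROM routes r
JOIN trips t       ON t.route_id = r.route_id
JOIN stop_times st ON st.trip_id = t.trip_id
GROUP BY r.route_short_name
ORDER BY r.route_short_name
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Headways, speeds and the rest come from the same joins with a bit more arithmetic on the time columns.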
5. Everyone is probably already familiar enough with the US Census Bureau’s demographic data, but they’re also a great source for boundaries like cities, school districts, counties, states, congressional districts and more. Some people might tell you to use the shapefiles they provide for streets or railways, but for the Cincinnati area at least, I’d advise anyone to use OSM instead. Most of the OSM streets were originally derived from the Census Bureau data but they’ve since been thoroughly validated, cleaned up and updated by OSM users. The Census Bureau excels at boundaries and demographics.
6. SORTA’s stop-level ridership data is a goldmine of information on the way people actually use the transit system. The only place it’s currently published is right here on the data page. Basically, each bus has a monitoring system that counts the number of people boarding and de-boarding at each and every stop throughout the day. Those numbers are aggregated by line and by stop for each day. SORTA collects this data and more every single day they operate but they don’t publish it. TANK seems to have recently implemented a similar monitoring system, so we should soon be able to look forward to ridership data for the whole system.
I’ve done a little analysis of SORTA’s data already, so you can check that out here and here to get an idea of how this dataset could be useful. If you care to pursue it, I would say that it’s worthwhile to ask SORTA and TANK directly for access to more data. There’s no reason they shouldn’t share live and/or historical ridership data through an API. The possibilities for interactive mapping that would let us understand how the system flows are stupendously interesting.
Finally, a note on licenses: If you’re going to be making maps for more than your personal use, particularly if you want to sell or distribute them, you’ll want to consider what you’re legally allowed to do with the data you’re using. CAGIS data is totally off limits as far as I understand. Unless you’re paying to use it, you probably can’t legally make a map from it. Federal data is all in the public domain and you can do absolutely anything with it.4 SORTA’s GTFS feed comes with some weird legalese, but unless you use it to mislead transit users you’re probably in the clear. OpenStreetMap is subject to the Open Database License, requiring in effect only that you say where the data came from and credit OSM contributors.
Happy mapping! Please let me know in the comments if I’ve missed anything that’s particularly useful for Cincinnati area cartographers and I’ll add it to the list. There’s a lot of general information on Cincinnati transit out there, but I’m particularly interested here in data that can be manipulated in a GIS system.
I spent the last few days poking around in SORTA’s fresh ridership statistics, and I’ve compiled some maps, a couple charts, and a few anecdotes for your general edification. All of this is from data I got last week that I understand to be an average of a month of weekdays in January 2013. In no particular order…here we go!
The biggest stops for each neighborhood in the city by total daily riders:
Each stop’s total is created by adding the number of people getting on there, and the number of people getting off there each day. There were a couple ties within neighborhoods. The area of the circle corresponds to the total number. Click for the full size image.
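As a rough sketch of that calculation (not the exact queries I ran), the snippet below adds ons and offs per stop, keeps the largest stop in each neighborhood, and sizes a circle so that its area, not its radius, is proportional to the total. The rows and the `area_per_rider` scale factor are invented for illustration, and ties are resolved by simply keeping the first stop seen.

```python
import math


def biggest_stop_per_neighborhood(rows):
    """Return {neighborhood: (stop name, ons + offs)} for the largest stop."""
    best = {}
    for row in rows:
        total = row["ons"] + row["offs"]
        current = best.get(row["neighborhood"])
        # Strict comparison: on a tie, the first stop seen is kept.
        if current is None or total > current[1]:
            best[row["neighborhood"]] = (row["stop"], total)
    return best


def circle_radius(total, area_per_rider=1.0):
    """Radius of a circle whose AREA is proportional to the rider total."""
    return math.sqrt(total * area_per_rider / math.pi)


# Hypothetical counts, not SORTA's numbers.
rows = [
    {"neighborhood": "Clifton", "stop": "Stop A", "ons": 120, "offs": 80},
    {"neighborhood": "Clifton", "stop": "Stop B", "ons": 300, "offs": 100},
    {"neighborhood": "Downtown", "stop": "Stop C", "ons": 900, "offs": 700},
]

biggest = biggest_stop_per_neighborhood(rows)
for neighborhood, (stop, total) in biggest.items():
    print(neighborhood, stop, total, round(circle_radius(total), 1))
```

Scaling by area means a stop with four times the riders gets a circle with twice the radius.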
Transit users familiar with any of these neighborhoods will probably have been able to guess where a lot of these stops would be if not their relative proportion, but it’s fun to put them all on a map.
Just zoomed in on the above.
Clifton in particular took me by surprise though. I would have guessed the stops in the business district in front of Sitwells would have won out over Cincy State.
Now of course just picking out the biggest stops by some arbitrary boundary is losing a lot of information. In this case, it’s failing to account for 86.5% of riders. Government Square for example has several stops in the thousands that haven’t shown up. Here’s what the same map looks like with all of the stops shown.
And zoomed in again
That’s sort of a mess.
Heatmap of ridership intensity:
A heatmap is a slightly more complicated kind of thing because it’s not quite as intuitive how the total is calculated. In this case, the total number of riders at each stop was applied to an area a couple thousand feet wide, most intensely at the center and progressively less intensely toward the edges of that circle. Stops near each other compound the total value as their circles overlap. So heatmaps aren’t great for deriving actual values, but they’re good for comparing approximate intensity and should make more sense of Downtown than the above.
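Here is a minimal sketch of that idea. The GIS software’s actual kernel shape isn’t something I can vouch for, so this assumes a simple linear falloff from the center to the edge of a circle with a 1,000 foot radius; the coordinates (in feet) and rider totals are made up.

```python
import math


def heat_at(x, y, stops, radius=1000.0):
    """Sum each stop's riders, weighted by a linear falloff with distance.

    stops is a list of (x, y, riders) tuples in the same units as radius.
    The weight is 1 at the stop itself and 0 at or beyond the radius.
    """
    total = 0.0
    for stop_x, stop_y, riders in stops:
        distance = math.hypot(x - stop_x, y - stop_y)
        if distance < radius:
            total += riders * (1.0 - distance / radius)
    return total


# Two hypothetical stops 500 feet apart.
stops = [(0.0, 0.0, 100), (500.0, 0.0, 200)]

at_first_stop = heat_at(0.0, 0.0, stops)    # 100 * 1.0 + 200 * 0.5
between_stops = heat_at(250.0, 0.0, stops)  # 100 * 0.75 + 200 * 0.75
far_away = heat_at(5000.0, 0.0, stops)      # outside both circles
print(at_first_stop, between_stops, far_away)
```

Note how the point between the two stops scores higher than the first stop itself; that compounding is why the values on the map can’t be read as counts for any one stop.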
That lone black spot is Downtown. It’s ridiculously dominant on pretty much any map of transit in Cincinnati. Let’s see what happens if we set the upper limit of our scale to ignore Downtown altogether.
Note that whereas the upper limit of the spectrum was 15,000 before, it’s now only 1,200. Transit activity Downtown is more than an order of magnitude more intense than anywhere else in the region.
I also broke the data down in a slightly less mathematically ambiguous, and slightly more designer-y way.
If anyone wants to install this as tile on their bathroom floor, there is a link to the data at the end of this post. You can group the numbers to reflect the number of tile colors your remodeling budget can afford. Perhaps the faucets can be made from melted down streetcar track.
But I think it’s statistically kind of useless since, again, the boundaries are rather arbitrary. But let’s have one more boundary based map before we go right into some harder data.
Trip Density by Neighborhood:
It’s broken into five quantile classes for the color coding, but you can zoom in to see the numbers for each neighborhood.
This is the total number of trips per square mile by neighborhood, with half a trip counted for each boarding and the other half counted for each de-boarding. Thus if I took a bus from Downtown to Mt. Washington, 0.5 would accrue to each neighborhood (assuming each were exactly 1 square mile in size). It’s possible that neighborhood boundaries running down the middle of roads served by transit make this not very useful at a fine level, but it looks generally right at a small scale, so I thought I’d share it. Now here’s the hard stuff.
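A quick aside first: the half-trip bookkeeping above can be sketched in a few lines. This is an illustration under the same simplifying assumption as the example (each neighborhood exactly 1 square mile), with a single made-up rider; the function name is my own.

```python
from collections import defaultdict


def trip_density(stop_counts, area_sq_mi):
    """Trips per square mile, counting half a trip per boarding or de-boarding.

    stop_counts is a list of (neighborhood, ons, offs) tuples and
    area_sq_mi maps each neighborhood to its area in square miles.
    """
    half_trips = defaultdict(float)
    for neighborhood, ons, offs in stop_counts:
        half_trips[neighborhood] += 0.5 * (ons + offs)
    return {name: half_trips[name] / area for name, area in area_sq_mi.items()}


# One rider boards Downtown and gets off in Mt. Washington.
stop_counts = [("Downtown", 1, 0), ("Mt. Washington", 0, 1)]
area_sq_mi = {"Downtown": 1.0, "Mt. Washington": 1.0}  # assumed sizes

density = trip_density(stop_counts, area_sq_mi)
print(density)
```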
Express Line Ridership Totals:
The left column is the line number, the right the total number of trips, derived as (total ons + total offs) / 2.
Non-Express Line Ridership Totals:
Totals for Express vs. Non-Express:
SQL is fun! Those totals are an addition of ons and offs by the way, not the total number of trips (which would be half that). Call me sloppy, but I’m only concerned with proportion right now.
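The reason the sloppiness is harmless: halving every total to get trips doesn’t change any proportion. A tiny check, with invented totals rather than SORTA’s actual figures:

```python
def trips(ons_plus_offs):
    """Each trip has one boarding and one de-boarding, so halve the sum."""
    return ons_plus_offs / 2


# Hypothetical ons + offs totals, chosen only for illustration.
express_total = 3000
non_express_total = 37000

# Share computed on raw ons + offs.
share_raw = express_total / (express_total + non_express_total)

# Share computed on actual trips; the factor of two cancels out.
share_trips = trips(express_total) / (
    trips(express_total) + trips(non_express_total)
)

print(share_raw, share_trips)
```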
It looks like non-express lines account for the lion’s share of all trips. Express lines account for less than 10%, and that’s only on weekdays! I’m also told that CPS students (counted by their use of special passes) account for about 9% of all trips. Assuming they’re not taking any express lines, which seems like a fair assumption, that means they account for about 10% of all non-express trips. Anecdotally it would seem, with such numbers from CPS, that the reintroduction of free rides for University of Cincinnati students could cause a pretty large jump in total riders. With the high numbers near Cincy State, I’m curious how much they’re subsidizing passes for students. BTW, Cincy State claims a distinct proximity to the 25th and 30th largest stops…
30 Biggest stops:
And that’s it for now. Here’s the data! I’ve provided the original files from SORTA and most of what I was able to make from it, including a shapefile of all stops by line with ridership, and one of all stops grouped by name with the total of counts from all lines. There are also a few CSVs and a geo-tiff of the heatmap.1 I’ll probably add a few more things to the data page later, so be sure to check back. If anyone is able to make anything interesting of the data, I really would love to see it!
And here they are! I haven’t had a chance to really thoroughly pick through them yet. The data is a bit messy and I still need to write a little script to walk through that flat text file to clean it up before doing a join with the list of stop locations. But go ahead and poke around it yourself to see if you can find anything interesting. I’ll be able to get around to making some maps from this and comparing it with the 2009 data later this week or next. I’ll also share it again in a cleaned up format once I finish compiling it.
In related news, SORTA is reporting a 4.2% increase in ridership for 2012 over the previous year. I took a quick look into the National Transit Database to see how this compared to earlier years and immediately ran into a higher figure for unlinked trips in 2011 than was reported in that story for 2012. So I probably need to learn more about the methodology behind both numbers. For now, let’s celebrate a short term increase in transit use, and leave the potential downer of longitudinal context for later. Statistics is messy!