A wee nit to pick from SORTA’s recent “State of Metro” dog and pony show:
I distinctly remember one of the speakers saying something like ‘and all this without raising fares!’, and this to my feeble memory reeked of bullshit, so I found the numbers again and ran them to see if I was remembering correctly. I was indeed.
Here are the facts, as reported by SORTA to the Federal Transit Administration. Over the period where we have data on both fare revenues and ridership (currently 2002 to 2012) SORTA has been steadily getting more money from fare revenues while moving fewer passengers. We are currently at the nadir of this trend, with
More fare revenue than ever
Fewer passenger trips than ever
When SORTA says, by the way, that they are a ‘most efficient’ agency, a title pinned on them by the laughably unscientific UC Economics Center, it is precisely this measure they have in mind. There is hardly a better example of doublespeak to be found. Here’s the trend:
In order to plot both agencies together, I normalized fares and passenger trips to the same range. The scale is linear.
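That normalization is just the standard min-max rescaling; here's a sketch in Python (the numbers are made up for illustration, not the actual NTD figures):

```python
def normalize(values):
    """Min-max rescaling: map a series onto the 0-to-1 range so
    differently-scaled series can share one plot."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative numbers only -- not the actual fare or trip figures.
fare_revenue = [14.0, 15.2, 16.1, 18.9, 21.3]     # millions of dollars
passenger_trips = [22.1, 21.5, 20.9, 19.8, 18.4]  # millions of trips
print(normalize(fare_revenue))     # rises from 0.0 to 1.0
print(normalize(passenger_trips))  # falls from 1.0 to 0.0
```

Once both series span 0 to 1, the opposing trends sit on the same axes without one dwarfing the other.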
Now you may rightly note that the standard fare for a zone 1 trip hasn’t changed lately. But that’s not the only kind of fare that can be paid. It might not even be the most common! I don’t know for certain. I haven’t personally paid standard fare in quite a while because my transit use is partly subsidized by UC. So for example, the fare revenue variable in this data almost certainly includes UC’s cash subsidy for my fare as well as the dollar I put in myself. Multiply that by the dozens of private fare subsidies each agency probably negotiates (or drops) each year and you get a more dynamic picture. Fare could also be affected, though probably isn’t, by people using transit cards more or less, while paying the same monthly price.
But anyway, I’ll be damned if the total price paid by riders or their agents, on a per-trip basis, doesn’t constitute a better definition of ‘fare’ than SORTA’s standard zone-1 single-segment price. And by that definition, fares have risen from $0.76 in 2002 to $1.78 in 2012 (+134%). For TANK, the change is from $0.72 in 2002 to $1.16 in 2012 (+60%). Adjusting for inflation, the changes are 84% and 26% respectively. So much for SORTA’s unchanging-fares lie.
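The arithmetic behind those numbers is simple enough to sketch. The ~1.27 deflator below is my own assumption for a rough 2002-to-2012 CPI ratio, not a figure from SORTA:

```python
def per_trip_fare(fare_revenue, unlinked_trips):
    """Average fare actually paid per unlinked trip."""
    return fare_revenue / unlinked_trips

def pct_change(old, new):
    return (new - old) / old * 100

# SORTA's per-trip fare went from $0.76 (2002) to $1.78 (2012):
print(round(pct_change(0.76, 1.78)))  # 134 (nominal)
# Deflate 2012 dollars back to 2002 dollars; the 1.27 ratio is an
# assumption roughly matching 2002->2012 CPI inflation:
print(round(pct_change(0.76, 1.78 / 1.27)))  # 84 (real)
# And TANK, $0.72 -> $1.16:
print(round(pct_change(0.72, 1.16)))  # 61 (nominal)
```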
I’ll end with an ineffectual plea to the people at SORTA. Please, understand that when you speak in lies and euphemisms, no matter how nice your breakfast spread, you turn off clever people and retain only the idiots and the cynical. People from all three of these categories vote, to be sure, but I know who I’d rather spend my time with. And I know who could build the better transit system.
I’m taking a self-guided course in R this semester — that is, teaching myself, but with deadlines — and since I’ve been playing with transit data for the most part, it seems appropriate to tickle y’all with some of the mildly interesting data visualizations that I’ve so far produced.
I’ll be using the 2014 SORTA spatio-temporal ridership dataset, which I’ve already sliced a couple different ways on this blog. The first was here with a set of animated maps and the second here showing basic peaking in passenger activity through time.
This time, I’m going to take that latter analysis a little further by breaking out passenger activity into lines. Go ahead and take a look at the graphic, which I’ll explain in more detail below.
Ok. So first, it’s important to understand what we’re measuring here. Our dataset tells us the average number of people getting on a bus (boarding) and the average number getting off (alighting) for each scheduled stop. There are about 162,000 scheduled stops on a weekday1. Of those, I was able to identify a precise, scheduled time for all but ~2,0002. Of the remaining ~160,000, the dataset tells me that 77,763 have at least 0.1 people boarding or alighting on an average weekday. I used those stops to calculate a weighted density plot over the span of the service day for each route. Added together, of course, the individual routes sum to the total ridership for the system3. I then sorted the routes by their total ridership and plotted them.
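For the curious, a weighted density of this sort can be sketched with plain Gaussian kernels, scaled so each route's curve integrates to its total ridership rather than to 1 (which is exactly what lets the routes sum to the system total). This is a minimal stand-in for what a weighted kernel density (e.g. R's density() with weights) computes, not my actual plotting code:

```python
import math

def weighted_density(stop_times, boardings, grid, bandwidth=0.5):
    """Weighted Gaussian kernel density over the service day (hours).

    Each stop contributes a kernel scaled by its boardings, so the
    area under the curve equals the route's total ridership.
    """
    norm = bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(w * math.exp(-0.5 * ((x - t) / bandwidth) ** 2)
            for t, w in zip(stop_times, boardings)) / norm
        for x in grid
    ]

# Toy route: a morning and an evening cluster of stops (hours of day).
times = [7.5, 8.0, 8.2, 16.9, 17.3]
riders = [6.0, 10.0, 4.0, 8.0, 7.0]
grid = [h / 10 for h in range(0, 241)]  # 0:00 to 24:00 in 6-min steps
curve = weighted_density(times, riders, grid)
# A Riemann sum recovers (approximately) the route's total ridership:
area = sum(curve) * 0.1
print(round(area))  # ~35, the sum of the boardings above
```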
The first thing that becomes clear, to me at least, is that a minority of SORTA’s lines account for a large majority of actual riders. These lines, by the way, are precisely the ones featured in the Cincinnati Transit Frequency Map, and I’ve used their colors from that map to distinguish them in the chart above. The remaining routes, as I knew even before I had this data, are relatively unimportant.
May 2013 routing
The one grey line mixed in among the colored lines is the m+ (a latecomer to the frequency map), which does actually run all day on weekdays.
Now another interesting question, to me at least, is what this would look like without the pea under the mattress: how large are the rush-hour peaks if we exclude the peak-only lines from the chart? Let’s try it. I’ll also reverse the order, so we can see some of the larger lines with less distortion.

Well, the rush hours are still pretty distinct. More distinct than I would have expected. It’s an open question whether this is the result of more service in the rush hours, or more crowding at the same level of service.
One last way (for now) to slice the data will be to take the total ridership at any given moment and relativize each line’s total, showing each line’s percent share of the total. To keep it easy to read, I’ll leave the peak-only lines out of this one too.

I found it slightly surprising how straight these lines are. Only toward the end of the day do we see a major wobble in any direction, and that’s essentially the result of a few lines shutting down earlier than the others.
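Computing those shares is just a per-moment normalization. A sketch, with hypothetical counts rather than any real line's ridership:

```python
def percent_shares(counts):
    """Each line's percent share of total riders at one moment."""
    total = sum(counts.values())
    return {line: 100 * n / total for line, n in counts.items()}

# Hypothetical riders-on-board at a single moment of the day:
print(percent_shares({"A": 120, "B": 90, "C": 60, "D": 30}))
# -> {'A': 40.0, 'B': 30.0, 'C': 20.0, 'D': 10.0}
```

Repeat that for every time step and each line traces out its share curve; by construction the curves always sum to 100%.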
I’m just beginning to play with the new, highly detailed ridership data I got from SORTA, and boy is it a treat. I’ll start here with a high-level overview of the temporal dimension of the data, before looking at spatial aspects and breaking it down by line, stop, service type etc. as the summer progresses. I think I may also use this data as the basis of my study in R this coming semester (Hi Michael!), so perhaps we can count on seeing some more detailed and particularly nerdy and multivariate analysis through the cooler months as well. I am, by the way, acutely aware that I’ve started a number of little projects on this blog, and have failed as yet to carry them through to completion. I keep getting distracted by the realization that I have no idea what I’m talking about, the inevitably elusive prospect of making money some way or another, and the all too comforting thought that no one is reading this or taking it seriously anyway. But hopefully this is a small enough commitment, and certainly it’s interesting enough, for me to actually provide a reasonably complete picture of this particular dataset before altogether too long. Perhaps I can even apply the same techniques to the TANK ridership dataset that I’ve been meaning to get to and publish for more than a year.
Anyway, let’s actually get to that temporal overview. Since we know the trip each record belongs to, identified by the line number and the trip’s start time, I was able to identify the actual scheduled time for the great majority of stops in the data set by matching the records to GTFS schedule data for the same period. About 170,000 of the 230,000 records matched to a precise time. The remainder account for a very small portion of total activity, about 2%, and I think it’s most likely that many of these records are an artifact of the way SORTA’s database is structured and not actual stops belonging to a trip. I’ll dig into that more some other time though.
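The matching step itself amounts to building a lookup keyed on the trip identifiers both datasets share. A minimal sketch; the field names and key choice here are illustrative assumptions, not SORTA's actual column names:

```python
# Sketch of matching ridership records to GTFS scheduled times,
# keyed on (route, trip start time, stop sequence). All field names
# are hypothetical stand-ins for the real columns.

def index_schedule(stop_times):
    """Index GTFS stop events by (route, trip start, stop sequence)."""
    return {
        (st["route"], st["trip_start"], st["stop_seq"]): st["departure_time"]
        for st in stop_times
    }

def match_records(records, schedule_index):
    """Attach a scheduled time to each ridership record that matches."""
    matched, unmatched = [], []
    for r in records:
        key = (r["route"], r["trip_start"], r["stop_seq"])
        if key in schedule_index:
            matched.append({**r, "scheduled": schedule_index[key]})
        else:
            unmatched.append(r)
    return matched, unmatched

gtfs = [{"route": "33", "trip_start": "07:15", "stop_seq": 4,
         "departure_time": "07:29"}]
records = [{"route": "33", "trip_start": "07:15", "stop_seq": 4, "ons": 2.3},
           {"route": "33", "trip_start": "07:15", "stop_seq": 99, "ons": 0.1}]
m, u = match_records(records, index_schedule(gtfs))
print(len(m), len(u))  # 1 1
```

Records that find no key in the schedule index land in the unmatched pile, which is where that residual ~2% of activity ends up.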
For the ~98% of boardings and alightings that I could pin to a precise time of day, I created a histogram:
As would be expected, alightings (that is, people getting OFF) trail boardings (getting ON) by a half-hour or so. People need to get on and get somewhere before they’ll get off at their destination. The difference, therefore, is people’s travel time.
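One crude way to put a number on that trailing effect is to shift the alighting histogram back in time until it overlaps the boarding histogram best. A sketch with made-up half-hour bins, not the actual histograms:

```python
def best_lag(boardings, alightings, max_lag):
    """Return the shift (in histogram bins) that best aligns the
    alighting curve with the boarding curve -- a rough estimate
    of average travel time."""
    def overlap(lag):
        return sum(b * a for b, a in zip(boardings, alightings[lag:]))
    return max(range(max_lag + 1), key=overlap)

# Made-up half-hour bins: alightings peak one bin after boardings.
ons  = [0, 1, 5, 1, 0, 0]
offs = [0, 0, 1, 5, 1, 0]
print(best_lag(ons, offs, 3))  # 1, i.e. roughly 30 minutes
```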
Anecdotally, the temporal distribution of transit users closely mirrors the distribution of actual service. This is a chicken/egg situation, and it would make good sense to inquire what ridership might look like late at night if service itself didn’t trail off into hourly or half-hourly frequencies where it continues at all past 10pm. There’s also good reason to suspect that changing service levels at one time of day could affect ridership at another. Might we, for example, see differently shaped rush-hour peaks if suburbanites had, and got used to having, the option of staying late at their downtown office? If service continued all night, might we see echoes of the main rush hours as second- and third-shifters head for work? Might there be a night-life peak if night service weren’t so abysmal?
EDIT: For those of you who don’t have huge computer monitors…
For all the transit researchers and data-junkies out there, we now have a tremendous resource for better understanding ridership patterns in SORTA’s system. I have just taken delivery of this glorious data, the weekday average of boardings, alightings, and bus-load for every scheduled weekday stop for a two month period of 2014.
This means we now have spatial and temporal attributes for average ridership on every part of scheduled weekday service: when people get on, where they get off (and vice versa), and the average load of a bus at every point in every trip. This can answer questions like:
On average, how many people board the m+ at its XU stop on the morning’s first run at x:xx (some specific time)? How does ridership at this stop develop through the day? How is the overall morning flow of riders different from the afternoon rush? To what extent is ridership skewed to the rush hours, and how many people are riding in the off-peak hours?
I have uploaded a zip file on the Data page of this site and will update it with some helpful derivatives once I’ve had a chance to play with the data myself. Please leave a comment if you find this useful, if you can share your own analysis with us, or if you just have any questions about the data I can help answer!
I was taking another look at the old ridership dataset SORTA shared with me last January, when I realized: there are a good many stops that have an average daily ridership of exactly zero1.
There are really a lot of them, and they’re pretty evenly distributed. About 1,000 of them by my count2, compared to ~2,650 with at least some daily riders (above in black). I seem to have missed this before by immediately visualizing all the stops with circles sized according to their total ridership… naturally, these stops simply failed to render.
Click the image above (or here) for a PDF that will let you look up close at the locations of ghost stops throughout the whole system. Red dots are ghost stops, black circles are stops with riders on an average day; their area is proportional to the number of riders. The average day, including weekends, has ~46,100 passenger trips, not counting TANK.
It’s important to note that the presence of these low-to-no rider stops may not be hurting anything if we’re OK with the lines serving them being there in the first place. If no one is getting on or off, the bus probably isn’t slowing down by stopping there.
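The rounding caveat from the first footnote is worth seeing concretely. A sketch of the arithmetic (the function name is mine, not anything in SORTA's database):

```python
def average_daily_riders(period_total, days):
    """Period total divided by days, rounded to the nearest integer --
    so a stop used on fewer than half of all days rounds down to zero
    and shows up as a 'ghost stop'."""
    return round(period_total / days)

print(average_daily_riders(14, 30))  # 0 -> a 'ghost stop'
print(average_daily_riders(16, 30))  # 1 -> survives the rounding
```

So a "ghost stop" here really means "used on fewer than about half of all days", not necessarily "never used at all".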
Well, not quite exactly. Total passenger counts for a month were divided by the number of days and rounded to the nearest integer. But basically, if any of these stops had even one person using them more than half the days, they would have been rounded up to > 0 ↩
…which may vary from yours. I’ve aggregated stops that share the same name and/or exact location. That means that stops paired on opposite sides of the street were usually lumped together. ↩
It’s October now, eight months since I first touched on annual ridership figures and it’s about time for a little update. I only have recent numbers from SORTA at the moment (and here they are) so that’s all I’ll be able to touch on for now. Overall, ridership is holding steady, fluctuating down a bit this year:
The total number of rides in 2013 is down 3.6% from the same period in 2012 (beginning of January through the end of September). If this continues to the end of the year, we’ll be about where we were in 2011, with the lowest annual ridership we’ve seen since the National Transit Database started tracking these things in 1991: about 16.8 million unlinked trips.1
The slight decrease doesn’t seem to be related to the service changes that took place in August; the month prior to the changes (July) showed a total decrease of 4% YTD, so if anything, things have been on the upswing since the shift.
A few random interesting numbers:
The M+ carried 34,791 unlinked trips2 between its start in mid-August and the end of September. In September the M+ carried 4.2 times as many trips as the #1, which was itself down 48% from the previous year due to route changes that significantly shortened it.
The #51, with significant route changes, saw a 38% increase to 44,000 trips from Sept 2012 to Sept 2013. The #41 which was similarly extended further across the city did not see significant change (-0.3%) for the same period.
SORTA was unable to provide data at the stop level.
This is likely also the lowest ridership has been since it was still on its way up in the early 19th century, though I can’t say that with complete certainty. ↩
An ‘unlinked trip’ is the basic unit of measure used here and pretty much everywhere else. It is one person getting on a bus once and getting off somewhere else. Most of what actual people call ‘trips’ will consist of two or more ‘unlinked trips’: the trip there, the trip back, and any transfers in between. The ‘link’ in ‘unlinked’ refers to that joining of two or more discrete data points. ↩