
GIF of the day

Just another way of looking at speed distributions. This shows the distribution of observed speeds across several thousand route segments by percentile of the speeds on each segment. The animation changes as we’re shown different parts of the speed distribution on each segment.

relative speed distributions on route segments

Not totally sure if it’s useful yet, but it is kind of pretty.

Posted in: Data

“without raising fares”

A wee nit to pick from SORTA’s recent “State of Metro” dog and pony show:
I distinctly remember one of the speakers saying something like ‘and all this without raising fares!’, and this to my feeble memory reeked of bullshit, so I found the numbers again and ran them to see if I was remembering correctly. I was indeed.

Here are the facts, as reported by SORTA to the Federal Transit Administration. Over the period where we have data on both fare revenues and ridership (currently 2002 to 2012) SORTA has been steadily getting more money from fare revenues while moving fewer passengers. We are currently at the nadir of this trend, with

  1. More fare revenue than ever
  2. Fewer passenger trips than ever

When SORTA says, by the way, that they are a ‘most efficient’ agency, a title pinned on them by the laughably unscientific UC Economics Center, it is precisely this measure they have in mind. There is hardly a better example of doublespeak to be found. Here’s the trend:

a shitty decade for transit

In order to plot both agencies together, I normalized fares and passenger trips to the same range. The scale is linear.

Now you may rightly note that the standard fare for a zone 1 trip hasn’t changed lately. But that’s not the only kind of fare that can be paid. It might not even be the most common! I don’t know for certain. I haven’t personally paid standard fare in quite a while because my transit use is partly subsidized by UC. So for example, the fare revenue variable in this data almost certainly includes UC’s cash subsidy for my fare as well as the dollar I put in myself. Multiply that by the dozens of private fare subsidies each agency probably negotiates (or drops) each year and you get a more dynamic picture. Fare revenue per trip could also be affected, though it probably isn’t, by people using transit cards more or less while paying the same monthly price.

But anyway, I’ll be damned if the total price paid by riders or their agents, on a per-trip basis, doesn’t constitute a better definition of ‘fare’ than SORTA’s standard zone-1 single-segment price. And by that definition, fares have risen from $0.76 in 2002 to $1.78 in 2012 (+134%). For TANK, the change is from $0.72 in 2002 to $1.16 in 2012 (+60%). Adjusting for inflation, the changes are 84% and 26% respectively. So much for SORTA’s unchanging-fares lie.
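For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the per-trip fare comparison. The fares are the ones quoted above; the two CPI values are my own assumption (approximate annual-average CPI-U figures), since the post doesn’t say which price index was used. Because the quoted fares are rounded to the cent, the results can land a point off the percentages in the text.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new / old - 1.0) * 100.0


def real_pct_change(old, new, cpi_old, cpi_new):
    """Percent change after deflating the new value into old-year dollars."""
    deflated_new = new * cpi_old / cpi_new
    return pct_change(old, deflated_new)


# Average fare per trip (fare revenue / passenger trips), as quoted in the post.
fares = {"SORTA": (0.76, 1.78), "TANK": (0.72, 1.16)}

# ASSUMPTION: approximate annual-average CPI-U for 2002 and 2012.
# The post does not name the index it used.
CPI_2002, CPI_2012 = 179.9, 229.6

for agency, (fare_2002, fare_2012) in fares.items():
    nominal = pct_change(fare_2002, fare_2012)
    real = real_pct_change(fare_2002, fare_2012, CPI_2002, CPI_2012)
    print(f"{agency}: nominal {nominal:+.0f}%, real {real:+.0f}%")
```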

—-

I’ll end with an ineffectual plea to the people at SORTA. Please, understand that when you speak in lies and euphemisms, no matter how nice your breakfast spread, you turn off clever people and retain only the idiots and the cynical. People from all three of these categories vote, to be sure, but I know who I’d rather spend my time with. And I know who could build the better transit system.

Comments: 2
Posted in: Analysis | Data | Politics

Another Stab at the 2014 Ridership Dataset

I’m taking a self-guided course in R this semester — that is, teaching myself, but with deadlines — and since I’ve been playing with transit data for the most part, it seems appropriate to tickle y’all with some of the mildly interesting data visualizations that I’ve so far produced.

I’ll be using the 2014 SORTA spatio-temporal ridership dataset, which I’ve already sliced a couple different ways on this blog. The first was here with a set of animated maps and the second here showing basic peaking in passenger activity through time.

This time, I’m going to take that latter analysis a little further by breaking out passenger activity into lines. Go ahead and take a look at the graphic, which I’ll explain in more detail below.

SORTA 2014 ridership by time and line

Ok. So first, it’s important to understand what we’re measuring here. Our dataset tells us the average number of people getting on a bus (boarding) and the average number getting off (alighting) for each scheduled stop. There are[1] about 162,000 scheduled stops on a weekday. Of those, I was able to identify a precise, scheduled time for all but ~2,000[2]. Of the remaining ~160,000, the dataset tells me that 77,763 have at least 0.1 people boarding or alighting on an average weekday. I used those stops to calculate a weighted density plot over the span of the service day for each route. Added together, of course, the individual routes sum to the total ridership for the system[3]. I then sorted the routes by their total ridership and plotted them.
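The charts here were made in R, but the weighted-density idea is easy to sketch in plain Python. This is not the code behind the graphic, just a rough illustration under my own assumptions: a Gaussian kernel, a half-hour bandwidth, and a hypothetical function name and input layout (scheduled times in hours, weights as average boardings plus alightings). The curve is scaled so the area under it equals a route’s total ridership, which is what lets the per-route curves stack up to the system total.

```python
import math


def weighted_density(times, weights, grid, bandwidth=0.5):
    """Gaussian kernel estimate of passenger activity over the day.

    times     -- scheduled stop times, in hours since midnight
    weights   -- average passengers at each of those scheduled stops
    grid      -- times (hours) at which to evaluate the curve
    bandwidth -- kernel standard deviation in hours (ASSUMPTION: 0.5)

    The area under the returned curve is approximately sum(weights).
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    curve = []
    for g in grid:
        total = 0.0
        for t, w in zip(times, weights):
            z = (g - t) / bandwidth
            total += w * norm * math.exp(-0.5 * z * z)
        curve.append(total)
    return curve


# Placeholder input, not real SORTA data: two scheduled stops on one route.
step = 0.1
grid = [i * step for i in range(300)]  # 0:00 to 30:00, in hours
curve = weighted_density([8.0, 17.0], [10.0, 5.0], grid)
area = sum(curve) * step  # close to 15, the total weight
```

Running this once per route on a shared grid gives one curve per route, ready to be sorted by total ridership and stacked.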

The first thing that becomes clear, to me at least, is that a minority of SORTA’s lines account for a large majority of actual riders. These lines, by the way, are precisely the ones featured in the Cincinnati Transit Frequency Map, and I’ve used their color from that map to distinguish them in the chart above. The remaining routes, as I knew even before I had this data, are relatively unimportant.

transit map of Uptown Cincinnati

May 2013 routing

The one grey line mixed in among the colored lines is the m+ (a latecomer to the frequency map), which does actually run all day on weekdays.

Now another interesting question, to me at least, is what this would look like without the pea under the mattress; how large are the rush-hour peaks if we exclude the peak-only lines from the chart? Let’s try it. I’ll also reverse the order, so we can see some of the larger lines with less distortion.

SORTA 2014 ridership by time and line no-express

Well, the rush-hours are still pretty distinct. More distinct than I would have expected. It’s an open question whether this is the result of more service in the rush-hours, or more crowding at the same level of service.

One last way (for now) to slice the data will be to take the total ridership at any given moment, and relativize each line’s total, showing each line’s percent share of the total. To keep it easy to read, I’ll leave the peak-only lines out of this one too.

SORTA all-day lines ridership flow diagram

I found it slightly surprising how straight these lines are. Only toward the end of the day do we see a major wobble in any direction, and that’s essentially the result of a few lines shutting down earlier than the others.
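The relativizing step is simple enough to sketch in a few lines of Python. The function name and the dictionary layout are hypothetical, and the numbers are placeholders rather than SORTA data; the only real assumption is that every line’s curve is evaluated on the same time grid.

```python
def line_shares(curves):
    """Convert per-line ridership curves into each line's share of the total.

    curves -- dict mapping line name to a list of ridership values,
              all evaluated on the same time grid
    Returns a dict of the same shape holding shares between 0 and 1.
    Moments with no ridership at all get a share of 0 for every line.
    """
    n = len(next(iter(curves.values())))
    totals = [sum(curve[i] for curve in curves.values()) for i in range(n)]
    return {
        line: [curve[i] / totals[i] if totals[i] > 0 else 0.0 for i in range(n)]
        for line, curve in curves.items()
    }


# Placeholder values, not real ridership.
shares = line_shares({"A": [10.0, 30.0, 0.0], "B": [30.0, 10.0, 0.0]})
```

Wherever there is any ridership at all, the shares across lines sum to 1, so a line holding a steady share shows up as a flat band no matter how tall the rush-hour peaks are.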

Footnotes:

  1. or were when this data was collected
  2. These ~2,000 stops seem to account for about 1,000 passengers
  3. Minus the missing values for the records which couldn’t be matched.
Posted in: Data

Tree Map!

Just dicking around with R a bit this afternoon…here is a tree map showing the number of weekday transit riders in each city neighborhood (area) compared to the size of the neighborhood (color).

Interpretation:
1. Downtown has many transit riders (because its chart area is proportional to that number)
2. Downtown has many riders for its geographic size (because it’s dark green in the chart)

SORTA 2013 ridership tree map

Data is from SORTA, 2013 and is available on the data page. Any stop outside of the City limits was classified as ‘other’.

Comments: 2
Posted in: Data

Delay Variance Through Time

Just some preliminary results from my first attempt to archive real-time bus data, these from the Champaign-Urbana ‘Mass Transit’ District:

Animated Delay Variance

This is a look at variance in the distribution of delayed buses throughout the day. It’s only looking at off-schedule buses right now so we aren’t seeing any change in the proportion of precisely on-time buses (if there is any such change). The little clock in the top right is the time of day the event was recorded, and right below that is the number of events used to ascertain the momentary distribution. For now, this sample size is as much a reflection of when I was running my computer as it is of schedule frequency.

I assumed that the first and last percentiles of the overall distribution were outliers and clipped them off.
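In Python, that clipping could look something like the sketch below. The function name is hypothetical and the delays are placeholder values, not the archived Champaign-Urbana data. It uses `statistics.quantiles` with its default interpolation, which may put the cut points in slightly different places than another tool would.

```python
import statistics


def clip_outliers(delays):
    """Drop values below the 1st or above the 99th percentile.

    delays -- delay observations (e.g. minutes off schedule)
    Returns (kept, low_cut, high_cut).
    """
    cuts = statistics.quantiles(delays, n=100)  # 99 cut points
    low_cut, high_cut = cuts[0], cuts[-1]
    kept = [d for d in delays if low_cut <= d <= high_cut]
    return kept, low_cut, high_cut


# Placeholder delays, in minutes.
delays = [float(x) for x in range(1, 1001)]
kept, low_cut, high_cut = clip_outliers(delays)
```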

I don’t actually see much going on here except random fluctuation, but I suspect this will get much more interesting with larger samples and with many more, and more diverse, agencies included. I’m actually quite eager to see what that turns up! Right now, I’m working on developing a tool to query APIs from the Toronto Transit Commission and Philly’s SEPTA. Unfortunately, because neither Cincinnati transit agency is willing to share their data, which they’ve been collecting and using for years, it will be impossible to include them in this sort of analysis.

Posted in: Data

A very preliminary bus tardiness distribution

Briefly following up on the previous post:

I’ve started working with Champaign-Urbana’s real-time departure API. Right now, I’m using a little Python script to send requests and store the responses in a local PostgreSQL database. Below is a probability density plot from the first 1,000 or so data points I’ve pulled down. It’s only from the weekday mornings when I’ve run the script, mostly from the 150 stops I queried just a moment ago.

A very preliminary tardiness density plot

But it looks like my prediction (see the earlier post) may not have been terribly far from the mark.

The next step is to determine a programmatic way to randomly query particular arrivals to make sure I avoid any systematic error in the sampling. This is necessary because I’m limited to 1,000 API calls per day and can’t just hammer their server with requests for every scheduled arrival.
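One simple way to do that, sketched in Python: draw a uniform random sample, without replacement, from the day’s scheduled arrivals, capped at the call budget. The function name and the `(stop_id, trip_id)` layout are hypothetical, and this assumes one API call per sampled arrival, which may not match how the real API batches its results.

```python
import random

DAILY_CALL_LIMIT = 1000  # the per-day cap mentioned above


def pick_arrivals(scheduled_arrivals, budget=DAILY_CALL_LIMIT, seed=None):
    """Choose which scheduled arrivals to query today.

    scheduled_arrivals -- list of (stop_id, trip_id) pairs (hypothetical layout)
    budget             -- maximum number of API calls to spend
    seed               -- optional seed, for a reproducible draw
    Every arrival has the same chance of being picked.
    """
    rng = random.Random(seed)
    k = min(budget, len(scheduled_arrivals))
    return rng.sample(scheduled_arrivals, k)


# Placeholder schedule: 150 stops with 40 arrivals each.
arrivals = [(stop, trip) for stop in range(150) for trip in range(40)]
todays_queries = pick_arrivals(arrivals, seed=1)
```

A uniform draw over arrivals naturally queries busy stops more often than quiet ones. If the goal were equal coverage per stop instead, the sample would need to be stratified by stop.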

I’m also recording location attributes for all these records so I’ll be able to do some spatial analysis too :-)

Comments: 2
Posted in: Data