I’m taking a self-guided course in R this semester — that is, teaching myself, but with deadlines — and since I’ve been playing with transit data for the most part, it seems appropriate to tickle y’all with some of the mildly interesting data visualizations that I’ve so far produced.
I’ll be using the 2014 SORTA spatio-temporal ridership dataset, which I’ve already sliced a couple different ways on this blog. The first was here with a set of animated maps andthe second here showing basic peaking in passenger activity through time.
This time, I’m going to take that later analysis a little further by breaking out passenger activity into lines. Go ahead and take a look at the graphic, which I’ll explain in more detail below.
Ok. So first, it’s important to understand what we’re measuring here. Our dataset tells us the average number of people getting on a bus (boarding) and the average number getting off (alighting) for each scheduled stop. There are1 about 162,000 scheduled stops on a weekday. Of those, I was able to identify a precise, scheduled time for all but ~ 2,0002. Of the remaining ~160,000 the dataset tells me that 77,763 have at least 0.1 people boarding or alighting on an average weekday. I used those stops to calculate a weighted density plot over the span of the service day for each route. Added together of course, the individual routes sum to the total ridership for the system3. I then sorted the routes by their total ridership and plotted them.
The first thing that becomes clear, to me at least, is that a minority of SORTA’s lines account for a large majority of actual riders. These lines by the way are precisely the ones featured in the Cincinnati Transit Frequency Map, and I’ve used their color from that map to distinguish them in the chart above. The remaining routes, as I knew even before I had this data, are relatively unimportant.
The one grey line mixed in among the colored lines is the m+ (a latecomer to the frequency map), which does actually run all day on weekdays.
Now another interesting question, to me at least, is what this would look like without the pea under the mattress; how large are the rush-hour peaks if we exclude the peak-only lines from the chart? Let’s try it. I’ll also reverse the order, so we can see some of the larger lines with less distortion.Well, the rush-hours are still pretty distinct. More distinct than I would have expected. It’s an open question whether this is the result of more service in the rush-hours, or more crowding at the same level of service.
One last way (for now) to slice the data will be to take the total ridership at any given moment, and relativize each line’s total, showing each line’s percent share of the total. To keep it easy to read, I’ll leave the peak-only lines out of this one too.I found it slightly surprising how straight these lines are. Only toward the end of the day do we see a major wobble in any direction, and that’s essentially the result of a few lines shutting down earlier than the others.