I’m just beginning to play with the new, highly detailed ridership data I got from SORTA, and boy is it a treat. I’ll start here with a high-level overview of the temporal dimension of the data, before looking at spatial aspects and breaking it down by line, stop, service type, etc. as the summer progresses. I think I may also use this data as the basis of my study in R this coming semester (Hi Michael!), so perhaps we can count on seeing some more detailed, particularly nerdy, multivariate analysis through the cooler months as well. I am, by the way, acutely aware that I’ve started a number of little projects on this blog and have so far failed to carry them through to completion. I keep getting distracted by the realization that I have no idea what I’m talking about, the inevitably elusive prospect of making money some way or another, and the all too comforting thought that no one is reading this or taking it seriously anyway. But hopefully this is a small enough commitment, and certainly it’s interesting enough, for me to actually provide a reasonably complete picture of this particular dataset before altogether too long. Perhaps I can even apply the same techniques to the TANK ridership dataset that I’ve been meaning to get to and publish for more than a year.
Anyway, let’s actually get to that temporal overview. Since we know the trip each record belongs to, identified by the line number and the trip’s start time, I was able to identify the actual scheduled time for the great majority of stops in the dataset by matching the records to GTFS schedule data for the same period. About 170,000 of the 230,000 records (roughly three quarters) matched to a precise time. The unmatched remainder is a sizable share of the records but accounts for a very small portion of total activity, about 2%, and I think it’s most likely that many of these records are an artifact of the way SORTA’s database is structured and not actual stops belonging to a trip. I’ll dig into that more some other time though.
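For anyone who wants to try this kind of matching themselves, here is a minimal Python sketch of the idea. It is not the exact process used for the numbers above. The GTFS field names (`trip_id`, `route_id`, `stop_id`, `stop_sequence`, `departure_time`) come from the GTFS spec, but the ridership field names (`line`, `trip_start`, `stop_id`, `ons`, `offs`) are made up for illustration, and a real match would probably also need to account for direction and service day.

```python
from collections import defaultdict


def build_schedule_index(trips, stop_times):
    """Map (route_id, trip start time, stop_id) -> scheduled departure time.

    `trips` and `stop_times` are lists of dicts, e.g. rows read from GTFS
    trips.txt and stop_times.txt with csv.DictReader.
    """
    route_of = {t["trip_id"]: t["route_id"] for t in trips}

    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)

    index = {}
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        # Assumption: a trip's "start time" is the departure from its first stop.
        start = rows[0]["departure_time"]
        for r in rows:
            key = (route_of[trip_id], start, r["stop_id"])
            # If a key repeats (say, a loop visits a stop twice), keep the first time seen.
            index.setdefault(key, r["departure_time"])
    return index


def match_records(records, index):
    """Split ridership records into those with and without a scheduled time."""
    matched, unmatched = [], []
    for rec in records:
        # Hypothetical ridership fields: line, trip_start, stop_id.
        key = (rec["line"], rec["trip_start"], rec["stop_id"])
        time = index.get(key)
        if time is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "scheduled_time": time})
    return matched, unmatched


def activity_share(matched, unmatched):
    """Fraction of all boardings plus alightings that sit on matched records."""
    def total(recs):
        return sum(r["ons"] + r["offs"] for r in recs)

    m, u = total(matched), total(unmatched)
    return m / (m + u) if (m + u) else 0.0
```

The last function is the reason to count activity and not just records: a large pile of unmatched records can still carry only a sliver of the actual boardings and alightings.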
For the ~98% of boardings and alightings that I could pin to a precise time of day, I created a histogram:
As would be expected, alightings (that is, people getting OFF) trail boardings (getting ON) by a half-hour or so. People need to get on and get somewhere before they’ll get off at their destination. The difference, therefore, is people’s travel time.
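Here is a rough Python sketch of how one might bin this kind of data into a histogram and put a number on that lag. It assumes each record is a dict with a GTFS-style `scheduled_time` string ("HH:MM:SS", which can run past 24:00:00 for late-night trips) plus `ons` and `offs` counts; those field names are hypothetical, not SORTA’s. The lag estimate leans on a simple identity: if every rider who boards also alights within the data, the mean alighting time minus the mean boarding time equals the mean time spent on board.

```python
from collections import Counter


def to_minutes(hhmmss):
    """Convert a GTFS-style 'HH:MM:SS' string to minutes after midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 60 + m + s / 60


def histogram(records, field, bin_minutes=30):
    """Sum `field` ('ons' or 'offs') into time-of-day bins keyed by bin start (minutes)."""
    counts = Counter()
    for rec in records:
        bin_start = int(to_minutes(rec["scheduled_time"]) // bin_minutes) * bin_minutes
        counts[bin_start] += rec[field]
    return dict(sorted(counts.items()))


def mean_time(records, field):
    """Mean time of day in minutes, weighting each record by its count in `field`."""
    total = sum(rec[field] for rec in records)
    if total == 0:
        return None
    weighted = sum(to_minutes(rec["scheduled_time"]) * rec[field] for rec in records)
    return weighted / total


def mean_lag(records):
    """Mean alighting time minus mean boarding time, in minutes.

    This only approximates mean travel time if boardings and alightings
    balance, i.e. the same riders are counted getting on and getting off.
    """
    on, off = mean_time(records, "ons"), mean_time(records, "offs")
    if on is None or off is None:
        return None
    return off - on
```

Calling `histogram(records, "ons")` and `histogram(records, "offs")` gives the two series to plot against each other, and `mean_lag(records)` gives a single number to compare against the half-hour or so that shows up by eye.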
Anecdotally, the temporal distribution of transit users closely mirrors the distribution of actual service. This is a chicken/egg situation, and it would make good sense to ask what ridership might look like late at night if service itself didn’t trail off into hourly or half-hourly frequencies, where it continues at all past 10 pm. There’s also good reason to suspect that changing service levels at one time of day could affect ridership at another. Might we, for example, see differently shaped rush-hour peaks if suburbanites had, and got used to having, the option of staying late at their downtown office? If service continued all night, might we see echoes of the main rush hours as second and third-shifters head for work? Might there be a nightlife peak if night service weren’t so abysmal?
EDIT: For those of you who don’t have huge computer monitors…