I’m being urged to get my act together regarding my master’s thesis. I have a set of datasets I know I want to explore, but I need to find a question that I can answer quite thoroughly with them. I also need to decide what type of person would be good to oversee this project: the ‘committee’ and whatnot. As I so often do, I’ll use you anonymous readers as the spur to set my thoughts to bytes and thereby make rigorous my abstractions.
SO: My dataset is real-time transit data feeds. I don’t care what buses are doing right now unless I’m waiting for them; I care what patterns they’re scratching into our lives. I’ve already demonstrated a Python script that will make random requests from a real-time API and store the results. There exist comparable APIs from other agencies that this script can easily be adapted to, so I could squirrel data from as many agencies as have APIs. That’s the dataset, or set thereof.
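For the curious, here is a minimal sketch of what such a collector might look like. It is not my actual script: the endpoint URL, the stop IDs, and the table layout are all placeholders, and I’m assuming that "random requests" means picking a random stop each time and waiting a random interval between requests.

```python
import random
import sqlite3
import time
import urllib.request

# Hypothetical endpoint and stop IDs: placeholders, not a real agency's API.
API_URL = "https://example.org/realtime/predictions?stop={stop_id}"
STOP_IDS = ["1001", "1002", "1003"]


def fetch_stop(stop_id, timeout=10):
    """Return the raw response body for one stop as text."""
    url = API_URL.format(stop_id=stop_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def open_store(path=":memory:"):
    """Open (or create) the SQLite store for raw responses."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "requested_at REAL, stop_id TEXT, body TEXT)"
    )
    return conn


def poll_once(conn, fetch=fetch_stop, stop_ids=STOP_IDS, rng=random):
    """Request one randomly chosen stop and store the raw response."""
    stop_id = rng.choice(stop_ids)
    body = fetch(stop_id)
    conn.execute(
        "INSERT INTO responses VALUES (?, ?, ?)",
        (time.time(), stop_id, body),
    )
    conn.commit()
    return stop_id


def poll_forever(conn, min_wait=5.0, max_wait=60.0):
    """Poll at random intervals; a failed request is logged and skipped."""
    while True:
        try:
            poll_once(conn)
        except OSError as err:  # urllib's URLError is a subclass of OSError
            print("request failed:", err)
        time.sleep(random.uniform(min_wait, max_wait))
```

Storing the raw response body, rather than parsing it on the way in, keeps the collector indifferent to each agency's format; adapting it to another agency would mostly mean swapping `API_URL` and `STOP_IDS`.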
My question has been more difficult to discover. I have so many! Here are a few:
I suppose the first question is probably my best shot. Though #5 is certainly intriguing. Now on to the lit review I suppose? *deep breath*
And then the committee! Besides my adviser, who is a regular transit user and quantitative geographer, I want another statistician/data-person, and this shouldn’t be too hard to find. I also want someone really good at graphic communication. For the latter, I want someone from DAAP. But I want to be sure that they don’t think or feel or act as though I’ve invited them to proof my presentation while others address its content; content is inseparable from presentation. Form does not follow function; rather both form and function must mirror each other. If I fail to make that happen, I will have miscommunicated or misunderstood my project.
Oh dear readers, what would you want to know if you knew, as I may, where all the buses are all the time?
You clearly draw a lot of personal motivation from exploring inefficiencies in service deployment, so no doubt your question is going to explore a dataset rich in performance measures that can be related to passenger experience.
“Delay” is going to be a tricky topic to work with for two reasons:
1) Delay is relative, and
2) Delay is a symptom of a different problem: reliability of the service in terms of on-time performance and travel-time competitiveness.
There is definitely a fine line to be walked in scheduling a service that is “usually on time” and “never early”. To be safe, most schedulers pad the schedules with a few extra minutes for those days when traffic sucks, but that means on good days operators are trying to catch red lights or, worse, pulling offline to let a few minutes pass. It’s so much easier for an agency to run a bit slower than to manage deploying reserve buses when a trip falls too far behind. But we know that in aggregate a few extra minutes across several hundred trips per day is a lot of time; time that we might be able to reinvest in other services.
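To put a rough number on "a lot of time", here is a back-of-the-envelope sketch. The 3 minutes of padding and 400 trips per day are illustrative assumptions, not figures from any agency.

```python
def padded_hours_per_day(pad_minutes_per_trip, trips_per_day):
    """Total schedule padding per day, in vehicle-hours."""
    return pad_minutes_per_trip * trips_per_day / 60


# Illustrative inputs only: 3 minutes of padding on each of 400 daily trips.
hours = padded_hours_per_day(3, 400)
print(f"{hours:.1f} vehicle-hours per day")  # prints "20.0 vehicle-hours per day"
```

Under those assumed inputs, the padding adds up to about 20 vehicle-hours every service day.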
I’d love to see a question like one of these:
“How much faster could trips be scheduled without significantly impacting service reliability?” or
“How much tolerance does the typical rider have for reduced on-time performance if it resulted in perceptibly faster travel times?”
As always, if there’s anything I can do to assist, shoot me an email.
I will definitely have some questions for you! I’ve been dwelling on your comment for a few days now and I think you may have just put me on exactly the right course…
Here’s a new short version of the current question, which I’m currently writing and re-writing:
Where and when do scheduled and unscheduled delay occur? Particularly, since I don’t trust planners, how can this be visualized and explained in a way that lets laypeople understand and advocate for delay-reduction strategies in spots where they could have the greatest effect?
Basically, your first question, but assuming that the way out is infrastructure changes that mitigate or circumvent the source of delay: bus-only lanes, flyovers, signal prioritization, stop dispersion, etc.
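One way the split between scheduled and unscheduled delay might be computed for a single route segment is sketched below. The definitions are my working assumptions, not settled ones: the fastest observed run stands in for an unimpeded trip, scheduled delay is the padding above that baseline, and unscheduled delay is the average lateness beyond the scheduled run time. The function name and the sample numbers are made up for illustration.

```python
from statistics import mean


def split_delay(scheduled_min, observed_min):
    """Split delay on one segment into scheduled and unscheduled parts.

    scheduled_min: scheduled run time for the segment, in minutes.
    observed_min: observed run times for the same segment, in minutes.
    """
    # Assumption: the fastest observed run approximates an unimpeded trip.
    baseline = min(observed_min)
    # Scheduled delay: time the timetable allows beyond the baseline.
    scheduled_delay = scheduled_min - baseline
    # Unscheduled delay: average lateness beyond the timetable;
    # runs that beat the schedule count as zero rather than negative.
    unscheduled_delay = mean(max(0.0, t - scheduled_min) for t in observed_min)
    return baseline, scheduled_delay, unscheduled_delay


# Made-up run times for one segment scheduled at 13 minutes.
print(split_delay(13, [10, 12, 15, 11]))  # prints (10, 3, 0.5)
```

Using the single fastest run as the baseline is fragile, since one unusually quick trip sets it; a low percentile of observed run times would probably be more robust with real data.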