I’m being urged to get my act together regarding my master’s thesis. I have a set of datasets I know I want to explore, but I need to find a question of sorts that I can quite thoroughly answer with them. I also need to decide what type of person would be good to oversee this project: the ‘committee’ and whatnot. As I so often do, I’ll use you anonymous readers as the spur to set my thoughts to bytes and thereby make rigorous my abstractions.
SO: My dataset is real-time transit data feeds. I don’t care what buses are doing right now unless I’m waiting for them; I care what patterns they’re scratching into our lives. I’ve already demonstrated a Python script that will make random requests from a real-time API and store the results. Other agencies offer comparable APIs, and the script can easily be adapted to them, so I could squirrel data from as many agencies as have APIs. That’s the dataset, or set thereof.
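For the curious, here is a minimal sketch of what a collector like that could look like. It is not the actual script. It assumes "random requests" means polling at random intervals, and the feed URL, the table layout, and every function name are hypothetical stand-ins rather than any agency's real API.

```python
import random
import sqlite3
import time
from urllib.request import urlopen

# Hypothetical endpoint: a placeholder, not any real agency's URL.
FEED_URL = "https://example.org/transit/vehicles.json"


def fetch_feed(url=FEED_URL):
    """Download one snapshot of the feed and return it as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one table of raw snapshots."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots (fetched_at REAL, payload TEXT)"
    )
    return conn


def poll_once(conn, fetch=fetch_feed, now=time.time):
    """Fetch one snapshot and store it with the time it was requested."""
    fetched_at = now()
    payload = fetch()
    conn.execute("INSERT INTO snapshots VALUES (?, ?)", (fetched_at, payload))
    conn.commit()
    return fetched_at


def poll_randomly(conn, n_requests, mean_wait_s=60.0,
                  fetch=fetch_feed, sleep=time.sleep):
    """Make n_requests polls separated by exponentially distributed waits."""
    for _ in range(n_requests):
        poll_once(conn, fetch=fetch)
        # Random gaps (assumed here) keep the samples from lining up with
        # the timetable's own rhythm.
        sleep(random.expovariate(1.0 / mean_wait_s))
```

Storing the raw payload untouched, with its request time, leaves the parsing for later. That seems the safer choice while the research question is still undecided. Adapting the sketch to another agency would mostly mean swapping `FEED_URL` and whatever parsing follows.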
My question has been more difficult to discover. I have so many! Here are a few:
- What is the distribution of delay? How does it vary? Spatially, temporally?
- What kinds of lines/agencies/times have non-random, systematic delay?
- How does the delay spread of ‘good’ transit systems compare to that of ‘bad’ transit systems and what might explain this?
- Good scheduling should minimize systematic delay: what sort of delay remains after that and what might riders learn from it? How should they learn to best accommodate this delay?
- What is the space-time trajectory of a vehicle in various states of delay?
- How different is the delay of lines that don’t mix with traffic?
- What relation does frequency have to delay? At what service frequency can we say quantitatively that schedules should be abandoned and headways maintained instead?
- What is the accuracy of arrival time predictions? What margin of error exists around predictions at various space-time distances?
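
To make the first couple of questions a little more concrete, here is one possible way to summarize delay once observations have been matched to a schedule. It is only a sketch under assumptions: the tuple layout, the function name `summarize_delay`, and the sample numbers are all made up for illustration, and treating the group mean as the "systematic" part and the standard deviation as the "spread" is just one simple reading of those terms.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_delay(observations):
    """Group delays by (line, hour) and return count, mean, and spread.

    Each observation is a tuple (line, hour, scheduled_s, observed_s), with
    times in seconds since midnight. Delay is observed minus scheduled, so
    positive values are late and negative values are early.
    """
    groups = defaultdict(list)
    for line, hour, scheduled_s, observed_s in observations:
        groups[(line, hour)].append(observed_s - scheduled_s)
    return {
        key: {"n": len(d), "mean_s": mean(d), "spread_s": pstdev(d)}
        for key, d in groups.items()
    }


# Made-up sample data, purely to show the shape of the input.
sample = [
    ("17", 8, 28800, 28920),   # 2 minutes late
    ("17", 8, 29700, 29880),   # 3 minutes late
    ("17", 14, 50400, 50400),  # on time
]
for key, stats in summarize_delay(sample).items():
    print(key, stats)
```

A mean well away from zero for some line and hour would hint at delay the schedule could absorb. What is left in the spread is closer to the delay riders simply have to live with.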
I suppose the first question is probably my best shot. Though #5 is certainly intriguing. Now on to the lit review I suppose? *deep breath*
And then the committee! Besides my adviser, who is a regular transit user and quantitative geographer, I want another statistician/data-person, and this shouldn’t be too hard to find. I also want someone really good at graphic communication. For the latter, I want someone from DAAP. But I want to be sure that they don’t think or feel or act as though I’ve invited them to proof my presentation while others address its content; content is inseparable from presentation. Form does not follow function; rather, form and function must mirror each other. If I fail to make that happen, I will have miscommunicated or misunderstood my project.
Oh dear readers, what would you want to know if you knew, as I may, where all the buses are all the time?