## Recursive grids for density measures

I’ve been wanting to do this for ages… and I finally did!

The idea is this:

1. Create a square.
2. If the square has more than n things inside it, divide it into four parts.
3. Do the same for each of those squares and so on recursively.
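The recipe above is essentially building a quadtree. A minimal sketch in Python (the point data, coordinates, and depth cap here are placeholders for illustration, not the actual script):

```python
def subdivide(points, x0, y0, size, threshold, out, depth=0, max_depth=14):
    """Recursively split a square until each cell holds <= threshold points
    (or a depth cap is hit, so coincident points can't recurse forever)."""
    inside = [(x, y) for (x, y) in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    if len(inside) <= threshold or depth == max_depth:
        out.append((x0, y0, size))  # record the leaf cell
        return
    half = size / 2
    for dx in (0, half):        # four quadrants
        for dy in (0, half):
            subdivide(inside, x0 + dx, y0 + dy, half, threshold, out,
                      depth + 1, max_depth)

# toy run: 100 points along a diagonal, threshold of 10 per cell
pts = [(i / 100, i / 100) for i in range(100)]
cells = []
subdivide(pts, 0.0, 0.0, 1.0, 10, cells)
```

Drawing each leaf cell as a square then gives the map: small squares where things are dense, big empty squares where they aren't.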

My recent pet project¹: cul-de-sacs (threshold = 10 or fewer per square):

Dayton got a bit cut off, unfortunately. Trust me though: it has plenty of sprawl! This image sort of gives you an idea of how the process works.

Population density, based on 2010 census blocks (threshold = 100 people or fewer per square):

14 levels deep! This one took a couple hours. Click through for a larger image.

Many more to come, I’m sure, now that I’ve written the script…


1. I love the idea of a density measure of sprawl!

## Another Stab at the 2014 Ridership Dataset

I’m taking a self-guided course in R this semester — that is, teaching myself, but with deadlines — and since I’ve been playing with transit data for the most part, it seems appropriate to tickle y’all with some of the mildly interesting data visualizations that I’ve so far produced.

I’ll be using the 2014 SORTA spatio-temporal ridership dataset, which I’ve already sliced a couple different ways on this blog: first here with a set of animated maps, and second here showing basic peaking in passenger activity through time.

This time, I’m going to take that latter analysis a little further by breaking out passenger activity into lines. Go ahead and take a look at the graphic, which I’ll explain in more detail below.

Ok. So first, it’s important to understand what we’re measuring here. Our dataset tells us the average number of people getting on a bus (boarding) and the average number getting off (alighting) for each scheduled stop. There are¹ about 162,000 scheduled stops on a weekday. Of those, I was able to identify a precise, scheduled time for all but ~2,000². Of the remaining ~160,000, the dataset tells me that 77,763 have at least 0.1 people boarding or alighting on an average weekday. I used those stops to calculate a weighted density plot over the span of the service day for each route. Added together, of course, the individual routes sum to the total ridership for the system³. I then sorted the routes by their total ridership and plotted them.
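As a rough sketch of what “weighted density over the service day” means here, each stop contributes a kernel centered on its scheduled time, scaled by its average passenger activity. (The actual analysis was presumably done in R; this Python version uses a made-up bandwidth and made-up numbers for one hypothetical route.)

```python
import math

def weighted_density(times, weights, grid, bandwidth=0.5):
    """Gaussian-kernel density over the service day, weighted by passenger
    activity, so the area under the curve equals the route's total ridership."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    return [sum(w * math.exp(-0.5 * ((t - s) / bandwidth) ** 2) / norm
                for s, w in zip(times, weights))
            for t in grid]

# hypothetical route with a morning and an evening peak
times = [7.5, 8.0, 8.2, 17.0, 17.5]      # scheduled stop times (hours)
weights = [12.0, 20.0, 8.0, 15.0, 10.0]  # avg boardings + alightings
grid = [h / 4 for h in range(96)]        # every 15 minutes, 0:00-24:00
curve = weighted_density(times, weights, grid)
```

Because each route’s curve integrates to its own total activity, stacking the curves reproduces the systemwide total.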

The first thing that becomes clear, to me at least, is that a minority of SORTA’s lines account for a large majority of actual riders. These lines, by the way, are precisely the ones featured in the Cincinnati Transit Frequency Map, and I’ve used their colors from that map to distinguish them in the chart above. The remaining routes, as I knew even before I had this data, are relatively unimportant.

May 2013 routing

The one grey line mixed in among the colored lines is the m+ (a latecomer to the frequency map), which does actually run all day on weekdays.

Now another interesting question, to me at least, is what this would look like without the pea under the mattress: how large are the rush-hour peaks if we exclude the peak-only lines from the chart? Let’s try it. I’ll also reverse the order, so we can see some of the larger lines with less distortion.

Well, the rush hours are still pretty distinct. More distinct than I would have expected. It’s an open question whether this is the result of more service in the rush hours, or more crowding at the same level of service.

One last way (for now) to slice the data will be to take the total ridership at any given moment and relativize each line’s total, showing each line’s percent share of the total. To keep it easy to read, I’ll leave the peak-only lines out of this one too.

I found it slightly surprising how straight these lines are. Only toward the end of the day do we see a major wobble in any direction, and that’s essentially the result of a few lines shutting down earlier than the others.
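The relativizing step is simple enough to sketch: at each time step, divide each line’s value by the sum across all lines. (Route names and numbers below are invented for illustration.)

```python
def percent_share(route_curves):
    """Convert per-route ridership curves into each route's percent share
    of the systemwide total at every time step."""
    totals = [sum(vals) for vals in zip(*route_curves.values())]
    return {route: [100 * v / t if t > 0 else 0.0
                    for v, t in zip(vals, totals)]
            for route, vals in route_curves.items()}

# hypothetical three-route system sampled at four times of day
curves = {"4": [50, 80, 60, 20], "17": [30, 40, 30, 20], "33": [20, 30, 10, 10]}
shares = percent_share(curves)
```

By construction the shares sum to 100% at every moment, which is why a line dropping out near the end of the day shows up as a wobble in everyone else’s share.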


1. or were when this data was collected
2. These ~2,000 stops seem to account for about 1,000 passengers
3. Minus the missing values for the records which couldn’t be matched.

## Connectivity

I was trying to de-clutter a streetmap I’m making and I found some interesting patterns along the way :-)

Cool colors are areas with more dead-ending street segments, and warm colors indicate more connected streets than otherwise. That big top blob is Dayton; the lower one is Cincinnati, sitting on the Ohio River. Disconnected streets simply cancel out connecting streets, so you can sort of consider this corrected for density.

Here is the relative intensity of dead-ending streets by themselves:

Since most streets connect to others at both ends¹, the inverse of the above map doesn’t show much that the first one didn’t. It’s interesting to note the distinctly different patterns here. Clusters of connecting streets, at their most intense in gridded arrangements, form relatively distinct places. You can easily make out Hamilton, Middletown, Richmond, or Oxford in the first map if you know where to look. The disconnected streets, though, seem to really blur recognizable places, totally changing the shape of Cincinnati and smearing it into Dayton, a visible connection not so apparent in the first map.

What’s going on in Kentucky? The rural area south of Cincinnati is a lot hillier than that to the north, and there are a lot of long streets that branch out along the tops of hills and then end where the hills themselves do. In flatter places, such streets would pretty naturally just continue straight on until they met the next road.

Since you’re probably wondering, if you made it this far, just what counts as a connecting street: it’s a segment that connects to another at both ends. In fact, here they are below. You’re gonna want to click the image for the full resolution. Red is connecting, blue unconnected.

Technical stuff:

Step-by-step:

1. Create a routable topology from OSM data using osm2po.
2. Identify the dead-ending edges:
   • Identify nodes (‘source’ & ‘target’ fields) that are connected to only one edge
   • Identify the edges that are connected to those nodes
   • Isolate those edges from the rest of the network and recurse until everything you have left is connected at both ends. This took me about 20 iterations for this dataset and identified ~81,000 segments out of ~300,000.
3. Create a centroid geometry from the linear geometry of the edges.
4. Calculate a weight for each edge as its distance in miles, signed negatively for the dead-ending segments identified in #2.
5. Compute a kernel density surface using the centroids and weight values. I used an 8 km radius and tri-weight kernels with the QGIS raster plugin, which I think is simply a GUI for GDAL.
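The pruning in step 2 can be sketched in plain Python over (source, target) tuples rather than the actual osm2po tables (the toy network below is invented):

```python
from collections import Counter

def find_dead_ends(edges):
    """Iteratively peel off edges that touch a degree-1 node, until every
    remaining edge connects to other edges at both ends."""
    remaining = set(edges)
    dead = set()
    while True:
        degree = Counter()
        for s, t in remaining:
            degree[s] += 1
            degree[t] += 1
        ends = {e for e in remaining if degree[e[0]] == 1 or degree[e[1]] == 1}
        if not ends:  # nothing left dangles: done
            return dead
        dead |= ends
        remaining -= ends

# toy network: a triangle (fully connected) with a two-edge spur hanging off
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
dead = find_dead_ends(edges)  # the spur is peeled off in two passes
```

Note that this behaves exactly as described in problem #2 below: a loop hanging off the network by a single edge never exposes a degree-1 node, so it isn’t caught.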

And then I made it kind of pretty :-)

Some problems:

1. Some very long dead-ending segments appeared around the edges as a result of clipping the original dataset out of its global context. Concentrating their weight in a centroid resulted in strongly negative spots which simply shouldn’t exist.
2. Lines that turned back on themselves, or sub-networks of streets which were ultimately connected to the main network by only one edge (and which may thus reasonably be considered entirely dead ends), were not identified at all.
3. OSM data in the US is mainly derived from low-quality TIGER data that was imported several years ago. Many rural areas seem to have an enormous number of driveway-type paths identified, many of them mislabelled as residential streets. There are also some places where actual suburban driveways have been identified as dead ends, which may or may not be misleading to some degree. Most of these, however, are very short, and so their weight shouldn’t be overwhelming. Though that huge negative area west of Dayton is Brookville, where someone seems to have added driveways for every house in town.


1. about 73% when weighted by their lengths