August 27, 2014
Canarsie Tops List of Most Flooded Neighborhoods According to ClaimStat Data

In a quest to do for claims against the city what CompStat did for crime, Scott Stringer recently released ClaimStat. It maps out claims against the city, including the date of occurrence, type of claim, location and amount paid out in the claim.

It’s a great step for transparency, but once again the city failed to release the data in a machine-readable format. But fear not: the amazing Chris Whong (of Taxi Visualization fame and an organizer of BetaNYC, NYC’s network of civic-minded hackers who are opening government data) posted a quick how-to on extracting data files from the maps. So with that, I was off and running.

Today’s post is on Sewer Overflow Claims. What are Sewer Overflow Claims, you ask? Well, the city’s network of sewers can only handle so much water at once. During large storm events, the sewers can overflow and back up onto city streets, and they can occasionally get clogged for other reasons. When the sewers do overflow, the water can damage property, and property owners can then file a claim against the city for that damage. The good news in all of this is that the claims data can help us identify where flooding from sewer overflow happens the most. If we can identify the worst offenders, the DEP can better target infrastructure projects.

First some quick stats.   In the two years present in the data, there were 1,168 claims filed.  The bulk of the claims were in Brooklyn and Staten Island:


The average payout for those claims which have been paid is around $4,000.

Scott Stringer’s ClaimStat report explores flooding by Community District. I decided to take a more detailed look at the underlying neighborhood data given that Community Districts span many neighborhoods. So I split the data into Neighborhood Tabulation Areas (NTAs), which are neighborhood designations used by the Department of City Planning for population projections. I then counted the number of claims filed in each area over the two-year period to get a view of which neighborhoods experience the most flooding:


The results show that Canarsie fares worst, with 16% of all claims. But second behind it is the Bergen Beach NTA, which is adjacent to Canarsie in Brooklyn, and third is the Sheepshead Bay NTA, which is just south of that. These three contiguous NTAs make up almost a third of all Sewer Overflow Claims in the city. The top 10 neighborhood areas on the list made up half of all claims, making these prime areas to reinvest in new infrastructure.
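For readers who want to reproduce the per-NTA counts without a GIS tool, here is a minimal sketch of the idea: test each claim location against each area's polygon and tally the hits. The maps in this post were made in QGIS, so this is not the method used here; the function names, the toy square "areas" and the claim points are all hypothetical.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed by a horizontal ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_claims_by_area(claims, areas):
    """Assign each claim to the first area containing it and count claims per area."""
    counts = Counter()
    for x, y in claims:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts

# Hypothetical toy data: two adjacent square areas and four claim locations.
areas = {
    "Area A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Area B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
claims = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (1.0, 1.5)]

counts = count_claims_by_area(claims, areas)
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}
print(counts.most_common())
print(shares)
```

With real data, the polygons would come from the NTA shape file and the points from the extracted claims, ideally in the same coordinate system.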

To get a different view,  I created a heat map, or should I say a wet-map, of all the claims in the city:


The map clearly identifies hot- errr I mean wet-spots to focus on as far as mitigating these issues.

I was also curious to see how consistently the flooding hits the same neighborhoods. So I chose the three largest storms in the data: Hurricane Irene, Hurricane Sandy, and a record-breaking rainstorm that hit the city a few weeks before Hurricane Irene.

I mapped out the claims for each storm below, as well as all the remaining claims that were not associated with any of the three storms.


Interestingly, although there is some overlap, there are also distinct areas affected by each storm. To look at it another way, we can break those same three storms down in a table to see their individual effects on the neighborhoods with the most claims: 


Only Canarsie and the Bergen Beach NTA were hit severely by all three storms. The other top flooding neighborhoods seem to be storm-dependent.    

One more way to look at the severity of flooding is to explore the number of unique days on which claims occurred, e.g. if 10 claims were filed for one day, it would still count only as one in this metric.


Once again, Canarsie tops the list.  Sewer overflows that resulted in claims happen about every 5 weeks on average there. But the next five NTAs are in Staten Island according to this measure. 
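The unique-days metric is easy to compute by hand. Below is a minimal sketch, assuming each claim is a (neighborhood, date of occurrence) pair; the helper names and the sample records are hypothetical, and the charts in this post were made in Excel.

```python
from collections import defaultdict
from datetime import date

def unique_claim_days(claims):
    """Count the distinct occurrence dates per area, however many claims share a date."""
    days = defaultdict(set)
    for area, day in claims:
        days[area].add(day)
    return {area: len(dates) for area, dates in days.items()}

def average_weeks_between(n_days, period_start, period_end):
    """Average spacing, in weeks, of n_days events over an inclusive date range."""
    weeks_in_period = ((period_end - period_start).days + 1) / 7
    return weeks_in_period / n_days

# Hypothetical records: three claims on one day and one claim on another.
claims = [
    ("Area A", date(2013, 1, 5)),
    ("Area A", date(2013, 1, 5)),
    ("Area A", date(2013, 1, 5)),
    ("Area A", date(2013, 1, 20)),
    ("Area B", date(2013, 1, 5)),
]

per_area = unique_claim_days(claims)
spacing = average_weeks_between(per_area["Area A"], date(2013, 1, 1), date(2013, 1, 28))
print(per_area, spacing)
```

The second helper is the same arithmetic behind an "about every 5 weeks" figure: the length of the period in weeks divided by the number of distinct days with claims.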

So the conclusions? Well, Canarsie, Canarsie, Canarsie for one. It appeared at the top of the list no matter how I sliced it. It’s pretty clear that the sewer infrastructure needs upgrading there, and I hope the DEP has it at the top of its list, as it is by far the area with the most flooding claims. Other than that, several neighborhoods seem to vie for second place depending on what measure you use. Neighborhoods in southeast Brooklyn, as well as many along the water in Staten Island, seem like good contenders.

It’s important to note that while increased claims indicate increased flooding, it’s not necessarily true that increased flooding will create increased claims in all affected neighborhoods. One could imagine that there might be information gaps around this issue, leading some neighborhoods to be undercounted in these numbers. Also, there is flooding not attributed to sewer overflow. So as always, this data should not be looked at in isolation.

And finally a closing thought: reducing the number of claims against the city can only be a good thing. No one wants our fine city to end up under water. 

Neighborhood Tabulation Area Data here.
Claims data here with extraction instructions here
Maps made in QGIS with Open Layers Plugin for Google Maps, Raster Plugin for Heat Map.
Charts made in Excel.

August 22, 2014
Analyzing New York’s Acquisition of Surplus Military Equipment

The recent unrest in Ferguson, MO has raised questions about the use of surplus military equipment by local police forces. Several months ago, The New York Times obtained from the Pentagon a list of where excess military equipment given to state and local law enforcement ended up.

The Times used the data in a few articles, but they also released the data on a GitHub account. It’s amazing how rare it is for a news organization to release the raw data behind its stories, even though the data is technically public once released. I praise the Times for this, and I really hope to see more of this from them and that other news organizations follow their lead.

And if they put the data out there, we have to use it, right?

At I Quant NY, I usually look at New York City data, but in this data set there was not much to be found. The city received two items: an armored truck (worth $65,000) and a 107 mm mortar (worth $205,000). You do not want to be on the receiving end of that thing. NYC has the largest police force in the country, and so they have likely purchased a lot of their own equipment. Those purchases would not end up in this data set, and so we can’t use it to see much about the city’s equipment.

Since the city data did not give me much to analyze, I decided to explore the distribution statewide.  After all, the site is not called I Quant NYC.

The map below shows the total number of dollars in military supplies sent to each county in New York State between 2006 and mid-2014 under the Defense Department’s 1033 program:

About 25 million dollars in supplies were distributed throughout the state. And which county topped the list? Albany. That was followed by two counties bordering New York City, Nassau and Westchester, which are two of the most populous counties in the state outside the city (only behind Suffolk). It might make sense that more populous counties need more police resources, but that does not explain Albany’s position. The table below shows the distribution of equipment, as well as its value normalized by the number of residents in each county.


Even on a population-adjusted basis, Albany is still in the top three recipients. But the most dollars of equipment per resident went to Hamilton and Clinton counties.
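The per-resident normalization is just a division followed by a re-ranking. Here is a small sketch; the county names, dollar totals and populations are made-up placeholders rather than figures from the data set, and the analysis itself was done in Excel.

```python
def dollars_per_resident(totals, populations):
    """Divide each county's equipment value by its population."""
    return {county: totals[county] / populations[county] for county in totals}

# Hypothetical inputs: equipment value in dollars and resident counts.
totals = {"County X": 1_000_000, "County Y": 100_000}
populations = {"County X": 250_000, "County Y": 5_000}

per_capita = dollars_per_resident(totals, populations)

# Rank counties from highest to lowest value per resident.
ranked = sorted(per_capita.items(), key=lambda item: item[1], reverse=True)
for county, value in ranked:
    print(f"{county}: ${value:.2f} per resident")
```

In this toy example the county with ten times less equipment still ranks first per resident, which is why a small county can jump to the top of a normalized list.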

Given that, I was curious to learn what all this equipment was, so I looked at the equipment that cost the most in aggregate:


The state as a whole received lots of vehicles, but also 55 night vision goggles and one half-a-million-dollar “combat/assault/tactical” vehicle that found its way to Broome County. So keep your eyes peeled for that, Binghamton! And that’s not to mention the eight mine resistant vehicles that went to eight different counties, and the 293 military rifles (either 5.56mm or 7.62mm).

Digging in on specific equipment per county, it turns out that a mine resistant vehicle and 4 trucks propelled Hamilton County to the top of the per resident list, and two trucks made tiny Clinton County, home to fewer than 5 thousand people, second. What made Albany number three on a per resident basis? Well, 98 rifles, 4 utility trucks, 49 pairs of night vision goggles, a mine resistant vehicle, an airplane and a helicopter probably helped.

As a reminder, the data does not show what equipment goes to state vs. local law enforcement, so it’s not clear where this is all ending up. The extra equipment in Albany may be going to the State Police. But what is clear is that our state capital made out pretty well in all of this. So one thing is for sure: the next time there is a Battle of Saratoga, we know one neighboring county that can help out! (Especially if the battle is at night.)

Analysis done in Excel and QGIS.
County Shape Files for mapping here.
Data on equipment here.


Filed under: statepolice nypd foil opendata nyc 
August 18, 2014
Fatal Cyclist Accident this Morning was Tragically Predictable

Tragically, a cyclist was killed this morning at the intersection of 108th St and Park Avenue in New York City.  (DNAInfo, Gothamist, h/t StreetsBlog).

Each time I see something like that happen in the city, I ask myself “was this predictable?”  Obviously no single accident is, but with some data we can identify hot spots and fix them to reduce fatalities.

Looking at this morning’s accident location (thanks to NYC Open Data) showed something very surprising:

  • In 2012, there were 20 cyclist fatalities in NYC.  At least one of them was at or next to that intersection. (Public data only goes back to July 2012.)
  • In 2013, there were 13 cyclist fatalities in NYC.  One of them was at or next to that intersection.
  • In 2014, there will be ?? cyclist fatalities in NYC.  At least one of them will have been at or next to that intersection.

Over this period of time, these were the only cycling fatalities in the zip code 10029, and they were all in the same place.

We can also see a pattern of cyclist injuries at other nearby intersections on the same route:


Obviously no single accident is predictable at any given time, but if we want to get serious about VisionZero, let’s make patterns like these repeat less often.

Related: I Quant NY posts on VisionZero, cycling or the DOT.

Cyclist Injury Data is here: Jul 2012-May 2014
Fatality counts found here

I continue to ask the DOT to release more history on the NYC Open Data portal. It currently only goes back to July 2012, which is unfortunate and limits the statistical significance of any study.

Filed under: visionzero dot cycling 
August 14, 2014
Interborough (Less) Rapid Transit: Analyzing Citi Bike Treks Over East River Bridges

If you have tried riding a Citi Bike over the East River, you know that it’s not easy. The bikes weigh quite a bit, and the East River bridges that make up your three options all climb about 100 vertical feet from either side. Nevertheless, when I cross one of the bridges I usually see a handful of brave blue bikers. This got me thinking: How often does that happen? How many people really brave the largest Citi Bike climbs in the system?

The answer: Surprisingly many. The data shows that Citi Bikers have used pedal power to cross one of the three bridges 383,125 times from July 2013 to May 2014, or about 1,100 times a day. That averages to about 1 rider every minute and 15 seconds, 24 hours a day! (Obviously this is much higher in the summer than in the winter, and during the day, but I’m averaging it all together here.)
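The per-day and per-minute figures follow from simple arithmetic on the trip counts quoted in this post. Here is a short worked version; the one assumption is that the data covers every day from July 1, 2013 through May 31, 2014.

```python
from datetime import date

# Figures quoted in this post.
total_crossings = 383_125
manhattan_to_brooklyn = 199_491
brooklyn_to_manhattan = 183_634

# Assumption: the data spans July 1, 2013 through May 31, 2014, inclusive.
days = (date(2014, 5, 31) - date(2013, 7, 1)).days + 1

trips_per_day = total_crossings / days
seconds_between_trips = 24 * 60 * 60 / trips_per_day

print(f"{days} days, {trips_per_day:.0f} crossings per day, "
      f"one every {seconds_between_trips:.0f} seconds")
```

The two directional counts add up to the total, which is a quick sanity check that no crossing was counted twice.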

These trips make up over 5% of all trips in the system. Slightly more people used the bikes to go from Manhattan to Brooklyn (199,491 trips) than from Brooklyn to Manhattan (183,634 trips).

I made a little Markov Chain to show the riders’ movements:

The result shows that about a third of all riders who start in Brooklyn are Manhattan bound, while only about 3% of Manhattan riders are Brooklyn bound.
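The transition probabilities in a chain like this come from dividing each origin-to-destination trip count by the total number of trips leaving that origin. Here is a minimal sketch; the counts are hypothetical round numbers chosen only to mimic the proportions described above, not the real trip totals, and the function name is my own.

```python
def transition_matrix(trip_counts):
    """Turn origin -> destination trip counts into row-normalized probabilities."""
    matrix = {}
    for origin, destinations in trip_counts.items():
        total = sum(destinations.values())
        matrix[origin] = {dest: n / total for dest, n in destinations.items()}
    return matrix

# Hypothetical counts (not the real data): trips by start and end borough.
trip_counts = {
    "Brooklyn": {"Brooklyn": 200, "Manhattan": 100},
    "Manhattan": {"Manhattan": 970, "Brooklyn": 30},
}

probabilities = transition_matrix(trip_counts)
for origin, row in probabilities.items():
    for dest, p in row.items():
        print(f"{origin} -> {dest}: {p:.1%}")
```

Each row sums to 1, so the Brooklyn row says where a Brooklyn rider ends up and the Manhattan row does the same for Manhattan riders.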

The distribution changes radically depending on the hour of the day though. We can see that a big chunk of Citi Bikers are clearly commuters:

Interestingly, from 5AM to noon, the flow is towards Manhattan, but at all other hours it is towards Brooklyn.  It is also interesting to note that the evening rush hour peak happens in the 5PM bucket for those leaving Brooklyn but in the 6PM bucket for those leaving Manhattan, even though they share the same 8AM peak in the morning.

The dates with the most crossings in history were: August 10th (2,961 trips), August 17th (2,948 trips) and August 3rd (2,752 trips). What do these all have in common? Summer Streets! That shows that if you put the cars away for a bit, the bikers come out.

The top 5 destinations for riders from Manhattan to Brooklyn are spread around Brooklyn:

The top 5 going the other way are all below 14th Street:

So to summarize, a whole lot of riders brave the climb. If you’re on one of the bridges, just expect to see a bike rider “out of the blue”.

More Citi Bike Analysis from I Quant NY is here.

-Citi Bike System Data (July 2013-May 2014) available here.
-Boroughs were assigned to stations using QGIS intersection on Borough Boundaries shape file here.
-Analysis done with IPython/Pandas
-Charts made in Excel, Powerpoint

Filed under: citibike dot cycling 
August 12, 2014
In Search of the Safest Citi Bike Stations using Open Data

Since Citi Bike’s launch, there have been no recorded cycling fatalities on one of their bikes. For Citi Bike riders like me, that’s good news in a system that some said would lead to a bicycling apocalypse.

That being said, some people are still nervous to hop on the blue stallions and take a ride, so I decided to crunch some numbers to find the stations in the system that have had the fewest and the most reported cycling injuries around them from Jan 2013 - May 2014. (Note that these are not Citi Bike specific injuries, but rather all cycling injuries reported to the police involving a motor vehicle.)

I paired Citi Bike station data with vehicle/bicycle collision data from the city’s Open Data portal. For each accident resulting in a death or injury, I mapped it to the closest Citi Bike station. That gave an injury count per station. I then divided the result by the total area covered by that station. This results in an injury-per-square-foot metric.

The following table displays the top 10 stations for nearby injury density. The index given is relative to the top station, which is assigned 1.0. I also supply the raw count of accidents.
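The nearest-station assignment and the index can be sketched in a few lines. This is an illustration rather than the exact pipeline used here (that was QGIS and Pandas): the station names, coordinates, injury points and per-station areas are all hypothetical, and the area each station covers is taken as a given input instead of being computed.

```python
import math
from collections import Counter

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def nearest_station(lat, lon, stations):
    """Name of the station closest to the given point."""
    return min(stations, key=lambda name: haversine_m(lat, lon, *stations[name]))

def injury_index(injuries, stations, areas):
    """Count injuries per nearest station, divide by area, and scale so the top station is 1.0."""
    counts = Counter(nearest_station(lat, lon, stations) for lat, lon in injuries)
    density = {name: counts[name] / areas[name] for name in stations}
    top = max(density.values())
    index = {name: (d / top if top else 0.0) for name, d in density.items()}
    return counts, index

# Hypothetical stations (lat, lon), covered areas (any consistent unit) and injury locations.
stations = {
    "Station A": (40.700, -74.000),
    "Station B": (40.710, -74.000),
    "Station C": (40.720, -74.000),
}
areas = {"Station A": 1000.0, "Station B": 2000.0, "Station C": 1000.0}
injuries = [(40.701, -74.000), (40.7005, -74.001), (40.709, -74.000), (40.711, -74.000)]

counts, index = injury_index(injuries, stations, areas)
print(counts)
print(index)
```

In the toy data, Stations A and B have the same raw count, but B covers twice the area, so its index is half of A's. That is the effect the area normalization is meant to capture.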


The findings? Three of the top ten stations with the highest injury density are located just off the Williamsburg Bridge. Note that we can’t control for bicycle traffic here, so this does not mean that any given rider has a higher chance of getting an injury; more riders means more injuries. That being said, anyone who has biked near the exit of the Williamsburg Bridge knows it’s a bit of a tricky place. Other top stations are listed above, including three on Broadway.

The 32 stations that have not had a single reported injury nearby since 2013 are as follows: 


Many of these stations are in Brooklyn or line the edge of the city, near the waterfront. Presumably, people biking along the river are more likely to use the bike paths there and thus there are fewer injuries.

To find the safest among those stations, I chose the one that was farthest from any injury at all in the data set. The winner was Columbia Heights & Cranberry St in Brooklyn, and second was E 20th St and FDR Drive. Both of these were at least half a kilometer from the nearest accident.
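Picking the station farthest from any injury is a max-of-minimums search: for each station, find the distance to its nearest injury, then keep the station where that distance is largest. Here is a rough sketch with hypothetical coordinates; the function names are my own, and the distance is a plain great-circle calculation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def safest_station(stations, injuries):
    """Return the station whose nearest injury is farthest away, with that distance in meters."""
    def nearest_injury_m(name):
        lat, lon = stations[name]
        return min(haversine_m(lat, lon, ilat, ilon) for ilat, ilon in injuries)

    best = max(stations, key=nearest_injury_m)
    return best, nearest_injury_m(best)

# Hypothetical stations and injury locations as (lat, lon).
stations = {
    "Station A": (40.700, -74.000),
    "Station B": (40.750, -74.000),
}
injuries = [(40.701, -74.000), (40.740, -74.000)]

best, distance_m = safest_station(stations, injuries)
print(f"{best} is {distance_m:.0f} m from the nearest injury")
```

To mirror the selection described above, the candidate stations would be limited to the ones with no injuries assigned to them.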

Lastly, I mapped the injury index for each station below.  Darker colors indicate more injuries.  Clicking on any area gives you the station address that is closest to that point as well as the index value and raw accident count for that station:

So if you have a friend who is saying that they are too scared to Citi Bike in NYC, bring them to one of the 32 stations on the list above.  You can reliably say “Since the launch of Citi Bike, no Citi Biker (and even no cyclist!) has gotten in an accident with a car around this station… I Quant NY says so!”  Who knows- maybe that will give them enough confidence to give it a shot.  

Also on I Quant NY: More posts on Citi Bike and Vision Zero.

Tools used above: QGIS, Pandas, IPython, Excel
Cyclist Injury Data: Jan 2013-May 2014

Note that the map does not normalize for cycling traffic and only tracks injuries reported to the police.
