NYPD Goes Out of Its Way to Obscure Street Safety Data

It took several months and some needling from the Daily News, but NYPD this week finally complied with a new city law requiring the department to release data on traffic crashes. Unless you have the time and resources to comb through and analyze hundreds of pages of rows and columns, however, good luck getting much use out of it.

Traffic crash data as it appears on CrashStat...

The Saving Lives Through Better Information Act was passed by the City Council, over the objections of NYPD, in February. The intent was to provide the public with monthly data on summonses and traffic crashes, with the crash data “searchable by intersection” and “disaggregated by the number of motorists and/or injured passengers, bicyclists and pedestrians involved; and the apparent human contributing factor or factors involved.” If you want to find out how many crashes occur in your neighborhood, what’s causing them, and which streets are the most dangerous, this law is supposed to help you.

...and as it appears in NYPD's data dump.

But those looking to learn what’s happening on their streets will have their work cut out for them. Far from the model developed by Transportation Alternatives for CrashStat, which plugs geo-coded crash information into a map of NYC, the initial NYPD crash data dump consists of six PDF files. You can’t search a map for the intersections near your home, school, or office.

Here’s the punchline: You can scour a 300-page doc for the streets near you, and with sufficient time and effort, eventually piece together what’s going on — until the next month, when your slog begins anew. Data on summonses, meanwhile, are issued in a separate report in much the same format. If you want to know where drivers are getting ticketed for violating traffic laws, these documents won’t help you.

“If the NYPD really wanted to make the streets safer for pedestrians, it would make it easier to look at the agency’s traffic data and not bury the public in hard to search documents,” said Gene Russianoff, staff attorney for the New York Public Interest Research Group.

That would have been simple enough, according to Noel Hidalgo, co-organizer of the Open NY Forum and former director of technology innovation for the New York State Senate. “NYPD crash data should be going into the city’s data mine, where every other city agency places their data,” he said. “It’s three years old. From the Department of Education to public safety, data should be coming through a centralized website.”

Hidalgo said NYPD could release geo-coded crash data using MapPLUTO, which the Department of Information Technology and Telecommunications has put together to serve as a common platform for all city agencies. “This is like the 17th century of information technology,” he said. “If anything, this adds more confusion to incident reports, and I can promise you that displaying information in this manner adds more burden to the NYPD.”

Juan Martinez, general counsel for Transportation Alternatives, questioned whether the data release complies with the law. “The way the PD chose to publish the data makes me wonder if it is in fact ‘disaggregated’ — you can’t separate out crashes caused by trucks, for instance, short of hitting ‘control+f’ a bunch of times. Had the PD simply released an Excel sheet, it would have been better.”

Martinez pointed out that Mayor Bloomberg “made a career of taking big, complicated datasets and making them accessible,” and wondered how a major agency like NYPD could be exempt from the city’s commitment to open data.

“We recently reached out to NYPD and offered them the code for CrashStat, and offered to work with them to port their data into CrashStat,” said Martinez. “Our offer still stands.”

  • Anonymous

    Bloomberg, given his background, should be embarrassed by this.

  • Jeff

    Okay, developers, we have our work cut out.  First to write a script that scrapes and parses these PDFs and republishes them in a standard format wins a prize.  Extra credit if you can automatically grab the PDF from the server and re-run the script on a monthly basis, and even more extra credit if you can build an end-user web app from it.

    Ready?  GO!

  • CrashStat.org

    Anybody want to lend a hand at parsing this data?

  • J

    Umm… This isn’t surprising at all. NYPD didn’t want to release the data and fought tooth and nail to avoid doing so. Publishing the data will result in greater accountability, and if there is one thing NYPD hates it’s being accountable to anyone. Finally, they broke the law by not releasing the data for 4 months. In light of this, of course they’re going to make the data as obscure as possible, as a sort of “screw you” to the folks who forced them to release the data and to the citizens of New York who stand to benefit from the data. Fortunately, among our 8 million residents, we have some very smart people who can quickly take data like this and make it relevant and useful. That said, Bloomberg has exhibited ZERO guts when it comes to standing up to the NYPD. They are running amok (corruption, brutality, and general disrespect to everyone), and their antics will leave an indelible stain on his legacy.

    To be fair, though, when DOT released the PPW data, they did so in a similar pdf form. That was probably more effective, since the curmudgeons at NBBL are few in number, and they aren’t so tech savvy.

  • @Jeff  I’m going to go out on a limb and say that these PDFs started out as an Excel spreadsheet, which was then pasted into Word, and then dumped as PDF. Each of those two steps inadvertently adds catches and stumbling blocks for parsers. And those catches will change each month.
    It is possible to produce Word files and PDFs that are parseable for data analysis, but it is not easy, and not recommended. Both formats exist to present information to human readers.

    What needs to happen is for the NYPD to quit acting like a kindergartener. There is a law, and they need to obey it. There is a public and they need to serve it. 

  • Am trying adobe’s pdf to text service as well as a Google doc conversion.
    There’s a tool called poppler (http://poppler.freedesktop.org/) which could theoretically extract text into an xml format but I can’t get it to compile on OS X.

  • I tried uploading to Google docs and GDocs rejected it (no idea why, the entire error message is “Server rejected”).  The text can be copied & pasted but loses all formatting on pasting (and the precinct info isn’t in the table but is a separate line of text).  

    If someone else can try compiling poppler, it allegedly has a (pseudo)XML output option to extract text.

  • …Adobe Acrobat will export it as flat text but all structure is lost, it’s literally line–by–line (so you get one line reading “BRIGHTON BEACH pedestr 0 0 Unknown 1” and then the next line is “AVENUE TOTAL 0 0” as one example).

  • e.p.c., send mail to my Yahoo profile. I have your XML.

  • Ray K.

    Sure, the data is four months late, and it’s not really usable, but have you heard about our new anti-aircraft weapons systems?  How can anyone say we’re not concerned about safety? Anyway, we know that despite his hair-trigger for yelling at TA and ordinary citizens, Vacca doesn’t have the cojones to cross our thin blue line and call us on the carpet.

  • David Turner

    I went ahead and wrote a scraper.  It’s a bit crappy since I am busy working on a completely different project, but it’s here in case someone wants it. http://novalis.org/programs/scrapeintersections.txt

  • Iris W.

    When will Lois Carswell and Seniors for Safety issue a statement condemning the NYPD for hiding this information and threatening to sue them if they don’t cooperate?

  • At least it’s not as bad as the courts, whose PDFs are assemblies of 1-page JPGs which appear to have been scanned from fax printouts.

  • TJH

    I don’t see a problem, just look up the pct. you live in and it gives you the intersection crashes happened at.

  • All right, guys, available here briefly:


    Looks like parsing might work. 
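
The flat-text export described in the comments above splits a street name across rows: "BRIGHTON BEACH" lands on one line and "AVENUE TOTAL" on the next. Below is a minimal Python sketch of how such lines might be stitched back together and republished as CSV. It is not any commenter's actual scraper. It assumes, without confirmation from the PDFs themselves, that a street-name fragment is a leading run of all-caps words and that "TOTAL" is a row label that closes a street's record. The function names and CSV column names are hypothetical.

```python
import csv
import io
import re

# Assumption: a street-name fragment is a leading run of all-caps words,
# and "TOTAL" is a row label that closes a street's record.
ROW_LABELS = {"TOTAL"}
CAPS_WORD = re.compile(r"[A-Z][A-Z.'-]*")


def split_line(line):
    """Split one exported line into (name_words, has_label, fields)."""
    tokens = line.split()
    name_words, has_label = [], False
    i = 0
    # Consume leading all-caps tokens; everything after them is data.
    while i < len(tokens) and CAPS_WORD.fullmatch(tokens[i]):
        if tokens[i] in ROW_LABELS:
            has_label = True
        else:
            name_words.append(tokens[i])
        i += 1
    return name_words, has_label, tokens[i:]


def rejoin(lines):
    """Stitch wrapped lines back into one record per street."""
    records, name, fields = [], [], []
    for line in lines:
        words, has_label, rest = split_line(line)
        name.extend(words)
        if has_label:
            # A TOTAL row ends the current street's record.
            records.append(
                {"street": " ".join(name), "fields": fields, "total": rest}
            )
            name, fields = [], []
        else:
            fields.extend(rest)
    return records


def to_csv(records):
    """Serialize records as CSV text; the column names are placeholders."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["street", "fields", "total"])
    for rec in records:
        writer.writerow(
            [rec["street"], " ".join(rec["fields"]), " ".join(rec["total"])]
        )
    return buf.getvalue()


# The two sample lines quoted in the comment thread.
sample = [
    "BRIGHTON BEACH pedestr 0 0 Unknown 1",
    "AVENUE TOTAL 0 0",
]
records = rejoin(sample)
print(to_csv(records))
```

This heuristic would miss numbered streets, whose names begin with a digit, and it cannot tell what each numeric field means. Any real parser would need to be checked against the actual PDFs, and rechecked each month if the layout shifts.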

