Toolbox: Scraping App for the Digital Gumshoe

August 28, 2013

Here's a nifty web-scraping tool we think investigative journalists might want to know about or use. What's web-scraping, you ask?

In the old days, gumshoes used shoe leather, telephones, and index cards. Today, they gather data from the web to drive investigative projects. As many journalists know, data on the web is not always in a convenient or usable form. "Scraping" means a methodical effort to capture data published on the web and put it into usable form, typically a database.

First, let's note that neither journalists nor the public should have to do this, at least for federal data. The Electronic Freedom of Information Act Amendments of 1996 required that government records that exist in electronic form be made available in electronic form. But data published on the web in HTML "table" format may be hard to wrangle into a structured database, where it can be queried for investigative purposes.

Software exists to help with this. But it is often expensive, bewildering, bloated, and untrustworthy.

That's why the WatchDog was pleased to discover a free, open-source add-on to Mozilla's Firefox web browser called ExportToCSV, by Souvik Chatterjee. Even though it is nominally in "beta" (test) release, it works fine for us. You simply right-click on any HTML table, and it exports the table's data to a "comma-separated values" (CSV) text file. Such CSV files can be easily imported into databases like Microsoft Access or spreadsheets like Excel. The add-on has cleared preliminary vetting by Mozilla, the organization that produces Firefox.
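
For readers curious about what "exporting an HTML table to CSV" involves under the hood, here is a minimal Python sketch of the general idea. It is not how ExportToCSV itself is implemented; it is just one way to do the same kind of conversion using only Python's standard library. The names TableToRows and html_table_to_csv are our own, the sample table is invented for illustration, and the sketch assumes tidy HTML in which every row and cell has a closing tag.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the cell text of the first <table> in an HTML document.

    Hypothetical helper for illustration; assumes well-formed markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None
        self._depth = 0     # how many <table> tags we are inside
        self._done = False  # True once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th") and self._depth == 1 and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._depth == 1 and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self._cell = None
                self._done = True


def html_table_to_csv(html_text):
    """Return the first table in html_text as CSV text."""
    parser = TableToRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample data, for illustration only.
sample = """
<table>
  <tr><th>Facility</th><th>State</th><th>Violations</th></tr>
  <tr><td>Acme Plant, Unit 2</td><td>OH</td><td>3</td></tr>
  <tr><td>Riverbend Works</td><td>PA</td><td>0</td></tr>
</table>
"""
print(html_table_to_csv(sample))
```

Note that the csv module puts quotation marks around any cell that itself contains a comma (such as "Acme Plant, Unit 2" in the sample), which is what keeps the columns lined up when the file is later opened in a spreadsheet or database.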

Find it here.
