ASF Integral to Panama Papers Investigation

April 17, 2017

At 2.6 terabytes, the Panama Papers is the largest leak in history, comprising 11.5 million financial and legal records sent by an anonymous source. More than 400 journalists from 100 publications on six continents collaborated on the investigation over the course of a year. Their work exposed a complex system of criminal and corrupt activities hidden behind offshore concerns. The investigation recently received a Pulitzer Prize in the Explanatory Reporting category.

"The Apache Software Foundation incorporated 18 years ago with the mission to create software for the public good," said ASF President Sam Ruby. "We are honored that Apache software played a critical role with the Panama Papers, and congratulate the International Consortium of Investigative Journalists and their media partners on this prestigious award."

The discovery, exchange, and management of information involving 214,488 entities were made possible by the following Apache projects:

• Tika -- toolkit that detects and extracts metadata and structured text content from various document types. Used for document processing.

• Solr -- enterprise search server, based on the Lucene Java search library, with advanced highlighting, faceted search, caching, and replication capabilities. Used for search and indexing.

• PDFBox -- Open Source Java library for working with PDF documents. Used for capturing text from PDF documents.

• POI -- Open Source Java library and APIs for file formats based on Microsoft Office. Used to extract and manipulate Excel, Word, and PowerPoint files.

• Commons -- 40+ projects providing reusable Open Source Java components. Used to boost cross-platform development and productivity.

In addition to Apache software, a number of other Open Source projects were integral to the investigation. These include Tesseract-ocr (whose optical character recognition engine was used for capturing text from images), Project Blacklight (used as a discovery interface), and Jackcess (used for reading and writing MS Access databases). They are three examples of the millions of software solutions distributed under the Apache License v2.0, which allows for their free use, modification, and sharing.
