Redacting information from documents doesn’t have to be hard

The title says it all. Redacting information from documents doesn’t have to be hard. Yet it can still go wrong when the appropriate level of care is not applied. Whether you redact a document manually or use a tool like Philter, care is required to make sure the redaction is permanent and the redacted text cannot be recovered once the process is complete.

The American Bar Association details some notable and embarrassing redaction failures in the legal system. In the cases it describes, information in PDF documents was "redacted" by drawing black boxes over the text. At first glance the information under the black boxes appears to be gone. However, simply selecting all of the text in the PDF and pasting it into another application, such as Notepad or Microsoft Word, reveals the supposedly redacted text, because the boxes only hide the text visually while the text itself remains in the file. This is not only embarrassing for the law firm involved; it can also be a serious breach of sensitive information that was withheld for very specific reasons.

Philter can redact information from PDF documents. For security, and to prevent failures like those described by the American Bar Association, Philter returns image files instead of modified PDF files. The images are renderings of the PDF pages with the sensitive information blacked out. Because there is no text underneath the black rectangles in the images, the redacted text cannot be recovered through copy and paste.

Philter’s approach to redacting information from PDFs is that, once a document is processed, the redacted information is permanently inaccessible in the output. You still have the original documents you provided to Philter, and you now also have image files containing the permanently redacted pages. PDF filtering is available in Philter as of version 1.9.0. We are very excited to offer this capability in Philter and look forward to expanding it based on your comments and feedback.

To filter a PDF document, just set the Content-Type header to application/pdf in your request:

curl -k -X POST https://localhost:8080/api/filter --data-binary @file.pdf -H "Content-Type: application/pdf" -O

With -O, curl saves the response to a local file named after the last segment of the URL (here, filter). That file contains the redacted PDF pages as images.
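If you have a whole directory of PDFs to redact, the single curl command above can be wrapped in a small loop. The following is a minimal sketch, not part of Philter itself: the helper names (output_name, redact_dir), the PHILTER_URL variable, and the "-redacted" output naming are all hypothetical choices made for illustration. It uses --data-binary so curl sends the PDF bytes unmodified, and --fail so an HTTP error is reported instead of being saved as if it were a result.

```shell
#!/bin/sh
# Hypothetical batch helper around Philter's /api/filter endpoint.
# Assumptions: Philter listens at PHILTER_URL (default https://localhost:8080);
# the "<name>-redacted" output naming is an illustrative choice, not Philter's.

PHILTER_URL="${PHILTER_URL:-https://localhost:8080}"

# Print the output file name for an input PDF path, e.g. docs/a.pdf -> a-redacted
output_name() {
  base=$(basename "$1" .pdf)
  printf '%s-redacted\n' "$base"
}

# Send every *.pdf in directory $1 to Philter and save each response in directory $2.
redact_dir() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # glob matched nothing: no PDFs in the directory
    out="$out_dir/$(output_name "$pdf")"
    # -k skips TLS certificate verification, as in the example above
    # (useful for a local self-signed certificate; avoid it in production).
    if curl -k --fail -sS -X POST "$PHILTER_URL/api/filter" \
         --data-binary @"$pdf" \
         -H "Content-Type: application/pdf" \
         -o "$out"; then
      echo "redacted: $pdf -> $out"
    else
      echo "failed: $pdf" >&2
    fi
  done
}
```

For example, redact_dir ./documents ./redacted would send each PDF in ./documents to Philter and write one response file per document into ./redacted. The original PDFs are left untouched, which matches Philter’s approach of keeping your originals alongside the redacted output.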