Preserving Documents Forever

The goal of my current project is to save documents forever. Yes, forever! I know that can sound a bit silly, but the idea really is that these documents should be preserved (and be usable) for generations to come.

The Background

So first, a look back at recent history... Think about a document created in 1991, about twenty years ago. There's a very good chance it was made in WordPerfect 5.1. The old-timers reading this blog will no doubt have fond memories of that one... But back to today: how on earth are we going to open that file? Well, it's Google to the rescue!

But I guess you get my point.

If we've learned anything in the last twenty years, it's that predicting the future of computing is risky business. Here are a couple of gems often attributed to Bill Gates that illustrate this rather well (the first is probably apocryphal, since he has denied ever saying it):

- 640K ought to be enough for anybody (1981)

- I see little commercial potential for the internet for the next 10 years (1994)

The Requirements

So we're looking at:

  • Documents must be device independent
  • Documents must be self-contained (e.g. fonts embedded in the file rather than linked externally)
  • The format must be an international standard (ISO?)

The Search

These requirements mean that most document formats are eliminated right away. This includes word processor formats like MS Word and OpenOffice: they are tied to a particular application, and the layout can shift depending on the software version and the fonts installed on the machine. XML on the other hand could have been an option, but unfortunately, with plain XML you lose some important features like page layout and embedded image data.

PDF (Portable Document Format) seems like it could be the right choice. But then you run into the problem of all the different PDF versions...
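
To make the version problem a bit more concrete: every PDF file announces its version in a header of the form %PDF-x.y at the very start of the file. The snippet below is only a minimal sketch of how you could read that header in Python. The function name pdf_header_version and the sample bytes are made up for illustration.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from the '%PDF-x.y' header, or None if there is none."""
    # The header should be at the start of the file; we look at the first
    # 1024 bytes to tolerate a little leading junk (an assumption of this sketch).
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return f"{match.group(1).decode()}.{match.group(2).decode()}"


# Hypothetical sample: the first bytes of a PDF 1.4 file.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(pdf_header_version(sample))  # 1.4
```

Keep in mind that since PDF 1.4 the document catalog can carry a /Version entry that overrides the header, so the header is a first hint rather than the final word.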

Fortunately, research on the subject reveals that this has been addressed!

The Solution

For long-term preservation of documents PDF/A (where A = Archive) is the solution! It is an ISO standard (ISO 19005), so there is little risk of hidden royalty fees or obscure pending patents. And internationally, it looks like most governments are heading in this direction.

The Compliance Levels

So while PDF/A is the way to go if you want to preserve documents for the next generation, note that there are also different versions of PDF/A in existence. For instance PDF/A-1 is based on PDF 1.4 and defines two compliance levels (the standard itself calls them conformance levels):

  • PDF/A-1a: The strictest level. It's really only suitable for documents that are digitally created (on a computer), since on top of the visual appearance it also requires the logical structure of the document (tagging) to be preserved.
  • PDF/A-1b: The basic level, which only guarantees that the visual appearance is preserved. It's suitable for both digitally created documents and scanned documents. This is probably the most common PDF/A format in use today.
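
A PDF/A file states which version and level it claims to follow in its XMP metadata, using the pdfaid:part and pdfaid:conformance properties. Here is a rough sketch of how you could read that claim in Python. It assumes the metadata is stored uncompressed and as UTF-8 text, and the function names and the sample data are purely illustrative. It only reads what the file claims; whether the file really complies is a job for a dedicated PDF/A validator.

```python
import re


def _find_pdfaid(name: bytes, data: bytes):
    """Find a pdfaid property written as an XML element or as an attribute."""
    # Element form:   <pdfaid:part>1</pdfaid:part>
    # Attribute form: pdfaid:part="1"
    pattern = rb"pdfaid:" + name + rb"""\s*(?:=\s*["']|>)\s*([0-9A-Za-z]+)"""
    match = re.search(pattern, data)
    return match.group(1).decode() if match else None


def claimed_pdfa_level(data: bytes):
    """Return e.g. 'PDF/A-1b', or None if the file makes no PDF/A claim."""
    part = _find_pdfaid(b"part", data)
    if part is None:
        return None
    conformance = _find_pdfaid(b"conformance", data) or ""
    return f"PDF/A-{part}{conformance.lower()}"


# Hypothetical XMP fragment, as it could appear inside a PDF/A-1b file.
sample = (
    b'<rdf:Description rdf:about="" '
    b'xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
    b"<pdfaid:part>1</pdfaid:part>"
    b"<pdfaid:conformance>B</pdfaid:conformance>"
    b"</rdf:Description>"
)
print(claimed_pdfa_level(sample))  # PDF/A-1b
```

For a real file you would pass in the raw bytes, for example claimed_pdfa_level(open("some.pdf", "rb").read()), where some.pdf is a placeholder for your own file.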

The Future

Although PDF/A-1 is the most widely deployed version, its successor PDF/A-2 is also available. Furthermore, PDF/A-3 is currently in the process of standardization.

The great thing with PDF/A versions though, is that a new version doesn't invalidate previous ones. That means a valid PDF/A-1 document stays valid, and never has to be converted just because PDF/A-2 or PDF/A-3 exists. Preserving backwards compatibility is certainly one of the most important aspects of PDF/A.
 
So you can store your documents forever! Just remember to back them up properly!