A little while back, I answered a few questions I’ve been getting via email about what I do daily. After a new series, I decided to let someone else, who’s far more eloquent and is actually an educator in the field, explain the thought process that goes into decision-making.

Why can’t you just answer the question?

Instead of going through an explication todo-list that you must finish unless that train of thought gets derailed.

I was on a conference call Saturday when our boss, Mr. Dick Hardass, asked one of my colleagues a seemingly simple question about a deadline. “This couldn’t take more than 5 seconds to answer,” we all thought.

Mr. Shakes-like-a-Twig (nice fellow, neat hair) went on to take 2 minutes to say “a week”.

You see, Mr. Twig had to go through how he was examining three months of business intelligence reports to figure out the right algorithms to sort through it all. Then he had to describe our current layout (which we were all familiar with), that he had just moved, that our report templates are “adequate”, that he just had 2 cups of coffee, that we had set up a second database server to mirror the original data (also something we all knew) and that his parents are from Ohio.

Mr. Hardass is a new boss, so we didn’t get a chance to tell him to never ask Mr. Twig for a status report unless it’s by email. He also has the patience of a fruit fly, the attention span of a hummingbird and the temper of a wounded leopard being poked with a stick, while being forced to watch a Dharma and Greg rerun.

And Mr. Twig, being a nice fellow, isn’t capable of understanding that silence on the other end is often a sign of a boil-over to come.

“I don’t have the bandwidth to deal with all that, I just want to know how we’re doing!”
etc… etc…

Mr. Hardass

The flustered Mr. Twig replied with a wavering voice, “a week”.

Answer the question!

I’ve heard this said to many people who travel down the winding path of explanations, touching all subjects except the point.

The older I get the more I realize that these people are actually answering a long list of questions in their head accumulated through the hours, days, years and even a lifetime. The one they just heard is actually tacked on at the bottom… which they will get to eventually, time be damned.

Moreover, as I get older, I feel that I’m turning into one of these explanation adventurers touring the treacherous waters of societal impatience. It’s not that I want to delay, frustrate or otherwise confuse the listener, but this is more of an external monologue that I go through to build that very important answer. I’m thinking and I desperately want to give an answer you’re happy with as soon as I can compile a program that displays it — to me first — so I can relay it to you.

Comments are necessary in the code of this program so I can keep track of what I’m doing.

Of course, that’s not to say that you should be wasting your time listening to a dissertation on the consistency of yak dung (which is actually different from cow and buffalo dung, but not many people know this) when you just need a two word answer.

Diatribes, while personally satisfying, don’t really help people like us. A sudden change in facial expression (raised eyebrows and a tilted head work) to show that you’re waiting is usually the most reasonable thing you can do to get us back on track.

Stop thinking!

Because we haven’t found a way to archive all of those thoughts yet.

There was once a wonderful project called GNUPedia, later GNE, that attempted to do just that, though it failed spectacularly. The closest to come was Everything2, in a similar spirit, but this particular venture is very convoluted and is often not as useful or interesting to read. Sure, you will come across the occasional bits of gold, but that’s similar to finding a granule of wisdom among a sea of nonsensical Digg comments.

Whatever happened to the idea of archiving human thought? I don’t mean as blog posts, though they may have come close; I mean as a central repository of worthwhile information in the same vein as Project Gutenberg. PG, even today, is a mere 17,000 books, many of which were poorly archived and lost most of their formatting as per archive practices. They also seem to suffer from some proofreading issues.

Google Books is an interesting alternative that makes it conveniently inconvenient to search for any meaningful information among books in the public domain, and contains very little detailed, search-enabled information on each entry compared to any public library. Then what about publications that are not books, but pamphlets, essays, quotes, and scribbles?

What about Wikipedia and Wikiquote? What makes them incredibly powerful and incredibly large is the same reason why neither can ever be comprehensive or accurate to a high degree. Being accessible to everyone also makes verification and attribution extremely difficult. What’s more, the recourse taken by admins to prevent vandalism and other childish or malicious behavior is what would be expected of any popular repository: restricted editing rights. The content on Wikipedia and, to an extent, Wikiquote depends on external references, which may or may not be available at any given time. These archives, therefore, are not factually self-supporting.

Also, the definition of a “reputable source” is often grounds for debate (as seen in many discussion pages on articles) and so are the arguments over point of view or lack thereof.

Anonymous contribution, while ideologically sound, is impractical for reliable information as there is no accountability and no means of checking the authenticity of editors except for an easily spoofed IP address.

Rather than just complain about a lack of any comprehensive or “cruft free” thought archive, let’s dig into how we might create one now…

So what is really required to produce “an archive of opinions” as per GNE and perhaps “an archive of facts” as per an encyclopedia?


Often forgotten in any project, there has to be some number of people who care enough about what they are doing to get it right. These individuals need to be qualified and not care about praise or recognition. Harsh, yes, but that is where many projects fail. If they expect a resounding success (or any success for that matter), they must not be hung up on the short term, which is always bleak.

A passion about what you are doing, the preservation of thought in whatever form, is absolutely necessary. As is rigor (intellectual honesty) and accountability.


So what the devil are these people going to archive? Books? Essays? Research? Commentary? Humor/Horror/Drama/Fiction?
How about all of the above?

The downside of going the “culturally significant” route the Library of Congress has taken is that one culture may not care much for what is significant in another. This also ties into what is morally significant; after all, who is authorized to say what is moral? There is obvious “content” that is not worthwhile for preservation, and we can all tell what that is. If it is not obvious whether something should be preserved, then it should be preserved.

The LOC, while being a wonderful and large resource, is very singular when it comes to its archive. Being a U.S. institution, this is hardly surprising as only international efforts will be internationally comprehensive.


If possible, everything about the piece of work to be archived must be preserved. That includes everything from the media to author and date information (as expected) as well as sources, images, fonts, versions, titles, editions, additional contributors, and any and all extra information (e.g. the historical context of the work and any related titles). All this information must be available for searching, which means more effort is required than automated book scanners can provide, or else the context of the work may be lost. There must also be a way of quickly and transparently identifying and correcting any mistakes, which are bound to occur during the archive process.

Now you know why the right people are important.


Obviously, digital storage is the only viable means of archiving this much information for the long term.

Databases must be maintained, regularly backed up and preferably mirrored on nodes dedicated to public access while central master databases are kept for updates and entries. These technical issues can be resolved somewhat by following the example laid down by Wikipedia. However, stored information needs to be far more comprehensive, thus the storage and retrieval means must be more appropriate.

An expandable set of meta parameters is the answer.

We have no practical way of predicting or assessing all the necessary “fields” required to store everything submitted for archival. So, the only solution is to leave that method open for expansion… Using XML, for example, where each bit of information on any work, including the work itself, is a new node/value pair. This is where XML databases would really come in handy.
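To make that a little more concrete, here is a minimal sketch in Python using the standard library’s `xml.etree.ElementTree`. The element names (`work`, `field`), the `name` attribute and the helper functions are all made up for this post; they are not taken from GNE or any real archive schema.

```python
import xml.etree.ElementTree as ET


def add_field(work, name, value):
    """Attach one more name/value pair to a work; no fixed schema required."""
    field = ET.SubElement(work, "field", {"name": name})
    field.text = str(value)
    return field


def build_record(fields):
    """Build a <work> element with one <field> child per name/value pair."""
    work = ET.Element("work")
    for name, value in fields.items():
        add_field(work, name, value)
    return work


def get_fields(work, name):
    """Return every value stored under a field name (names may repeat)."""
    return [f.text for f in work.findall("field") if f.get("name") == name]


# Hypothetical example data.
record = build_record({"title": "An Example Pamphlet", "author": "Unknown"})

# Fields nobody planned for can be added after the fact.
add_field(record, "historical-context", "Hypothetical note about the period")
add_field(record, "contributor", "First Contributor")
add_field(record, "contributor", "Second Contributor")

print(ET.tostring(record, encoding="unicode"))
```

The point of the flat name/value layout is that a new kind of information never forces a change to existing records; whether a real project would want something stricter is an open question.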

To prevent the loss of context and accuracy (and avoid the Project Gutenberg problems), content must be stored as text as well as binary whenever possible. The original is kept in binary form for verification and historical purposes, and the text form is kept for easier searching (to avoid the Google Books problem).
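One way this pairing might look, again as a rough Python sketch: each work keeps the untouched original bytes alongside a transcription, and searching only ever touches the text. The `ArchivedWork` class, the `search` function and the use of a SHA-256 checksum to detect a changed original are my own assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchivedWork:
    title: str
    original: bytes   # the untouched scan or file, kept for verification
    text: str         # the transcription, kept for searching
    checksum: str = field(init=False)

    def __post_init__(self):
        # Record a SHA-256 digest of the original at archive time.
        self.checksum = hashlib.sha256(self.original).hexdigest()

    def original_is_intact(self):
        """Recompute the digest and compare it with the recorded one."""
        return hashlib.sha256(self.original).hexdigest() == self.checksum


def search(works, phrase):
    """Case-insensitive search over the text form only."""
    needle = phrase.lower()
    return [w.title for w in works if needle in w.text.lower()]


# Hypothetical example data; the bytes stand in for real scans.
works = [
    ArchivedWork("Pamphlet A", b"fake-scan-bytes-A", "On the consistency of yak dung"),
    ArchivedWork("Essay B", b"fake-scan-bytes-B", "Notes on impatient bosses"),
]

print(search(works, "YAK"))
```

If the stored original is ever altered or corrupted, `original_is_intact()` returns `False`, which is the cue to go back to a backup or the physical source.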


Care and diligence cannot be over-emphasized during the archive process, and neither can research into whatever item is being archived. Everything must be verified for accuracy whenever possible and should not be publicly accessible if verification isn’t immediately possible. Show what is correct, complete, and checked. Keep in memory what isn’t so those points can be addressed later.
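The “show what is checked, remember what isn’t” rule could be sketched like this in Python. The `Entry` and `Archive` classes and their method names are hypothetical, just enough to show the split between the public view and the pending list.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    verified: bool = False
    open_issues: list = field(default_factory=list)


class Archive:
    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def public_view(self):
        """Only entries that are verified and have nothing outstanding."""
        return [e.title for e in self._entries
                if e.verified and not e.open_issues]

    def pending(self):
        """Everything held back, with the reasons, to be addressed later."""
        result = {}
        for e in self._entries:
            reasons = list(e.open_issues)
            if not e.verified:
                reasons.append("not yet verified")
            if reasons:
                result[e.title] = reasons
        return result


# Hypothetical example data.
archive = Archive()
archive.add(Entry("Checked Essay", verified=True))
archive.add(Entry("Unchecked Pamphlet"))
archive.add(Entry("Scanned Book", verified=True, open_issues=["page 12 missing"]))

print(archive.public_view())
print(archive.pending())
```

Nothing is thrown away here: an unverified entry stays in the archive with its list of problems, and it only becomes public once those are cleared.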

The expertise of the “people” can really shine when they list all available (and correct) information on any piece of work. This meta information can be attached and expanded at leisure once the main piece is archived. A comprehensive archive will never contain a “Trivia sections are discouraged” tag on any item.

Whatever content, method or means are used, the people involved must be willing to follow through with the ultimate goal being the preservation of the content, completely, without contamination or bias.

If you managed to read through completely to the end of this post, then congratulations! You may be a candidate to become one of the people needed to get this project off the ground.

(An interesting side-note: During the writing of this post, I discovered the spell-checker was unable to recognize the word “blog”. Oh irony!)