Because we haven’t found a way to archive all of those thoughts yet.
There was once a wonderful project called GNUPedia, later GNE, that attempted to do just that, though it failed spectacularly and didn’t amount to much. The closest thing in a similar spirit was Everything2, but that venture is very convoluted and often not as useful or interesting to read. Sure, you will come across the occasional bit of gold, but that’s like finding a grain of wisdom in a sea of nonsensical Digg comments.
Whatever happened to the idea of archiving human thought? I don’t mean as blog posts, though they may have come close; I mean as a central repository of worthwhile information in the same vein as Project Gutenberg. PG, even today, is a mere 17000 books, many of which were poorly archived and lost most of their formatting as per archive practices. They also seem to suffer from some proofreading issues.
Google Books is an interesting alternative that makes it conveniently inconvenient to search for any meaningful information among books in the public domain, and it contains very little detailed, search-enabled information on each entry compared to any public library. And what about publications that are not books, but pamphlets, essays, quotes, and scribbles?
What about Wikipedia and Wikiquote? The very thing that makes them incredibly powerful and incredibly large is also the reason neither can ever be comprehensive or accurate to a high degree. Being open to everyone also makes verification and attribution extremely difficult. What’s more, the recourse taken by admins to prevent vandalism and other childish or malicious behavior is what would be expected of any popular repository: restricted editing rights. The content on Wikipedia, and to an extent Wikiquote, depends on external references, which may or may not be available at any given time. These archives, therefore, are not factually self-supporting.
Also, the definition of a “reputable source” is often grounds for debate (as seen in many discussion pages on articles) and so are the arguments over point of view or lack thereof.
Anonymous contribution, while ideologically sound, is impractical for reliable information as there is no accountability and no means of checking the authenticity of editors except for an easily spoofed IP address.
Rather than just complain about a lack of any comprehensive or “cruft free” thought archive, let’s dig into how we might create one now…
So what is really required to produce “an archive of opinions” as per GNE and perhaps “an archive of facts” as per an encyclopedia?
Often forgotten in any project is that there have to be people who care enough about what they are doing to get it right. These individuals need to be qualified and must not care about praise or recognition. Harsh, yes, but that is where many projects fail. If they expect a resounding success (or any success for that matter), they must not get hung up on the short term, which is always bleak.
A passion about what you are doing, the preservation of thought in whatever form, is absolutely necessary. As is rigor (intellectual honesty) and accountability.
So what the devil are these people going to archive? Books? Essays? Research? Commentary? Humor/Horror/Drama/Fiction?
How about all of the above?
The downside of going the “culturally significant” route the Library of Congress has taken is that one culture may not care much for what is significant in another. This also ties into what is morally significant; after all, who is authorized to say what is moral? There is obvious “content” that is not worth preserving, and we can all tell what that is. If it is not obvious whether something should be preserved, then it should be preserved.
The LOC, while a wonderful and large resource, is very singular in focus when it comes to its archive. That is hardly surprising for a U.S. institution, as only international efforts will be internationally comprehensive.
If possible, everything about the piece of work to be archived must be preserved. That includes everything from the media to author and date information (as expected), as well as sources, images, fonts, versions, titles, editions, additional contributors, and any and all extra information (e.g. the historical context of the work and any related titles). All of this information must be available for searching, which means more effort is required than automated book scanning; otherwise the context of the work may be lost. There must also be a way of quickly and transparently identifying and correcting the mistakes that are bound to occur during the archive process.
Now you know why the right people are important.
Obviously, digital storage is the only viable means of archiving this much information for the long term.
Databases must be maintained, regularly backed up and preferably mirrored on nodes dedicated to public access, while central master databases are kept for updates and entries. These technical issues can be resolved somewhat by following the example laid down by Wikipedia. However, the stored information needs to be far more comprehensive, so the means of storage and retrieval must be designed to match.
An expandable set of meta parameters is the answer.
We have no practical way of predicting or assessing all the necessary “fields” required to store everything submitted for archival. So the only solution is to leave that method open for expansion, using XML for example, where each bit of information on any work, including the work itself, is a new node/value pair. This is where XML databases would really come in handy.
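To make the “open for expansion” idea a little more concrete, here is a rough sketch in Python using the standard library’s xml.etree.ElementTree. The element names (work, field), the id format, and the sample values are all my own assumptions for illustration, not a proposed schema. The point is only that a new kind of field can be added at any time without changing anything else.

```python
import xml.etree.ElementTree as ET


def new_record(work_id):
    """Create an empty <work> element. The 'id' attribute is a made-up convention."""
    return ET.Element("work", {"id": work_id})


def add_field(record, name, value):
    """Append one name/value pair as a child node. Any field name is accepted."""
    field = ET.SubElement(record, "field", {"name": name})
    field.text = value
    return field


def get_fields(record, name):
    """Return every value stored under a field name (a name may repeat)."""
    return [f.text for f in record.findall("field") if f.get("name") == name]


# Hypothetical sample data, purely for illustration.
record = new_record("example-0001")
add_field(record, "title", "An Example Pamphlet")
add_field(record, "contributor", "First Contributor")
add_field(record, "contributor", "Second Contributor")
add_field(record, "typeface", "unknown")  # a field nobody planned for in advance

# Serialize to a string, as it might be handed to an XML database.
xml_text = ET.tostring(record, encoding="unicode")
```

Because fields are just repeated nodes, a work with five contributors or a field invented next year needs no schema change. The trade-off is that nothing here enforces consistent field names, so the people doing the archiving still have to agree on them.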
To prevent the loss of context and accuracy (and avoid the Project Gutenberg problems), content must be stored as text as well as binary whenever possible: the original in binary form for verification and historical purposes, and the text form for easier searching (to avoid the Google Books problem).
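One way to picture keeping both forms side by side is the small Python sketch below. It assumes each item carries the untouched original bytes, a transcription, and a SHA-256 fingerprint of the original so that later corruption can be noticed. The class and function names are hypothetical, and a real archive would keep these in a database rather than in memory.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ArchivedItem:
    original: bytes  # the scan or file exactly as received
    text: str        # transcription used for searching
    sha256: str      # fingerprint of the original, recorded at archive time


def archive(original, text):
    """Store both forms together and record a checksum of the original."""
    return ArchivedItem(original, text, hashlib.sha256(original).hexdigest())


def is_intact(item):
    """Check that the stored original still matches its recorded fingerprint."""
    return hashlib.sha256(item.original).hexdigest() == item.sha256


def search(items, phrase):
    """Case-insensitive search over the text form only."""
    needle = phrase.lower()
    return [item for item in items if needle in item.text.lower()]
```

The binary form is never searched and never edited; it exists so that anyone doubting the transcription can go back to the source. Corrections happen in the text form only.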
Care and diligence cannot be over-emphasized, both during the archive process and in researching whatever item is being archived. Everything must be verified for accuracy whenever possible and should not be publicly accessible if verification isn’t immediately possible. Show what is correct, complete, and checked. Keep track of what isn’t, so those points can be addressed later.
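The “show what is checked, remember what isn’t” rule could look something like the following Python sketch. The Entry class, its verified flag, and the open_questions list are assumptions of mine, just one simple way of separating the public view from the review queue.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    verified: bool = False  # nothing is public until someone has checked it
    open_questions: list = field(default_factory=list)  # points still to address


def public_view(entries):
    """Only verified entries are shown to the public."""
    return [e for e in entries if e.verified]


def pending_review(entries):
    """Unverified entries are kept, not discarded, so they can be revisited."""
    return [e for e in entries if not e.verified]


# Hypothetical sample entries.
entries = [
    Entry("Checked Essay", verified=True),
    Entry("Undated Pamphlet", open_questions=["publication date unconfirmed"]),
]
```

Nothing is deleted for being unverified. It simply stays out of sight, with its open questions attached, until someone resolves them.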
The expertise of the “people” can really shine when they list all available (and correct) information on any piece of work. This meta information can be attached and expanded at leisure once the main piece is archived. A comprehensive archive will never contain a “Trivia sections are discouraged” tag on any item.
Whatever content, method or means are used, the people involved must be willing to follow through, with the ultimate goal being the preservation of the content, completely, without contamination or bias.
If you managed to read through completely to the end of this post, then congratulations! You may be a candidate to become one of the people needed to get this project off the ground.
(An interesting side-note: During the writing of this post, I discovered the spell-checker was unable to recognize the word “blog”. Oh irony!)