I just found the oldest file on my computer

It’s named “DFIX.EXE” and was last modified on December 10, 1986, at 1:44 AM. I can’t remember what it is or what it originally belonged to, but it was probably part of one of those DOS apps and games that got passed around while I was in school. Yes, this was before “filesharing” was taboo, anti-virus software was hardly common, and Napster didn’t exist. The “Created” date is listed as 2001, which means it has had a rather interesting journey.

The file was probably copied from a much older floppy to my old Packard Bell PC (a brand that no longer sells in the U.S.), which I got in the mid-’90s with the then state-of-the-art Windows 95, just as I was entering high school. I used this PC for most of my schoolwork, games, MP3s, and surfing until graduation. It was, cue massive nostalgia, the same PC I used to set up a small web server and start a little community hub called Ghostnetworks.

The PC went through a Windows 98 upgrade, at which point the file was probably copied over, along with my entire (“gasp”) 2GB worth of “stuff” on the Packard Bell’s 4GB drive, to my Dell Inspiron 8200 laptop with its stunning 60GB capacity. This was so I could use the PC as a dedicated server running Apache for GN. The Inspiron served me well (it still works!), but it was starting to lag behind on my work, and it was pretty heavy at almost 8 lbs.

I didn’t carry the laptop around all that much, and this file, among countless others, was long forgotten in a “backup” folder in My Documents. I then got an eMachines box (I can’t remember the exact specs) in the early 2000s on sale at our local Staples. It received the entire hard drive contents of the Dell (about 20GB worth of files “created” on the new PC), and this file too sat there for many moons.

All of 2004 went by before I cracked open the old eMachines again and, lo and behold: old stuff!

This file, along with the rest of the supermassive 60GB that had accumulated on the eMachines, ended up on a second custom PC and its 500GB drive, which I got from a reseller after the dot-com bust. And there it sat again until the drive was moved to yet another custom PC, and then another, until finally, after the drive started making the proverbial death clicks, the file was moved to its final (maybe) resting place on a new-ish 1TB WD drive after almost 14 years. My, how far it’s travelled.

All this just goes to show how far our digital legacy has come, and I’m sure some people have older, “proper” documents and files that have survived to this day.

Of course, this is far from the oldest file out there. On the Internet, at least, I came across a fairly old W3C document (in RTF) by Tim Berners-Lee that discusses a proto-WWW. Its last-modified date is August 1990.

Anyone else come across a piece of digital nostalgia?

Google Groups = Blogspot

And here’s the proof…
Google Groups spam

What’s really sad is that they’ve even managed to spam the Web Archive.
Archive spam

This proves, once again, my point about building a thought archive. The most overlooked resource in building any sort of archive is the people. The Web Archive is far too automated, and this shows the result of such automation: nuisance sites that were taken down are still available thanks to poor filtering.

Until truly intelligent machines are invented, a human review system is absolutely vital for any mass collective, e.g., the Open Directory.

Stop thinking!

Because we haven’t found a way to archive all of those thoughts yet.

There was once a wonderful project called GNUPedia, later GNE, that attempted to do just that, though it failed spectacularly and never amounted to much. The closest thing in a similar spirit was Everything2, but that particular venture is very convoluted and often not as useful or interesting to read. Sure, you will come across the occasional bit of gold, but that’s like finding a granule of wisdom in a sea of nonsensical Digg comments.

Whatever happened to the idea of archiving human thought? I don’t mean as blog posts, though they may have come close; I mean as a central repository of worthwhile information in the same vein as Project Gutenberg. Even today, PG holds a mere 17,000 books, many of which were poorly archived and lost most of their formatting to archiving practices. They also seem to suffer from some proofreading issues.

Google Books is an interesting alternative that makes it conveniently inconvenient to search for any meaningful information among books in the public domain, and it contains very little detailed, search-enabled information on each entry compared to any public library. And what about publications that are not books: pamphlets, essays, quotes, and scribbles?

What about Wikipedia and Wikiquote? What makes them incredibly powerful and incredibly large is the same reason neither can ever be highly comprehensive or accurate. Being accessible to everyone also makes verification and attribution extremely difficult. What’s more, the recourse admins take to prevent vandalism and other childish or malicious behavior is what you would expect of any popular repository: restricted editing rights. The content on Wikipedia, and to an extent Wikiquote, depends on external references, which may or may not be available at any given time. These archives, therefore, are not factually self-supporting.

Also, the definition of a “reputable source” is often grounds for debate (as seen in many discussion pages on articles) and so are the arguments over point of view or lack thereof.

Anonymous contribution, while ideologically sound, is impractical for reliable information: there is no accountability and no means of verifying the authenticity of editors except an easily spoofed IP address.

Rather than just complain about a lack of any comprehensive or “cruft free” thought archive, let’s dig into how we might create one now…

So what is really required to produce “an archive of opinions” as per GNE and perhaps “an archive of facts” as per an encyclopedia?


Often forgotten in any project: there has to be some number of people who care enough about what they are doing to get it right. These individuals need to be qualified, and they must not care about praise or recognition. Harsh, yes, but that is where many projects fail. If they expect a resounding success (or any success, for that matter), they must not be hung up on the short term, which is always bleak.

A passion about what you are doing, the preservation of thought in whatever form, is absolutely necessary. As is rigor (intellectual honesty) and accountability.


So what the devil are these people going to archive? Books? Essays? Research? Commentary? Humor/Horror/Drama/Fiction?
How about all of the above?

The downside of going the “culturally significant” route the Library of Congress has taken is that one culture may not care much for what is significant in another. This also ties into what is morally significant; after all, who is authorized to say what is moral? There is some obvious “content” that is not worthwhile to preserve, and we can all tell what that is. But if it’s not obvious whether something should be preserved, then it should be preserved.

The LOC, while a wonderful and large resource, is very singular when it comes to its archive. Being a U.S. institution, this is hardly surprising; only international efforts will be internationally comprehensive.


If possible, everything about the work being archived must be preserved: everything from the media to author and date information (as expected), as well as sources, images, fonts, versions, titles, editions, additional contributors, and any and all extra information (e.g., the historical context of the work, any related titles, etc.). And all of this information must be available for searching, which means more effort is required than automated book scanners can provide; otherwise the context of the work may be lost. There must also be a way of quickly and transparently identifying and correcting any mistakes, which are bound to occur during the archival process.

Now you know why the right people are important.


Obviously, digital storage is the only viable means of archiving this much information for the long term.

Databases must be maintained, regularly backed up, and preferably mirrored on nodes dedicated to public access, while central master databases are kept for updates and new entries. These technical issues can be resolved somewhat by following the example laid down by Wikipedia. However, the stored information needs to be far more comprehensive, so the storage and retrieval methods must be more capable.

An expandable set of meta parameters is the answer.

We have no practical way of predicting or assessing all the necessary “fields” required to store everything submitted for archival. So the only solution is to leave that method open for expansion, using XML, for example, where each bit of information on any work, including the work itself, is a new node/value pair. This is where XML databases would really come in handy.
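As a rough sketch of the idea (the entry layout and field names here are my own illustrations, not part of any real archive schema), each work could be an element whose metadata is just an open-ended set of child nodes, so new fields can be added later without changing any fixed schema:

```python
# Minimal sketch of an expandable metadata record using Python's
# built-in ElementTree. All field names are hypothetical examples.
import xml.etree.ElementTree as ET

def make_entry(work_id, fields):
    """Build an <entry> where every metadata field is a child node."""
    entry = ET.Element("entry", id=work_id)
    for name, value in fields.items():
        ET.SubElement(entry, name).text = value
    return entry

entry = make_entry("example-0001", {
    "title": "A Modest Proposal",
    "author": "Jonathan Swift",
    # A field nobody anticipated can simply be appended later:
    "historical_context": "Published anonymously in 1729.",
})

print(ET.tostring(entry, encoding="unicode"))
```

The point of the design is that adding “historical_context” (or any field invented years from now) costs nothing: it is just one more node/value pair attached to the entry.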

To prevent the loss of context and accuracy (and avoid the Project Gutenberg problems), content must be stored as text as well as binary whenever possible: the original in binary form for verification and historical purposes, and the text form for easier searching (to avoid the Google Books problem).
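A minimal sketch of this dual-storage idea (function and record-field names are illustrative, not from any real system): keep the original bytes untouched alongside a checksum for later verification, plus the extracted text for searching.

```python
# Sketch: store the original binary form plus a searchable text form.
# The checksum lets anyone verify the original was never altered.
import hashlib

def archive_item(original_bytes, extracted_text):
    """Return a record holding both forms plus a hash of the original."""
    return {
        "original": original_bytes,  # preserved byte-for-byte
        "sha256": hashlib.sha256(original_bytes).hexdigest(),
        "text": extracted_text,      # the form full-text search runs on
    }

# Hypothetical example content:
record = archive_item(b"%PDF-1.4 original scan bytes",
                      "A Modest Proposal, by Jonathan Swift")

# Verification later is just recomputing the hash and comparing:
assert hashlib.sha256(record["original"]).hexdigest() == record["sha256"]
```

Searching and display would use the text form, while the binary original exists only to be checked against its hash, so formatting lost in text extraction is never lost for good.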


Care and diligence cannot be over-emphasized during the archival process, as with research into whatever item is being archived. Everything must be verified for accuracy whenever possible, and nothing should be publicly accessible if verification isn’t immediately possible. Show what is correct, complete, and checked; keep track of what isn’t so those points can be addressed later.

The expertise of the “people” can really shine when they list all available (and correct) information on any piece of work. This meta information can be attached and expanded at leisure once the main piece is archived. A comprehensive archive will never contain a “Trivia sections are discouraged” tag on any item.

Whatever content, method or means are used, the people involved must be willing to follow through with the ultimate goal being the preservation of the content, completely, without contamination or bias.

If you managed to read through completely to the end of this post, then congratulations! You may be a candidate to become one of the people needed to get this project off the ground.

(An interesting side-note: During the writing of this post, I discovered the spell-checker was unable to recognize the word “blog”. Oh irony!)