You've probably heard "Sin City's" marketing slogan on its television commercials: "What happens in 'Vegas, stays in 'Vegas!" Well, I have my own version: "Everything you do online, stays online!" – and since 1996, one of the largest publicly accessible repositories of online data has been the Internet Archive (www.archive.org), an immense virtual library dedicated to "Universal Access to Human Knowledge" – which is currently growing at a rate approaching one terabyte a day. For the layman, one terabyte is the equivalent of a metric ass load of data.
I'm writing about the Internet Archive today after watching a C-SPAN rerun in the wee hours last night that featured Brewster Kahle, a Digital Librarian, and Director and Co-founder of the Internet Archive. Kahle is the architect of the ideas and tools being used to archive the web, and his talk, "Universal Access to Knowledge," was an especially fascinating glimpse into the challenges and opportunities of preserving and disseminating our rich and rapidly growing wealth of knowledge. At the time of this writing, the link on C-SPAN's website to Kahle's presentation 404'd, but I encourage you all to seek it out, either online or in the event of a future televised rerun.
The Wayback Machine
One of the most amazing features of the Internet Archive is the Wayback Machine, a search tool that displays archived snapshots of the Internet – including individual websites (and their successive updates), searchable by URL. According to its website, "Visitors to the Wayback Machine can type in a URL, select a date range, and then begin surfing on an archived version of the Web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older version of your favorite Web site. The Internet Archive Wayback Machine can make all of this possible."
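For those who would rather script their time travel than type it, here is a rough Python sketch of that "URL plus date" idea. It is not the Archive's official tooling: it assumes the commonly seen link pattern of web.archive.org/web/ followed by a 14-digit timestamp and the target address, and the helper name wayback_url and the example.com address are my own inventions for illustration.

```python
from datetime import datetime

# Assumption: the commonly seen public link pattern for Wayback snapshots.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(site: str, when: datetime) -> str:
    """Build a link asking the Wayback Machine for a capture of `site` near `when`."""
    # The timestamp is written as 14 digits: YYYYMMDDhhmmss.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{site}"


# Hypothetical example: a placeholder site on the eve of Y2K.
y2k_link = wayback_url("example.com", datetime(1999, 12, 31))
print(y2k_link)
```

The sketch only builds the link and never touches the network. Whether a capture actually exists for a given site and date is something you find out when you follow the link.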
While I've known about the Wayback Machine for a couple of years now, having learned about it in the course of my provocation of Alexa, I initially viewed it as a novelty and filed it away in the back of my head. Now, I intend to take another look at this valuable resource and examine ways in which it can be mined for useful data when developing business and marketing plans, as well as to find some of my long-lost virtual treasures, and more.
But scouring through your own past treasures (which one might reasonably assume are also archived on your own computers or in a stack of CDs sitting in your closet) is only the tip of the iceberg, as the Wayback Machine lets you rummage through other folks' virtual attics too. I tried exactly that this morning by entering the URL of my wife's old website. Sure enough, there she was, a few years before I met her, inserting a "Tootsie Pop" into her lovely self, and offering these "personally flavored lollipops" for sale – for only $5.95 each or a bouquet of 8 for $24.95 (plus shipping and handling). Ahhh, yes, "That's my girl!"
Sins Of The Past Revisited
For some, the Internet Archive and Wayback Machine are a welcome safeguard of valuable data. For others, they're a fun-filled trip down memory lane. For me, they're a newly discovered treasure trove of fan-recorded live sets from The Grateful Dead. For others still, they're a source of evidence to be used against them in a court of law...
In an age where it's increasingly difficult to take back one's words, the existence of a publicly available archive of this magnitude is a powerful weapon for intellectual property litigants, and for others seeking the truth of not only "what is" but of "what was" published online – and when. That record has been held admissible in a court of law.
While the archives may currently be too "new" for prior art searches of material to counter Acacia's patent claims on Digital Media Transmission technologies, for example, they will doubtless prove effective at countering – or supporting – other intellectual property claims, and as such will prove an invaluable research tool for legal teams and activists.
The Forefront Of Search Technology
One of the intriguing points brought up in the discussion of the Internet Archive is how to manage the vast amounts of data contained within it. The adult industry has long been on the cutting edge of technical innovation, particularly in the online arena, and some of its brightest minds are always seeking a next-level solution to making our products more readily, profitably – and safely – available. These folks would do well to keep an eye on the developments in search technology and relational data management being undertaken by the Internet Archive, as well as some of the implications of that technology, including ways to enhance, and where desirable, defeat it.
Consider my own case: As a responsible adult webmaster, I closely monitor the evolving legal environment we operate in. As such, when the DOJ proposed changes to the 18 U.S.C. § 2257 statute that could require secondary producers to display Custodian of Records information, backdated in such a way as to impose a nearly impossible burden for compliance, I simply elected to delete my potentially problematic properties instead.
While a good case could certainly be made to show a "good faith effort" on my part to remedy the situation, the existence of a publicly available archive that displays material which was not illegal at the time of its publication, but has been made so since by evolving statute, is quite troubling. Although a well-crafted robots.txt file would solve some of the issues going forward, it's unclear whether previously archived material can be removed from the public record. As such, a savvy prosecutor could cite the existence of an old – but still publicly available (through the Wayback Machine) – copy of an offending property as evidence of non-compliance with § 2257 or another statute, and obtain a conviction on that basis.
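As for what a "well-crafted robots.txt file" might look like, here is a minimal Python sketch that checks a set of rules before you upload them. It leans on one assumption worth stating loudly: that the archive's crawler answers to the user-agent name "ia_archiver", which is how it has commonly been addressed but is worth confirming against the Archive's own current guidance. The is_allowed helper and the example.com page are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the crawler name the archive honors in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page on a placeholder domain.
page = "http://example.com/old-gallery.html"
print(is_allowed(ROBOTS_TXT, "ia_archiver", page))   # the archive's crawler
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # any crawler not named above
```

This only tells you what a well-behaved crawler should do from now on. It says nothing about copies already sitting in the archive, which is exactly the open question raised above.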
This is only one small example of the challenges publishers face in the digital age, but the opportunities for ourselves and for future generations are staggering in comparison. Only time will tell how we faced these challenges, and embraced these opportunities – and the Internet Archive will be there to provide a record of it all...