Wednesday, October 3, 2007

Virtually Ideal

One of the noble, iconic, pastimes of the internets is the opportunity it gives us to laugh at the media and creations of previous generations. Hence the transplantation from the realm of medicine to humour, of ads for Thorazine ("for prompt control of senile agitation!"), or insta-classic slogans like "More doctors smoke Camels than any other cigarettes", and other gems.

The understanding that develops around a subject over time can allow us to re-interpret information we were once prepared to accept at face value, and see it in a more informed light. Matured understanding is to us the clearer vision revealing the Emperor's New Clothes, and this understanding can similarly leave us reaching for the eye-bleach, by giving us insights we wish we'd had forewarning of.

In this vein, consider our past and current expressed feelings around such wonder-products as Asbestos, 2,4,5-T, DDT, lead-based facepaint, or Television. TV is a particularly interesting example, because one of the many ways it became embedded in our culture was the promise it had as an educative tool, and this particular use was one of the key marketing points behind the indefatigable push the TV made into our living-rooms, bedrooms, studies, luxury cars, arm-rests, bathrooms, fish'n'chip shops, sports bars, and other great areas of intellectual dissemination.


Who, reading this, now watches telly for their intellectual nourishment? Every once in a rare while, we get a great BBC series like “Civilisation” by Kenneth Clark or “The Planet”, that fulfills this original promise. However, for the other 99% of the time, there’s nothing on but the trash used to decorate prime advertisement timezones, and people watch it nonetheless.

In newsreels, television was typically discussed either by a guy in a white labcoat, whom we took seriously because he was standing beside a bench with test-tubes on it, or by an authoritative man in a tweed suit, palpably reeking of higher education and Camel cigarettes. These patriarchal figures, representative of the high state of human progress, would cite fantastic examples ("...and so, if you want to learn about Plato's Symposium, you could just turn on your tele-vision and watch a program about it being broadcast"), and the intense technological lust would spin our heads with the potential of the device. I'm still waiting for a decent broadcast on Plato's Symposium, btw.

Even the internet has been around long enough for us to start seeing it in a less idealistic light than when it wormed its way down our telephone lines at 300 baud. Yes, we use it for a great range of useful, practical, and enlightening things, but how much of that bandwidth is porn, useful, practical, enlightening, or otherwise? The point is that a disparity between initial intentions and subsequent realities can become evident over time.

Where is this Abe Simpson ramble going?

The next obvious wonder-technology (relating to what I do) that I think will show a similar disparity over time is virtualisation. Virtualisation is the promising silver bullet to the technologist tasked with maintaining a digital information system across a long period of time, in that it can theoretically prevent the information archive from having to become a technology archive in order to remain an information archive. Over the course of the next year, the two major CPU vendors, Intel and AMD, will be making a big deal about their new hardware-based virtualisation models, dubbed Intel VT and AMD-V (they really burnt the creative midnight oil with those names, you can tell). Even the existence of these technologies shows the perceived importance of virtualisation to tomorrow's computing world, and the respective manufacturers’ intended inference is the importance of choosing the "right" implementation, which is of course their own.

To illustrate the allure of virtualisation technology: If, in 2017, I have some published resource that only ran on Windows 95 that I have to make accessible, but I can't buy/hire/borrow/steal any hardware old enough to be compatible with Windows 95 and the resource in question, then the possibility of running an instance of Windows 95 on top of virtual compatible hardware has immense appeal to me. The capacity for simulation by digital information systems is unparalleled and self-evident. With digital simulation we can examine and re-examine any information we have suitably represented, in a consistent manner, allowing us to slow it down to the bite-sized pieces necessary for fruitful human interpretation.

So, in this way, I can simulate a computer presenting parameters acceptable to Windows 95, which means I can theoretically install Win95, the required supporting app/s for the digital object, and thereby make accessible the published resource I wanted. This is a concept with immense potential, and has been promoted favourably by very smart people with lots of extra letters after their names - Cornell University are heavily behind it as a key technique in digital preservation management, to name but one group of proponents.

Therefore, I feel somewhat of a grinch in saying that the archivists’ hopeful faith in this solution is misplaced. It is true that it would allow us to extend the effective access period to a given form of a digital resource - there is a huge degree of complexity supporting any digital thing, and the ability to virtualise some of that complexity is a way to apply a technological shorthand to a very, very, long chain of logic and knowledge. This is why virtualisation has emotional appeal (ok, a cold, geek, emotional appeal) - it is the implied promise of not having to get your head around that entire chain of supporting complexity.

The unconscious notion of computers as digital systems in the abstract sense, pure, precise, and clean-room ISO9001 compliant, under diffuse lighting in a white room, is a total defiance of the reality of short-lived electrolytic capacitors, mechanical storage devices dependent on aerodynamic principles in order not to "crash", vendors fighting each other with standards to sequester themselves a captive consumer base...anyone who hopes that virtualisation will give us back an echo of the willing, archive-friendly computers science-fiction promised us is clearly smoking something much more potent than Camel.

Then what's the problem? Why can't they smoke what they want?

The problem is this: Ordinarily, in the non-virtualised world, to support a digital object you start by looking at what it needs to run - does it need a specific application to run? Does that application have specific system requirements? Can those system requirements be met by the supporting hardware/software platform? Is that hardware/software platform a commodity item that can be accessed as required?

In a virtual environment, where, let's assume, all of these requirements can be met with a virtualisation product of some sort, all we have done is make the virtualisation platform an extra part of the complexity supporting the digital object. This means that we still have an initial set of supporting conditions to meet, but they are the conditions allowing the virtual environment to exist. Then we have the secondary set of supporting requirements to be met by the simulated environment. All we have done is create the ability to transplant a layer of legacy dependencies into the virtual layer, and created the need to support them with contemporary dependencies. Net complexity increases. And then, once you get to the next major technology refresh, a digital generation later, it has the potential to get really recursive - do we need to virtualise the environment supporting the virtual environment to maintain the data?
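To make the "net complexity increases" point concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the layer names, the obtainable flags, and the first_unmet helper are hypothetical and do not describe any real product. It simply walks each chain of supporting requirements and reports the first link that can't be met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    obtainable: bool  # hypothetical flag: can we still buy, build, or run this?


def first_unmet(chain):
    """Return the first dependency in the chain that cannot be met, or None."""
    for dep in chain:
        if not dep.obtainable:
            return dep
    return None


# Hypothetical chain for the Windows 95 resource, run natively.
native_chain = [
    Dependency("viewer application", True),
    Dependency("Windows 95", True),
    Dependency("Windows 95-era hardware", False),
]

# The same resource under virtualisation: the legacy layers stay, and the
# unobtainable hardware is swapped for layers that must be supported today.
virtual_chain = [
    Dependency("viewer application", True),
    Dependency("Windows 95", True),
    Dependency("virtual machine image", True),
    Dependency("virtualisation product", True),
    Dependency("host operating system", True),
    Dependency("contemporary host hardware", True),
]

for label, chain in (("native", native_chain), ("virtual", virtual_chain)):
    blocker = first_unmet(chain)
    status = "accessible" if blocker is None else f"blocked by {blocker.name}"
    print(f"{label}: {len(chain)} dependencies, {status}")
```

The virtual chain gets us to the resource, but it is twice as long as the native one in this toy model, and every one of the new links has to stay obtainable for the resource to stay accessible.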

Virtualisation is generally invoked when no other options exist – when data or its dependencies have proven non-portable in the real world. What will happen to our Windows 95 virtual machine if the physical machine underneath it reaches the point of replacement - do we end up with a host running an emulator running an emulation of an emulated environment? Extrapolate that model out ten generations of digital system refresh cycles....ye gods, that gets ugly. Or do we instead re-invent ways to virtualise a target OS one layer above successively newer platforms, requiring a dedicated VM development cycle to explicitly accompany our technology refresh cycle? What a pain.
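The two options can be sketched as a toy model in Python. The function names, the numbering of "generations", and the assumption of exactly one new emulator layer per refresh are all mine, made up to show the shape of the problem rather than to describe any real emulator.

```python
def nested_layers(generations):
    """Option 1: at every refresh, wrap the previous platform in a new emulator."""
    layers = ["Windows 95"]
    for gen in range(generations):
        layers.append(f"emulator of generation {gen} platform")
    layers.append(f"generation {generations} host")
    return layers


def flat_layers(generations):
    """Option 2: rebuild the Windows 95-capable VM directly on each new host."""
    if generations == 0:
        # No refresh has happened yet: Windows 95 still runs on real hardware.
        return ["Windows 95", "generation 0 host"]
    return [
        "Windows 95",
        "emulator of generation 0 platform",
        f"generation {generations} host",
    ]


for gens in (1, 2, 10):
    print(
        f"after {gens} refreshes: "
        f"nested = {len(nested_layers(gens))} layers, "
        f"flat = {len(flat_layers(gens))} layers"
    )
```

In this model the nested stack grows by one layer with every refresh, while the flat stack stays the same height only because someone builds a fresh Windows 95-capable VM for every new host.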

With a virtual machine, we are still dependent on the goodwill and whim of a vendor or developer to go in a direction that suits us - and we work with the hope that their intentions will correlate with our needs. The lack of standards around virtualisation means that there is still an interdependence between a specific application or kernel (ie VMware or XenSource, Intel VT or AMD-V hardware) and a corresponding set of data representing the logic in the virtual object (like a .vmdk file for VMware, or a xen-aware OS installation on a partition, or an Intel VT/AMD-V compatible technology stack).

Now the key reason I'm not shouting optimistically about how a drive for virtualisation standards can solve these problems is that, in nearly every prior example in human inventive history, success within an innovative field is most widely obtained by those that most efficiently benefit from it economically. I'm stung by the time I believed in the wonder of asbestos. That DDT sure did get rid of the mosquitoes at the beach, just like the ad said it would. I saw a smart guy on the newsreel telling me that I could gain profoundly efficient learning capacities by watching TV.

Now, I've read quite a few digital archive authorities say that virtualisation is the answer to the short and fragile lives of digital objects. I'm inclined to disbelieve them. Virtualisation is a fantastic tool to a geek like me, it's a highly useful thing to have for some of the stuff I do (making software images, application re-packaging), but it shouldn't in its current form be viewed as the tool of the digital archivist. Until we learn to make self-descriptive digital objects, independent of applications, OS, and hardware - digital objects will keep disappearing from the collective digital memory. What is required is a deeper look at the way people create and store digital things, not another layer of digital things.

4 comments:

Aaron Tan said...

While I agree that virtualization isn't the magic bullet to digital preservation, a few assumptions you put forth aren't exactly true today. Virtualization standards are being developed right now, and there are virtualization tools that allow you to run applications without the operating system in a virtual infrastructure. The fact that we create digital objects in specific environments to provide a certain user experience requires that we preserve that environment. Digital objects on their own are meaningless to users without the presentation layer. Also, I'm not sure how we can create hardware and software agnostic digital objects, because that would defy the underlying characteristics of a digital object.

Emerson said...

Thanks for your comment. The standards drive is predominantly coming from either single private vendors, or clusters of two or three technology partners looking to leverage each other's development, as is the case with VMware and the Pacifica extensions. I'm not nay-saying the reality of this, but I am trying to point out that this standards push is not something yet occurring within the ISO space or taking place under the terms and spirit of scientific inquiry, meaning that a significant percentage of the virtualisation models now in use will end up orphaned as consolidation occurs, resulting in the risk of data loss.
I am sure that to give an underlying robustness to digital records, the great breakthru will be when the presentation layer is made arbitrary. My feeling is that for this to occur, the digital object needs to be as self-descriptive as possible, which to me suggests that it would need to contain not just the essential data (ie the text of a novel or the PCM data of a sound) but also the immediate layers below - ie a playback application, and dependencies, ideally in source code form as well as compiled form, and all relevant assembly documentation. If an object-contemporary compiler were also included it would mean that the future archivist could have the option of emulating the original operating environment to make accessible the record, so including it would only add to the chance of the record surviving.
I think that adding these components to digital objects plays well with the fact that digital storage space is one of the resources we should consider almost inexhaustible bar technological/societal breakdown, so adding extra information to the digital record carries very minimal overheads.
Creating the agnosticism I refer to would also encompass a large workflow dependency - there's a lot that can be done on the procedure side of a project to create ease of portability - simple, obvious things like having outputs referenced against zero-points, so that "default state" in "random app" is the correct relative setting, for example, or avoiding having offsets represented in metadata when they can be represented implicitly in the object.
The key reason resources like libraries and archives exist is that they are able to provide a resource far beyond the scope of an individual to create, and if we are unable to do something either difficult or impossible for an individual to do, then we have little relevance to society. Therefore I would state that attempting to defy the underlying characteristics of a digital object is something we have to pursue very strongly to deliver information of value to the greater world.

Emerson said...

Heh - I guess that was a long-winded way of saying that I don't disagree with you :-)

Aaron Tan said...

No worries, I enjoyed the conversation immensely. You might be interested to find out more about the project undertaken by the Nationaal Archief of the Netherlands. Their model uses a universal virtual machine and a modular emulator to recreate the host environment. I've just put up a new post on that on my blog...it'd be good to hear your thoughts on that :)