Tuesday, April 26, 2011

Inevitable Data Loss

The Encyclopedia Foundation, as described by Dr. Isaac Asimov in his “Foundation Series”, was an organization tasked with compiling the knowledge of mankind, all of it to be presented as the “Encyclopedia Galactica”. In another article, it has been pointed out that this would actually have to be a “set of book sets” as opposed to a “set of books”. “All” information, however simplified, generalized or modified, is actually impossible to fit in a simple set of books.

But for that matter, it’s impossible to fit in an entire series of book sets. How did Janov Pelorat, ancient history professor of Terminus, put it to Golan Trevize? He quoted Gennerat’s Law, which stated that “The falsely dramatic drives out the truly dull.” True enough. Stories of wars and assassinations and battles and intrigue are well documented in history. Not so the innovations and businesses and ideas of the same periods. And certainly not the personal lives of ordinary people and towns.

Nor was this because they didn’t keep records. The ancients did. Tax rolls, if nothing else: one can see the Bible speaking of Joseph and his family travelling to register. That was not – if you were wondering – done by him saying his name and the tax collector just remembering him. No, they had scribes. Records were kept, and thoroughly enough that you don’t read of the peoples of those times laughing off the decrees to report and ignoring them.

Where are those records? Lost, mostly. Yes, they knew how to copy; they could have made a new copy any time the papyrus or other material was deteriorating. And if something was important, they did. But most of it was not worth preserving, and so it is now lost to us forever. We can say that Joseph registered for his taxes because that happened to be mentioned in a book thought worthwhile to preserve. But we have no idea whether Dan son of Jedediah of Damascus did. Or even whether a person of that name ever existed.

We are coming up on a time when the same thing is going to happen, for different reasons than you might think.

Most people might agree that we are going to lose data because we stored so much of it on punch cards, microfilm, cassette tapes and other old-fashioned media, and it will be lost when the last machine that can read it breaks down. There is some truth in that, and when it comes to a lot of the completely pointless data, we may well have lost it already, or be about to.

However, this problem has drawn some notice, and others are taking action, notably our government. They wish to preserve data, pretty much all data, though they may not realize what an impossible task that is. Others, such as private corporations, have a better feel for the complexities of the task and are only looking to preserve certain data, so they are making better progress.

One thing is agreed on: relying on electronic storage of data alone is foolhardy. Oh, it should be done, but it should be backed up with a hard copy. And not “hard copy” as in punch cards, tapes, spools, or even USB flash drives. Literal hard copy. As in printouts.

But paper printouts – books, if you will – are not very practical for vast volumes of data, especially since you’d like to be able to load the data into any future computer for sorting and retrieval. One can feed page after page into a scanner, but that could be rather labor-intensive. Not to mention that such large amounts of stored paper require a lot of space. And maintenance. And will deteriorate anyway.

So, increasingly, people in the know are looking to store information on small metal discs. Micro-engraved discs could put 10,000 to 100,000 book pages on a disc little bigger than two inches on each side. And computers that can read that microprint could be had, in which case that little data disc would serve the purpose of old-time punch cards or floppy discs.

Benefits? Incalculable. They would last for thousands of years. They could be read by computers we have now. Future computers will always have the means to read microprint, so the discs will not become obsolete. And – under certain conditions – they could be read without a computer, given a powerful enough microscope (in some extreme cases, an electron microscope).

So we are not going to have a massive data loss due to our reliance on paper or floppy discs. That problem is solved. Those with foresight will have their data preserved. Those who don’t…will have to eventually anyway, if they are to stay competitive. We believe that such metal discs will be the norm in the future, and will cost remarkably little.

Where we will have data loss is this: while we could start saving data on computers and transferring it onto metal discs now, some data is not on computers and would have to be entered into one before it could be transcribed onto a metal disc. In a lot of cases, this probably won’t be worth the bother. Some data may even be on a computer, but whoever put it there, or has access to it, may never get around to placing it on a metal disc.

Here we are referring to such things as the local property tax records of Unimportantville, Wyoming for 1872. Things like that may get lost.

Another data loss is coming besides that. Let us say that, by massive effort, almost all of the “useless” data is actually put on discs. In many cases it would be just a single person or small group, one with a special care for that information, who did it. And they would only make one or two copies.

You see where this is going? Even if each is just one two-inch disc, our population is growing; there is new knowledge, there are new people, and there is always “just” one more two-inch disc. At what point is the local library as full of those locally boring discs as some libraries are with microfilm?

And while it’s popular for local entities to keep back-ups in State facilities, those will fill up, too. Oh, perhaps not soon. But after 1,000 years? You bet. At some point, not everything is actually worth keeping a hard copy of, even assuming a completely peaceful and disaster-free 1,000-year period. There would also be the silliness of needing an inventory of millions of meaningless data discs, which would itself need a disc, and so on.

But usually something happens first: the local library floods. Or a town simply dies and the library is abandoned. Or a war. Or, for all we know, an asteroid takes out Earth 7,438 years from now, leaving only those who live in space to carry on. We cannot know what will happen, only that inevitably something will happen. And only data that is current in electronic databanks, and valuable enough to have a million-plus copies scattered about, will be kept over the “long” haul. “Long” meaning ten thousand years from now.

It is possible that electronic storage will continue to develop. Perhaps it will become possible to have all data – even trivial data – stored electronically and accessible to anyone with net access. And perhaps civilization will never have a hiccup that interrupts service or causes a loss of stored data. Or any deliberate sabotage, from deletions to subtler rewritings.

But from a review of history, we are not going to bet that way. Information not backed up on a metal disc will inevitably be lost. And most metal discs with only a few copies will inevitably be lost. Dr. Asimov seemed to be aware of this. After all, he had a character in “Foundation’s Edge” whose principal job was to try to rediscover lost knowledge about a topic blindingly basic to us – the location of man’s original home world! If that can be lost, we would think anything could be!
