naturerecordists

Re: Long term storage

Subject: Re: Long term storage
From: "Peter Shute" pshute2
Date: Fri Feb 17, 2012 9:02 am ((PST))
In my experience with archived data, nothing is totally safe. Archives need to 
be backed up, but then can you rely on the backup?

I recall desperately copying an archive of floppy disks to Syquest disks, 
fitting 100 or so floppies on each disk. Then, a few years later, I copied the 
Syquest data to hard disk when those disks started dying too. There were scary 
moments when the by then irreplaceable Syquest drive began failing as well.

At least on hard disk I could then regularly copy them to a DAT backup drive in 
one hit. Those DAT drives used to last about two years before they became 
unreliable, but at least the next model could read the previous model's tapes, 
unlike the 1/4" tape drives that preceded them. These days LTO drives live much 
longer and are backward compatible too.

I feel this regular copying is the key to safety, and these days it would mean 
copying to a rotated, regularly replaced set of removable hard drives. At least 
then one knows immediately when failure begins, and one can replace the drives 
before it's too late.
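
One way to notice failure during each copy, rather than by chance, is to compare the files against checksums recorded when the archive was made. This is only a sketch of that idea: checksums are not something described above, and the names build_manifest and changed_files are hypothetical helpers, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, block_size=1 << 20):
    """Hash a file in blocks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, root):
    """Return relative paths that are now missing or whose contents differ."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in old_manifest.items()
        if current.get(name) != digest
    )
```

The manifest would be built once, kept with the archive, and checked with changed_files after every copy. A non-empty result means some file no longer matches what was originally stored.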

If random loss of bits occurs due to weakening of the tiny magnetic fields, 
perhaps just running regular low-level checks on the disk will rewrite them and 
keep them at full strength? That's not possible if the disk is a CD or DVD, of 
course.

But I've never been involved in audio archiving before. I'm interested in the 
comparison of storing digital audio with storing analog tapes. Unless 
physically damaged, a tape can always be placed in a player and you can at 
least hear *something* even if it's deteriorating.

What can you do if a hard disk deteriorates? With digital data, your computer 
could just tell you no, you can't even look at that file, or perhaps even the 
whole disk.

Often one can still run a recovery program, which pulls off data anyway, even 
if the computer says it's damaged and unreliable. (For hard disks at least; I'm 
not sure about CDs, etc.)

These programs have to guess where each file begins and ends, and can give you 
a pile of files with names like unknown00001.dat. It's then up to you to 
examine each one with a hex viewer and try to guess the filetype from patterns 
in the headers. For example, a JPEG file will usually have the letters JFIF (or 
Exif) in the first few bytes.
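
The header check can be automated instead of done by eye in a hex viewer. The following is a minimal sketch that only knows a handful of signatures; guess_filetype is a hypothetical name, and a real recovery tool would recognise far more formats.

```python
def guess_filetype(data):
    """Guess a file type from the first bytes of a recovered file."""
    head = data[:12]
    # WAV: "RIFF", a 4-byte size, then "WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # JPEG: start-of-image marker FF D8 followed by another FF marker byte.
    # JFIF files also carry the letters "JFIF" at offset 6.
    if head[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # FLAC streams begin with the four letters "fLaC".
    if head[:4] == b"fLaC":
        return "flac"
    # Many mp3 files begin with an ID3v2 tag; untagged ones will not match.
    if head[:3] == b"ID3":
        return "mp3 (ID3 tag)"
    return "unknown"
```

Running this over each unknownNNNNN.dat file would sort the pile roughly by type before any manual inspection.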

This makes me think your archived audio would be safer if you knew all the 
files were, for example, wav files. I.e., it might help not to mix audio, 
documents and programs on the same archive disk.

Sometimes the recovery program can't tell for sure where the files start and 
end, and there might be several files in the one recovered file. Then knowing 
the file format and what the headers mean will help. Is there an audio format 
that records the file length at the start?
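
For what it's worth, wav is such a format: a wav file starts with the letters RIFF, then a 4-byte little-endian size covering everything after the first 8 bytes, then the letters WAVE. The sketch below uses that size field to split a recovered blob that holds several wav files. It assumes the size fields were written correctly, which may not hold if a recording was interrupted, and split_wavs is a hypothetical name.

```python
import struct


def split_wavs(blob):
    """Yield (offset, wav_bytes) for each WAV header found in a blob."""
    pos = 0
    while True:
        pos = blob.find(b"RIFF", pos)
        if pos < 0 or pos + 12 > len(blob):
            return
        if blob[pos + 8:pos + 12] != b"WAVE":
            pos += 4  # "RIFF" occurred by chance, or is another RIFF type
            continue
        (size,) = struct.unpack_from("<I", blob, pos + 4)
        end = pos + 8 + size  # the size field excludes the first 8 bytes
        # If the blob is truncated, the slice simply stops at its end.
        yield pos, blob[pos:end]
        pos = max(end, pos + 12)
```

Each yielded piece can be written out as its own .wav file. A piece shorter than its header claims is a sign that the recovered data was cut off.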

Sometimes the program can't even be sure what order the blocks of data were in, 
and the contents of the recovered files could be disordered chunks, say, 4KB 
long. What then? Is your audio file format structured in a way that lets you 
get usable information from an isolated chunk? Or is it all useless without the 
rest of the file? I suspect wav format might give you much more than, say, mp3, 
but I don't know.
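
For uncompressed (PCM) wav data, an isolated chunk is just a run of samples, so it can be made playable by putting a fresh header in front of it. Here is a minimal sketch using Python's standard wave module. The channel count, sample width and sample rate are assumptions that would have to match the original recording, and wrap_pcm_chunk is a hypothetical name.

```python
import wave


def wrap_pcm_chunk(chunk, out_path, channels=2, sample_width=2, rate=44100):
    """Write raw PCM bytes to a playable WAV file; return frames written."""
    frame_size = channels * sample_width
    # Drop any partial frame at the end of the chunk.
    usable = len(chunk) - (len(chunk) % frame_size)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        out.writeframes(chunk[:usable])
    return usable // frame_size
```

If the chunk happens to start partway through a frame, the result will sound like noise or have its channels swapped. Skipping the first 1 to frame_size - 1 bytes and trying again should find the right alignment. A 4KB chunk of CD-quality stereo is only about 23 milliseconds of sound, so this is mainly useful for longer runs of intact blocks.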

Finally, the data might contain parts that are just wrong. Then I suppose the 
above applies. It's better if the format lets you play the damaged file and 
hear distortion in one part than not be playable at all even though most of the 
file is intact.

My conclusion, not based on any experience, is that to match the recoverability 
of analog tape, one's archived data should be:
- On a freshly formatted drive, with no deletions or replacements after 
copying. This is to help ensure the data is stored sequentially, which assists 
recovery.
- Of known filetypes that can be identified by manual examination of the 
headers. Preferably just the one filetype? Perhaps stuff like Audacity project 
files should be archived elsewhere. There are hundreds of them for one audio 
file that's being worked on.
- A filetype that allows recovery of parts of the file by manual examination.

I guess it would also help if you knew for sure what was on there, so a printed 
index in the same order as the files are placed on the disk would be helpful.

Any thoughts? Does wav format allow identification of its parts? Can isolated 
chunks be played?

Peter Shute



