Recovering corrupted files

A discussion about using Transcriber with Fusion on a Mac was held in August 2010 on the RNLD email list.

Question:

I was using my Zoom H2 to record a long language session. The batteries died after about an hour, which I didn't think was a total disaster because I thought the Zoom would save the file before it completely shut down. But now I'm panicking slightly because the file seems to be error-ridden and untransferable. It is using up storage space (~500MB) so there is *something* there, but it doesn't play back, upload or anything else. It doesn't give a duration either (it just has 00:00:00). Is there anything I can do or is the recording lost forever?

Responses:

There is hope, but it depends on your digital skills.
First, I would transfer anything and everything that is on the device to a computer HDD, even deleted and hidden files. Do not edit anything on the device itself; the file might actually be there (invisible), and you might overwrite valuable information if you try to do anything on the device.
For recovering deleted files I personally like to use Back2life for the Total Commander file manager, but there are other good tools around.
Then you have to figure out which file was created but not properly saved during the recording. The date tag may help, and so may the file size. I am not sure that it is the file that claims to have the size of the device itself, because what failed was the battery, not the storage capacity, right? But that depends on how your device starts a new file, etc. Once you have identified the file and saved it safely on a normal HDD, you will probably have to fix it.
I once had a broken WAV file which would neither open nor play, although the content was there. One way to fix such a file is to manually change the header, where the length (duration) is indicated. It probably still says "0 sec", which is why the file does not work. Using a hex editor (a plain text editor such as Notepad will not do: it cannot open very large files and is likely to damage binary data), find the place in the header and change the value. There are several how-to guides on the internet; search for terms such as "fix broken wave sound file header".
Actually, I remember I used SoundForge to fix the file. I just had to change the options when opening the file. Instead of opening it as a plain WAV file (which is the default, but will fail because of the wrong duration tag in the header), I opened it as a RAW SOUND FILE, if I remember correctly (otherwise try the other available options); the header is then ignored. Saving the file back as WAV produced a clean file which would also work with other devices. Possibly, after the proper recording, the file still goes on and on with other information or just white noise. Use SoundForge (or Audacity or a similar tool) to delete that final part.
For future recordings, especially when you know the batteries may fail at some point, consider configuring your device to automatically start a new sound file every X (for instance, 5) minutes, so you will lose less if everything goes wrong. The little extra work it takes to join the files into one large audio file afterwards (again, with SoundForge or similar) is certainly worth it.
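If no audio editor is at hand, the joining step can also be scripted. The following is only a minimal sketch using Python's standard wave module; the function name join_wavs and the file names in the usage note are made up for illustration, and it assumes all parts are uncompressed PCM WAV files with identical channel count, sample width and sampling rate.

```python
import wave


def join_wavs(part_paths, out_path):
    """Concatenate PCM WAV files (same format) into one file.

    Returns the total number of sample frames written.
    """
    if not part_paths:
        raise ValueError("no input files given")

    total_frames = 0
    expected = None  # (channels, sample width in bytes, frame rate)
    with wave.open(out_path, "wb") as out:
        for path in part_paths:
            with wave.open(path, "rb") as part:
                fmt = (part.getnchannels(), part.getsampwidth(),
                       part.getframerate())
                if expected is None:
                    # The first part decides the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has a different audio format")
                # Copy in blocks so long recordings need little memory.
                while True:
                    frames = part.readframes(65536)
                    if not frames:
                        break
                    out.writeframes(frames)
                    total_frames += len(frames) // (fmt[0] * fmt[1])
    return total_frames
```

A call such as join_wavs(["part1.wav", "part2.wav"], "session.wav") would write the parts in the order given, so the list should be sorted by recording time first.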

The problem is most likely the header. WAV files begin with a header which in the simplest standard-conforming case is 44 bytes long. This header contains information about the representation of the audio and its duration. When recording in real time, it is of course impossible to fill in the duration information correctly - you have to leave those four bytes blank, or set them to 0, then come back and fill in the correct information once the recording is complete and you know its duration. Anything that terminates the recording before it is possible to go back and clean up the header will result in bad duration information.
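To see what a given file's header actually claims, the 44 bytes can be decoded directly. This is a rough sketch in Python that assumes the simplest layout described above (a 16-byte PCM format chunk followed immediately by the data chunk); the function name read_canonical_header is hypothetical, and files with extra chunks will be rejected rather than parsed.

```python
import struct


def read_canonical_header(header: bytes) -> dict:
    """Decode the first 44 bytes of a simple PCM WAV file."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")

    # "<" means little-endian with no padding; the fields add up to 44 bytes.
    (riff_id, riff_size, wave_id, fmt_id, fmt_size, audio_format,
     channels, sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])

    if (riff_id, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE",
                                               b"fmt ", b"data"):
        raise ValueError("not the simple 44-byte WAV layout")

    return {
        "riff_size": riff_size,        # bytes 4-7: file size minus 8
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,        # bytes 40-43: length of the audio
    }
```

On a recording that was cut off by power loss, one would expect data_size (and often riff_size) to come back as 0 while the other fields look sensible.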
Depending on how your device writes data to the disk, which is a function both of the drive technology and the software, the most recently recorded audio data may also be missing or corrupted by the loss of power.
As already noted, one approach is simply to remove the header, then convert the resulting raw file back to WAV. In this case, you may need to provide the converter with information about the audio since it can't get it from the header. Note, by the way, that if you tell the converter that your corrupted WAV file is a raw file, it does not actually strip the header - after all, you've told it there isn't any. Rather, what it does is treat the header as the first bit of audio data. The result is that the first few samples of your new audio file will be garbage. This won't make any real difference though, since at typical sampling rates the garbage will last at most about 1 millisecond (44 bytes of 16-bit stereo audio at 44.1 kHz is 11 sample frames, or roughly 0.25 ms).
The other approach is to edit the WAV file header, which, however, takes a bit of computing expertise. The duration is the length of the audio chunk in bytes, expressed as a 4-byte little-endian unsigned integer. If the WAV file is in the simplest standard-conforming format, those four bytes will be bytes 40-43 (assuming that the first byte of the file is numbered zero). Unfortunately, it is not uncommon to encounter "WAV files" that do not conform to the standard, and it is also common for them to be standard-conforming but contain additional, usually unnecessary, chunks. (The WAV format is, from a linguistic point of view, much more complex than necessary. WAV files potentially contain all sorts of stuff of interest only to the entertainment industry, such as playlists and cue lists.)
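The strip-and-rewrap approach can be sketched in a few lines of Python with the standard wave module. Everything here is an assumption to be adjusted: the function name rewrap_raw_as_wav is invented, the defaults (stereo, 16-bit, 44.1 kHz) must be replaced by whatever the recorder was actually set to, and skipping exactly 44 bytes is only right if the damaged file has the simplest header.

```python
import wave


def rewrap_raw_as_wav(src_path, dst_path, channels=2, sample_width=2,
                      frame_rate=44100, skip_bytes=44):
    """Copy the audio bytes of a damaged file into a fresh WAV file.

    The audio parameters are supplied by the caller because the damaged
    header cannot be trusted. Returns the number of frames written.
    """
    frame_size = channels * sample_width
    chunk_size = frame_size * 65536  # always a whole number of frames
    frames_written = 0

    with open(src_path, "rb") as src, wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)

        src.seek(skip_bytes)  # use skip_bytes=0 to keep the header as audio
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Drop a trailing partial frame left by the interrupted write.
            usable = len(chunk) - (len(chunk) % frame_size)
            dst.writeframes(chunk[:usable])
            frames_written += usable // frame_size

    # On close, the wave module writes the correct sizes into the header.
    return frames_written
```

If the result plays at the wrong speed or sounds like noise, the supplied parameters are probably wrong; since the source file is never modified, it is safe to retry with other values.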
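For the simplest layout, the header edit can be scripted rather than done by hand. The sketch below is written under stated assumptions: the function name patch_wav_sizes is hypothetical, the file is taken to have exactly the 44-byte header with the data chunk starting at byte 36, and everything after the header is treated as audio. Besides bytes 40-43, it also rewrites the overall RIFF size in bytes 4-7 (file size minus 8), which is normally left unset by an interrupted recording as well. It works on a copy so the damaged original stays untouched.

```python
import os
import shutil
import struct


def patch_wav_sizes(damaged_path, repaired_path):
    """Write correct size fields into a copy of a simple 44-byte-header WAV.

    Returns the audio length in bytes that was written to bytes 40-43.
    """
    with open(damaged_path, "rb") as f:
        header = f.read(44)
    if (len(header) < 44 or header[0:4] != b"RIFF"
            or header[8:12] != b"WAVE" or header[36:40] != b"data"):
        raise ValueError("not the simple layout; locate the data chunk by hand")

    total = os.path.getsize(damaged_path)
    if total - 8 > 0xFFFFFFFF:
        raise ValueError("file too large for 4-byte size fields")
    data_size = total - 44

    shutil.copyfile(damaged_path, repaired_path)
    with open(repaired_path, "r+b") as f:
        f.seek(4)
        f.write(struct.pack("<I", total - 8))   # RIFF chunk size
        f.seek(40)
        f.write(struct.pack("<I", data_size))   # audio data length
    return data_size
```

If the function raises ValueError, the file has extra chunks or a non-standard header, and the rewrap-as-raw route is likely the easier fix.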
For those interested, here are links to

Acknowledgements:

Thanks to contributors Greg Dickson, Sebastian Drude and Bill Poser.