Digitising cassettes FAQ
The following question and eighteen responses from ten contributors were posted on the Indigenous Languages and Technology (ILAT) discussion list from 11-13 February 2010. Each contributor is acknowledged beneath her/his response together with institutional affiliation.
Question re copying cassette tapes
Howdy all, I'm planning to copy some old language tapes on cassettes to a CD format. Any caveats would be welcome. One question -- for this purpose, is it necessary to use "music CDs", or will ordinary CD-R disks work?
1) CD-R will work fine. I think CD-RW won't though, but I'm not exactly sure. The main restriction is that the files must be encoded at 16 bit resolution, 44.1 kHz sample rate and stereo. Just out of interest (and possibly some professional bias) are you archiving these materials? And how are you digitising them?
2) Rudy, given the affordability of blank CDs and given the worth of your content, I definitely suggest buying the best quality. The (very) small extra sum spent on top quality will be a good investment! Also, here's why CD-RWs should be avoided:
a) their erasability. True, common sense tells us that nobody should reformat a properly labelled CD-RW... but that's not incompatible with prudence;
b) though we have no evidence or statistics over a long period on longevity, the Wikipedia article on CD-RW gives some clues: "CD-RWs are not as reliable for long-term storage; however, under recommended storage conditions, CD-RW should have a life expectancy of 2 years or more (as compared to 30+ years for CD-R)";
c) some CD players cannot play CD-RWs.
3) I'd also strongly recommend having a hard drive backup (or two) and DVDs of the materials. That way if you need to replace the CDs or provide more copies it's easy to burn more.
4) The lengthy and very serious discussion of long term sustainability of digital media aside... If you're just asking about playback, I was surprised to find that regular CD-Rs didn't play in some older CD players, and that I did in fact need "music CDs" to share recordings with folks who just wanted something to listen to in the kitchen, in the car, etc...
Andrea L. Berez
University of California at Santa Barbara
5) I'm not so sure about the recommendation of stereo digitisation. If the originals are not stereo recordings, there's no point in creating a stereo digital recording; indeed, even if there are two channels on the original tapes, if they do not reflect inputs from two different microphones, you don't have a true stereo recording and there isn't much point in preserving two channels. Also, 44.1K samples per second is overkill for most linguistic material. If it contains music, such a rate may be desirable, but for most speech, 22.05K samples per second includes all of the information likely to be of linguistic significance. 16-bit resolution is highly desirable, but there's nothing sacred about a 44.1K samples-per-second sampling rate and stereo. These are merely residues of decisions made by the music industry and have nothing to do with the quality of linguistic recordings.
William J Poser
University of Pennsylvania
6) Aloha. I'll disagree with this. I work in the recording industry outside of my academic duties, and have played around a lot with this for our own audio archives. It is always best, IMHO, to digitize at the highest possible resolution; you can always down-sample for online use. I've A/B-tested live and analog conversions that were done at 44.1k and converted down to 22k vs. those recorded at 22k, and there is noticeably (to my ears) greater clarity and depth to those that were recorded at the higher rate and down-sampled. We tried this with our Ka Leo Hawai'i archives, and even my colleagues who lacked the recording background could hear the difference on relatively inexpensive speakers. My songwriting partner has done the same with some of his commercial recordings, and found that he can clearly hear the difference when he records at 96k and down-samples to 44.1, as opposed to recording at 44.1.

I did the same kind of experiment with scanning. Try to scan an image at 72 DPI, and compare it to one scanned at a higher resolution and then down-sampled. The down-sampled ones are much clearer. Programs like Photoshop will extrapolate based on the surrounding pixels to create an image that is clearer than what the scanning software will produce if it scans at 72 DPI, since the scanner doesn't take the surrounding pixels into account. A slightly different case with audio, but similar results.

To me a big consideration is the digital I/O device. If the tapes are valuable, get an external converter (PreSonus units are relatively inexpensive but very good). It will make a huge difference over using the built-in mic or line jack on any computer.

Re: mono recordings, agreed. However, if you record mono, make sure whatever format you are going to serve it as (or burn it to) will handle it properly. It's no fun listening to a mono track through one earphone, and some software will do just that, i.e., assume that the mono file is simply one side of a stereo track, and leave you with one ear listening. My 2 cents.
University of Hawaii
7) Whatever we make of the benefits or otherwise of higher resolutions and sample rates in digitisation, the point is that an audio CD must be stereo, 44.1 kHz, 16 bit. Anything else will not play on a regular CD player (that is, one which isn't a computer that can interpret the WAV header). The reason is that audio CD tracks don't contain WAV headers; they're raw PCM data: 1s and 0s. CD players are designed to interpret those 1s and 0s as stereo, 16-bit, 44.1 kHz. Altering the properties, if the disc plays at all, will have effects on the audio such as playing too fast/slow (if the sample rate is incorrect) or just outputting digital noise.
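[Ed.: because the Red Book format above is fixed and headerless, the disc's data rate follows from plain arithmetic; this is a quick sanity check, not tied to any particular tool.]

```python
# Audio CD parameters are fixed: 44,100 samples/s, 2 bytes/sample, 2 channels,
# with no per-file headers, so the raw PCM data rate is fully determined.
SAMPLE_RATE = 44_100      # Hz
BYTES_PER_SAMPLE = 2      # 16-bit
CHANNELS = 2              # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)                     # 176400 bytes of PCM per second

seventy_four_minutes = bytes_per_second * 74 * 60
print(round(seventy_four_minutes / 1024**2))  # ~747 MiB on a 74-minute disc
```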
8) I would record at the highest rate even though, as Bill points out, it is a waste of space. Terabyte hard drives are the norm. If you are worried about space, make sure you compress with a lossless compression such as FLAC; programs like Audacity can do this natively. Audacity also has a timer for the record function so you can set it to record for 30 minutes, and it is available for Linux, Mac, and Windows. This list is ordered for a reason ;) I don't think it really matters what sample rate you use when burning to a CD, because programs like iTunes or Windows Media Player usually up-sample and convert to a WAV before they burn it to a CD anyway.
9) If you're "creating an audio CD" using iTunes or Windows Media Player, then sure, it'll do all the converting for you, but this is potentially sub-optimal, as you may be creating a CD from mp3 files. These programs will happily expand them out to CD-Audio WAV specifications in the formatting and burning process, but needless compression/expansion/transcoding should be avoided. I was thinking along the lines of manually formatting a bunch of audio files using Audacity or something, and using a program like Burn (on a Mac) or Brasero (on Linux). There must be a similar program for Windows - the point is there should be a checkbox for 'audio CD' - but I haven't used Windows in some time.

On the tangential issue running alongside this thread: yes, the higher the better when it comes to audio quality, and only transcode at the last minute. If you're digitising something and turning it into an audio CD, for instance, do all your bits and pieces to the files - normalising, EQ, noise filtering, etc. - first. Resampling and dithering to 44.1@16 should be the very last operation. Bill maintains that the human ear is incapable of discerning frequencies above 22,000 Hz, and that may well be true, but the human brain is capable of hearing quantization at that speed. As a test, take a 44.1 kHz file and slow it down by half using ELAN or something - you'll hear choppy playback. If you do the same to a 96 kHz file the effect is nowhere near as noticeable.
10) Yes, an audio CD must be stereo, 44.1 kHz, 16 bit. But that's not the question. If you're digitizing analogue tapes, the data is likely to be used in a variety of ways, only one of which is making audio CDs. When you want to make an audio CD, if your data is in another format, you convert it to the audio CD format, which is easily done. If your original recording is monaural, as most linguistic recordings are in my experience, there's no point in wasting space and processing time in digitizing it in stereo (or even worse, as can happen, digitizing one channel of voice and another of background noise). If you need a "stereo" version for an audio CD, it is a trivial matter to duplicate the single channel.
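[Ed.: the "trivial" duplication mentioned above can be sketched with Python's standard `wave` module; `mono_to_stereo` and the file paths are illustrative, not part of any tool named in this thread, and the sketch assumes 16-bit input.]

```python
import wave

def mono_to_stereo(in_path: str, out_path: str) -> None:
    """Copy a 16-bit mono WAV into a stereo WAV with identical channels."""
    with wave.open(in_path, "rb") as src:
        assert src.getnchannels() == 1 and src.getsampwidth() == 2
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    stereo = bytearray()
    for i in range(0, len(frames), 2):   # one 16-bit sample = 2 bytes
        stereo += frames[i:i + 2] * 2    # write the same sample to L and R
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(2)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(bytes(stereo))
```

(Formatting the result to 44.1 kHz for CD-Audio would still be a separate resampling step.)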
With regard to higher sampling rates, I agree that they're desirable for music, which is of course what the recording industry is concerned with, but I repeat that there is not the slightest evidence that anything of significance in speech is found above 10 kHz. Even if real psychophysical experiments rather than anecdotes demonstrate that people can tell the difference, the question is, does the difference matter? Piles of evidence from psychophysical experimentation together with practical experience in both phonetics research and speech technology indicate no. If you've got lots of space and processor time, go ahead and digitize at 44.1K, but for straight speech data there really isn't any good reason to go higher. I concur that the quality of the digitizer can make a lot of difference (as can setting the input gain properly so as to take advantage of the full range of the quantizer while avoiding clipping).
I agree with Neskie that worrying about space is mostly a throwback to earlier times when much less storage was available, and I say this even in the midst of my more serious rant about the evils of lossy compression. As he says, if you're going to compress, use FLAC (or another lossless method, though FLAC is my personal favorite), not MP3 or some other lossy technique. At the same time, if you're doing things like editing large recordings, a lower sampling rate and/or fewer channels can reduce memory (primary memory, not disk space) and processing requirements enough to make it possible to edit recordings that would otherwise be too large, or to obtain much greater responsiveness from what would otherwise be a sluggish machine.
William J Poser
University of Pennsylvania
11) I should perhaps clarify that I'm not so much arguing that you ought to use a lower sampling rate as that you shouldn't feel obligated to use a 44.1K rate, and shouldn't feel that you are producing inferior material if you do use a lower (but still sufficiently high) rate. Here's my overall position, for the usual situation in which lots of storage is available. (Those archiving data on, say, old satellites are in a different situation.) If you want to save space, the sequence in which techniques should be used is as follows:
(a) record/digitize mono rather than stereo: This gives you a savings of 50% at no cost in quality. If you're working with something like conversational data this will not be true, so this applies only to monologues.
(b) use a lossless compression technique such as FLAC: This gives you a savings of about 50% (variable depending on the data) at no cost in quality. For some people this might be the first technique to use rather than the second. I prefer not to have to decompress to work with the data (if it isn't long term archival), but your mileage may vary.
(c) use a lower sampling rate: If you use a rate of 22.05K, this gives you a savings of 50% at little or no cost in quality. This applies only to pure speech data. Some music may well contain higher frequency components of significance.
(d) use a lossy compression technique: Don't. Ever. With current hardware there is unlikely to be any justification for doing this. (For some devices/users you may need to create MP3s, but these should be regarded as inferior versions of the material. Also, you may be able to use a high bit-rate MP3 and avoid most of the distortion.)
William J Poser
University of Pennsylvania
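[Ed.: the savings claimed in steps (a) and (c) above can be checked with simple arithmetic; the figures below are back-of-the-envelope sizes for one hour of uncompressed 16-bit audio, and the ~50% FLAC ratio from (b) is deliberately omitted because it varies with the material.]

```python
# Uncompressed size of one hour of 16-bit audio under Bill's options.
def bytes_per_hour(rate_hz: int, channels: int, bytes_per_sample: int = 2) -> int:
    return rate_hz * channels * bytes_per_sample * 3600

baseline = bytes_per_hour(44_100, 2)   # stereo at 44.1K
after_a = bytes_per_hour(44_100, 1)    # (a) mono: a 50% saving
after_ac = bytes_per_hour(22_050, 1)   # (a) + (c): mono at 22.05K, 75% saved
print(baseline // 1024**2, after_a // 1024**2, after_ac // 1024**2)
# 605 302 151  (MiB per hour)
```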
12) Aloha Bill. To me this is an argument in support of digitizing both channels, or at the very least listening to both sides before digitizing. We have had experience of tapes where there was a significant difference in audio quality between the two sides, or pre-echoes that were more noticeable on one side or the other. It would be easier to have a digitized file that you can move through fairly quickly to compare the two sides. Of course, for space considerations, the lesser side could be deleted. I've come across one situation where one side was better at the beginning of the tape (some clicking was audible in the better channel toward the end of the tape), and the other side was better at the end. We ended up splicing the two together.
I can put my faith in research and academic papers, or in my own ears. I have had audio done as I described - one recording made at 22k and the other at 44.1 and down-sampled. There was a clarity to the second that was not present in the first, and it allowed me to differentiate some sounds that I could not when they were originally recorded at the lower rate. Same equipment, same software. There could be some variable that I don't know about, and it may not be the case for everyone else. As they say, your mileage may vary. I'd recommend to anyone who is going to start an archiving project to experiment broadly. If you can't hear the difference, by all means save the space and extra time it would take to process 44.1 files and go with 22.05. If I still have the files I'll post them, but I kind of doubt it, as it was simply experimental and I probably deleted them after we came to a determination and created our system for the project. In our case, the reel-to-reel tapes were not marked and we had no way of knowing what, if any, noise reduction system was used on them. That took some experimenting, too.
University of Hawaii
13) An issue that hasn't yet been discussed in relation to digitising old tapes is that it should only be done once. This may sound strange, but I know of several projects where a 'trial' digitisation occurred at low resolution. These files then became the basis for time-coded transcripts, and then, later, the project decided it needed archival versions of the media and redigitised to international archival standards (96 kHz/24-bit - yes, I know this is overkill, but it is the standard). Of course, the time-coded transcripts no longer matched the newer, higher-resolution versions. Another motivation for doing it right the first time is that the tapes themselves may not survive more than one playback (although this is rarely the case).
Project Manager, Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
14) My final (I hope) contribution to this topic ;-) We should also consider the Nyquist-Shannon theorem, which states: if a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart. Essentially, you should sample at twice the highest frequency you will find in your recording. As Bill mentioned, if nothing of significance in speech is found above 10k, then a 22k sampling rate - whose upper limit is 11k, i.e. 22k divided by 2 - is enough. Beyond that limit you get some aliasing. For human ears, probably not a problem. How about for as-yet unwritten computer software for transcription that can analyze such data, or better noise reduction algorithms than we have today? I dunno. I would still prefer to be safe and keep a copy at the higher rate. The 96k/24-bit standard Nick cites may be overkill, but somewhere down the road our grandchildren may be grateful that it was done, for reasons we don't yet comprehend.
University of Hawaii
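[Ed.: the folding ("aliasing") behaviour the theorem predicts can be sketched numerically; `alias_frequency` is a toy helper written for this illustration, not a call into any DSP library.]

```python
# A component above half the sampling rate folds back to a lower apparent
# frequency; this computes where it lands, per the Nyquist-Shannon theorem.
def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency after sampling, folded into [0, f_sample / 2]."""
    f = f_signal % f_sample
    return f_sample - f if f > f_sample / 2 else f

# At a 22,050 Hz rate, anything at or below 11,025 Hz is captured faithfully:
print(alias_frequency(10_000, 22_050))   # 10000
# ...but a 15 kHz component would masquerade as 7,050 Hz:
print(alias_frequency(15_000, 22_050))   # 7050
```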
15) One last peep from me - a recording engineer friend of mine has an axiom (I don't know if it's his or if he pirated it) that unless he has four copies of any digital file in four different locations, the file doesn't really exist. I could cite an example that involved Kenny Loggins but will refrain from name-dropping ;-) Food for thought when considering backup strategies. As Nick noted, tapes may not survive more than one playback - or may vaporize 20 years down the road, when the lack of sufficient backups bites someone in the behind.
University of Hawaii
16) On Thu, Feb 11, 2010 at 10:50 PM, William J Poser wrote:
"If you use a rate of 22.05K, this gives you a savings of 50% at little or no cost in quality. This applies only to pure speech data. Some music may well contain higher frequency components of significance."
I'm not sure that I agree with this point entirely: using a sample rate of 22,050 Hz should, as both Bill and Keola have pointed out, be able to reproduce essentially the entire primary frequency range for speech, from the low 'bass' fundamental frequencies to the upper end of high-frequency frication. In that sense, we're not losing anything by recording language samples for phonetic analysis at this sample rate, and can certainly save storage space by doing so; 22,050 Hz has been recommended for a long time in phonetics, even finding its way into popular textbooks on phonetic fieldwork (e.g. Ladefoged 2003, p. 26).

This doesn't mean that samples recorded at this rate necessarily sound as good as higher-frequency recordings, though, as Keola mentioned. Even though this sampling rate captures the essential frequency range for speech (i.e. up to around 11 kHz), most people's hearing extends well beyond that, into the 20 kHz range. The lower sampling rate might not incur distortions that would affect phonetic analysis, but there is usually still an audible difference in quality between recordings digitized at 44.1 kHz versus those digitized at 22.05 kHz, and likewise for higher sampling rates.

I'm not sure that Ladefoged's recommendation of a 22,050 Hz sampling rate was really made with reuse of recordings outside of phonetics in mind. For some other purposes, the 'aesthetic' sound quality of a recording may be fairly important, particularly if recordings have some cultural, historical, or even just sentimental value. If 22,050 Hz was all that was available, there'd certainly be no harm in choosing it - but it would seem a shame to make recordings of a lower audio quality just because that's all that's needed for instrumental phonetics! Anyway, that's just a thought. For what it's worth, NINCH (2003), Bartek & Kornbluh (2002), and the "Sound Directions" guide from Indiana University all appear to recommend 96 kHz/24-bit WAV for archival purposes.
University of Alberta
17) Subject: Longevity of documentation
Has anyone looked into albums? I have enjoyed reading about the various issues that arise with digitizing cassettes. Many thanks!! The discussion has reminded me of a larger issue w.r.t. the long-term status of audio recordings. The question I have is whether anyone has looked into copying materials onto LPs - the old-fashioned analog album. The reason I ask is that I have been thinking that this might be one form of media with the potential to retain its longevity more than others. Even 50-year-old scratched, gummy old albums can be cleaned up and played; they do not rely on whether this or that 0 or 1 is at the beginning of the file, whether the compression algorithm changes the signal, whether the program changes, etc., or whether the media is going to disintegrate after being played 20 years later. The album has been making a comeback lately, and I am aware that there are different types of presses, but I want to know if anyone knows anything more about this. Ida Halpern (ethnomusicologist) used to take a "record-maker" (the picture I saw looked like a "record", not a wax cylinder) with her to document music of the Pacific Northwest, and I wonder if there is anything like what she used that is available today.
University of Victoria
18) Hi, thanks Suzanne for this point on longevity. Here's my two cents on archiving strategy, and the merciless fight against media loss: every time you get a new computer to replace the current one, don't just transfer all your files onto the new computer and get rid of the old one; take the old hard drive out of the computer and store it in a safe place. Like most users, I change computers every 2-5 years on average, and my IT experience leads me to believe that the average hard drive's longevity is significantly longer than 5 years (though I have no stats on that point). Over the past 25 years, I've always saved my old hard drives and kept them in different safe places, and I've been able to successfully read files from them whenever I tried to. Due to the incremental nature of this strategy, that also means I have 6 archives of those 25-year-old files spread over different places, and I'll have 3 more in the next 10 years, etc. Not to mention the automatic daily + weekly + monthly backups of my current PC. Such an incremental archiving scheme would be far harder to implement with analog media. The cost? About 10 minutes of work to open that good old PC, unscrew the drive and unplug it.
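[Ed.: a common companion to the shelve-the-old-drive strategy above is a fixity check: record a checksum for every file when a drive is stored, and verify the checksums each time it is read back. A minimal sketch using Python's standard library; the function names are illustrative.]

```python
# Record SHA-256 digests for an archive, then detect any later corruption.
import hashlib
from pathlib import Path

def checksum_manifest(root: str) -> dict:
    """Map each file under `root` (by relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

def verify(root: str, manifest: dict) -> list:
    """Return the files whose current digest no longer matches the manifest."""
    current = checksum_manifest(root)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]
```

Storing the manifest alongside the drive (and a copy elsewhere) turns "I've been able to read the files" into "I've verified the files are bit-for-bit intact".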