Digitising cassettes FAQ

The following question and eighteen responses from ten contributors were posted on the Indigenous Languages and Technology (ILAT) discussion list from 11 to 13 February 2010. Each contributor is acknowledged beneath her/his response together with institutional affiliation.

 

Question re copying cassette tapes

Howdy all,

   I'm planning to copy some old language tapes on cassettes to a CD format. Any caveats would be welcome. One question -- for this purpose, is it necessary to use "music CDs", or will ordinary CD-R disks work?

Thanks,
Rudy Troike

 

1) CD-R will work fine. I think CD-RW won't though, but I'm not exactly sure.

 The main restriction is that the files must be encoded at 16 bit 
resolution, 44.1 kHz sample rate and stereo. Just out of interest (and possibly some professional bias) are you
 archiving these materials? And how are you digitising them?

Aidan Wilson
PARADISEC

2) Rudy,

 Given the affordability of blank CDs and given the worth of your
 content, I definitely suggest buying the best quality. The (very) small
 extra sum spent on top quality will be a good investment!
 Also, here's why CD-RWs should be avoided:

a) because of their erasability. True, common sense tells us that nobody 
should reformat a properly labelled CD-RW... but that's not incompatible
 with prudence;

b) though we have no evidence or statistics over a long period on 
longevity, the Wikipedia article on CD-RW gives some clues: "CD-RWs
 are not as reliable for long-term storage; however, under recommended
 storage conditions, CD-RW should have a life expectancy of 2 years or 
more (as compared to 30+ years for CD-R)";

c) some CD players cannot play CD-RWs.

Eric Poncet
NunaSoft

3) I'd also strongly recommend having a hard drive backup (or two) and 
DVDs of the materials. That way if you need to replace the CDs or
 provide more copies it's easy to burn more.

Claire Bowern
Yale University

4) The lengthy and very serious discussion of long term sustainability of
 digital media aside...

If you're just asking about playback, I was surprised to find that regular
 CD-Rs didn't play in some older CD players, and that I did in fact need
 "music CDs" to share recordings with folks who just wanted something to 
listen to in the kitchen, in the car, etc...

 

Andrea L. Berez
University of California at Santa Barbara

5) I'm not so sure about the recommendation of stereo digitisation. If
 the originals are not stereo recordings, there's no point in creating a
 stereo digital recording, and indeed, even if there are two channels
 on the original tapes, if they do not reflect inputs from two different
 microphones, you don't have a true stereo recording and there isn't much
 point in preserving two channels.

 Also, 44.1 K samples/second is overkill for most linguistic material. If
 it contains music, such a rate may be desirable, but for most speech,
 22.05 K samples per second includes all of the information likely 
to be of linguistic significance.

 16 bit resolution is highly desirable, but there's nothing sacred about 
a sampling rate of 44.1K samples per second and stereo. These are merely
 residues of decisions made by the music industry and have nothing to
 do with the quality of linguistic recordings.

 

William J Poser
University of Pennsylvania

6) Aloha. I'll disagree with this. I work in the recording industry outside 
of my academic duties, and have played around a lot with this for our 
own audio archives.

 It is always best, IMHO, to digitize at the highest possible resolution, 
and you can always down-sample for online use. I've A/B'd live and analog 
conversions that were done at 44.1k and converted down to 22k vs. 
those recorded at 22k, and there is noticeably (to my ears) greater 
clarity and depth to those that were recorded at the higher rate and 
down-sampled. We tried this with our Ka Leo Hawai'i archives, and even 
my colleagues who lacked the recording background could hear the 
difference on relatively inexpensive speakers. My songwriting partner 
has done the same with some of his commercial recordings, and found that 
he can clearly hear the difference when he records at 96k and 
down-samples to 44.1, as opposed to recording at 44.1. 

I did the same kind of experiment with scanning. Try to scan an image at 
72DPI, and compare it to one scanned at a higher resolution and then 
down-sampled. The down-sampled ones are much clearer. Programs like 
Photoshop will extrapolate based on the surrounding pixels to create an 
image that is clearer than what the scanning software will do if it 
scans at 72DPI - those don't take the surrounding pixels into account. 
Slightly different case with audio, but similar results.

 To me a big consideration is the digital I/O device. If the tapes are 
valuable, get an external converter (PreSonus units are relatively 
inexpensive but very good). It will make a huge difference over using 
the built-in mic or line jack on any computer.

 Re: mono recordings, agreed. However, if you record mono, make sure 
whatever format you are going to serve it as (or burn it to) will handle 
it properly. It's no fun listening to a mono track through one earphone, 
and some software encountered will do that, i.e., assume that the mono 
file is simply one side of a stereo track, and leave you with one ear 
listening.

 My 2 cents.

 

Keola Donaghy
University of Hawaii

7) While I disagree about the benefits or otherwise of higher resolutions and
 sample rates in digitisation, the point is that an audio CD must be 
stereo, 44.1 kHz, 16 bit.

 Anything else will not play on any regular CD player (that is, which isn't
 a computer that can interpret the wav header). The reason is that audio 
CD wav files don't contain headers; they're raw PCM data - 1s and 0s. CD 
players are designed to interpret those 1s and 0s as stereo, 16 bit 44.1
kHz. Altering the properties, if it plays at all, will have effects on the
 audio such as playing too fast/slow (if the sample rate is incorrect) or 
just outputting digital noise. 

Aidan Wilson
PARADISEC
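
A minimal sketch of the check implied by response 7, in Python: it reads a WAV file's header with the standard-library wave module and reports any property that differs from the stereo, 16 bit, 44.1 kHz layout an audio CD expects. The function name cd_audio_problems and the constant names are illustrative and do not come from the thread; burning software may also do its own conversion, so treat this as a pre-flight check only.

```python
import wave

# Properties an audio CD expects, as stated in the thread.
CD_CHANNELS = 2          # stereo
CD_SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16 bit
CD_SAMPLE_RATE = 44100   # Hz


def cd_audio_problems(path):
    """Return a list of reasons why the PCM WAV file at `path` is not CD-ready.

    An empty list means the header already matches the audio CD layout.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()

    problems = []
    if channels != CD_CHANNELS:
        problems.append(f"{channels} channel(s), expected {CD_CHANNELS}")
    if width != CD_SAMPLE_WIDTH:
        problems.append(f"{width * 8} bit, expected {CD_SAMPLE_WIDTH * 8} bit")
    if rate != CD_SAMPLE_RATE:
        problems.append(f"{rate} Hz, expected {CD_SAMPLE_RATE} Hz")
    return problems
```

A mono file digitised at 22.05 kHz would be reported with two problems (channel count and sample rate), which are exactly the two conversions needed before burning.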

8) 

I would record at the highest rate even though, as Bill points out,
 it is a waste of space. Terabyte hard drives are the norm. If
 you are worried about space, make sure you compress with a lossless
 format such as FLAC; programs like Audacity can do this 
natively. Audacity also has a timer for the record function so you
 can set it to record for 30 minutes, and it is available for Linux,
 Mac, Windows. This list is ordered for a reason ;) 

I don't think it really matters when burning to a CD what sample rate
 you use, because programs like iTunes or Windows Media Player usually
 up-sample and convert to WAV before they burn it to a CD anyway.

Neskie Manuel

9) 

If you're "creating an audio CD" using iTunes or Windows Media Player, then
 sure; it'll do all the converting for you, but this is potentially 
sub-optimal, as you may be creating a CD from mp3 files. These programs
 will happily expand them out to CD-Audio WAV specifications in the 
formatting and burning process, but needless
 compression/expansion/transcoding should be avoided.

 I was thinking along the lines of manually formatting a bunch of audio
 files using Audacity or something, and using a program like burn (on a
mac) or Brasero (on Linux). There must be a similar program for Windows -
the point is there should be a checkbox for 'audio CD' - but I haven't 
used Windows in some time.

 On the tangential issue running alongside this thread, yes, the higher the
 better when it comes to audio quality, and only transcode at the last
 minute. If you're digitising something and turning it into an audio CD for 
instance, do all your bits and pieces to the files, normalising, EQ, noise
 filtering, etc. first. Resampling and dithering to 44.1@16 should be the
 very last operation.

 Bill maintains that the human ear is incapable of discerning frequencies 
above 22,000 Hz, and that may well be true, but the human brain is capable
 of hearing quantization at that speed. As a test, take a 44.1 kHz file and
 slow it down by half using ELAN or something - you'll hear choppy playback.
 If you do the same to a 96 kHz file the effect is nowhere near
 as noticeable.

 

Aidan Wilson
PARADISEC

10) Yes, an audio CD must be stereo, 44.1 KHz, 16 bit. But that's not the
 question. If you're digitizing analogue tapes, the data is likely to
 be used in a variety of ways, only one of which is making audio CDs.
 When you want to make an audio CD, if your data is in another format,
 you convert it to the audio CD format, which is easily done. If your 
original recording is monaural, as most linguistic recordings are in my
 experience, there's no point in wasting space and processing time in
 digitizing it stereo (or even worse, as can happen, digitizing one 
channel of voice and another of background noise). If you need a
 "stereo" version for an audio CD, it is a trivial matter to duplicate
 the single channel.

 With regard to higher sampling rates, I agree that they're desirable 
for music, which is of course what the recording industry is concerned
 with, but I repeat that there is not the slightest evidence that anything
 of significance in speech is found above 10 kHz. Even if real psychophysical
 experiments rather than anecdotes demonstrate that people can tell the 
difference, the question is, does the difference matter? Piles of evidence
 from psychophysical experimentation together with practical experience
 in both phonetics research and speech technology indicate no. 

If you've got lots of space and processor time go ahead and digitize at
 44.1K, but for straight speech data there really isn't any good reason to go
 so high.

 I concur that the quality of the digitizer can make a lot of difference
(as can setting the input gain properly so as to take advantage of the
 full range of the quantizer while avoiding clipping).

 I agree with Neskie that worrying about space is mostly a throwback to
 earlier times when much less storage was available, and I say this
 when on my more serious rant about the evils of lossy compression. As
 he says, if you're going to compress, use FLAC (or another lossless 
method, though FLAC is my personal favorite), not MP3 or some other
 lossy technique. At the same time, if you're doing things like editing large recordings, a lower sampling rate and/or fewer channels can reduce 
memory (primary memory, not disk space) and processing requirements 
enough to make it possible to edit recordings that would otherwise be
 too large or to obtain much greater responsiveness from what would otherwise be a sluggish machine.

 

William J Poser
University of Pennsylvania
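
Response 10 notes that duplicating a single channel to get a "stereo" file for an audio CD is trivial. As a rough illustration only, the Python sketch below does this for an uncompressed PCM WAV file using the standard-library wave module. The function name mono_to_stereo is hypothetical, and the whole file is read into memory, so this is meant for short test files and not as a production tool.

```python
import wave


def mono_to_stereo(src_path, dst_path):
    """Write a stereo copy of a mono PCM WAV file with identical left/right channels."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1:
            raise ValueError("source file is not mono")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()    # samples per second
        frames = src.readframes(src.getnframes())

    # Stereo PCM interleaves the channels sample by sample, so each mono
    # sample is written twice: once for the left and once for the right.
    stereo = bytearray()
    for i in range(0, len(frames), width):
        sample = frames[i:i + width]
        stereo += sample + sample

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(2)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(bytes(stereo))
```

The sample rate and bit depth are left untouched, so a file digitised at another rate would still need resampling to 44.1 kHz before it meets the audio CD format.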

11) 

I should perhaps clarify that I'm not so much arguing that you ought 
to use a lower sampling rate as that you shouldn't feel obligated to
 use a 44.1 K rate and shouldn't feel ashamed of producing inferior material if you do use a lower (but still sufficiently high) rate.

 Here's my overall position, for the usual situation in which lots of
 storage is available. (Those archiving data on, say, old satellites,
 are in a different situation.) If you want to save space, the sequence
 in which techniques should be used is as follows:

(a) record/digitize mono rather than stereo: This gives you a savings of 50% at no cost in quality. If you're working with something like conversational data this will not be true, so this applies only to monologues.

(b) use a lossless compression technique such as FLAC: This gives you a savings of about 50% (variable depending on the data) at no cost in quality. For some people this might be the first technique to use rather than the second. I prefer not to have to decompress to work with the data (if it isn't long term archival), but your mileage may vary.

(c) use a lower sampling rate: If you use a rate of 22.05K, this gives you a savings of 50% at little or no cost in quality. This applies only to pure speech data. Some music may well contain higher frequency components of significance.

(d) use a lossy compression technique: Don't. Ever. With current hardware there is unlikely to be any justification for doing this. (For some devices/users you may need to create MP3s, but these should be regarded as inferior versions of the material. Also, you may be able to use a high bit-rate MP3 and avoid most of the distortion.)

William J Poser
University of Pennsylvania
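
The 50% figures in steps (a) and (c) of response 11 follow from the size of uncompressed PCM audio, which is simply sample rate x bytes per sample x channels x duration. The Python sketch below works this out for one hour of audio; the function name pcm_bytes_per_hour and the list of formats are illustrative choices, and megabytes are taken as 1,000,000 bytes.

```python
SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate_hz, bit_depth, channels):
    """Size in bytes of one hour of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * SECONDS_PER_HOUR


# (label, sample rate in Hz, bit depth, channels)
FORMATS = [
    ("96 kHz / 24 bit / stereo", 96000, 24, 2),
    ("44.1 kHz / 16 bit / stereo", 44100, 16, 2),
    ("44.1 kHz / 16 bit / mono", 44100, 16, 1),
    ("22.05 kHz / 16 bit / mono", 22050, 16, 1),
]

for label, rate, depth, channels in FORMATS:
    megabytes = pcm_bytes_per_hour(rate, depth, channels) / 1_000_000
    print(f"{label:28s} {megabytes:8.1f} MB per hour")
```

On these assumptions an hour of CD-format stereo comes to about 635 MB, and each of steps (a) and (c) halves that figure. The saving from FLAC in step (b) varies with the material, so it is not modelled here.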

12) Aloha Bill. To me this is an argument in support of digitizing both 
channels, or at the very least listening to both sides before 
digitizing. We have had experience of tapes where there was a 
significant difference in audio quality between the two sides, or 
pre-echoes that were more noticeable on one side or the other. It would 
be easier to have a digitized file that you can move through fairly 
quickly to compare the two sides. Of course for space considerations, the 
lesser side could be deleted. I've come across one situation where one 
side was better at the beginning of the tape (some clicking was audible 
in the better channel toward the end of the tape), and better on the 
other side at the end. Ended up splicing the two together.

 I can put my faith in research and academic papers or my own ears. I 
have had audio done as I described - one recorded at 22k and the other 
at 44.1 and down-sampled. There was a clarity to the second that was not 
present in the first, and it allowed me to differentiate some sounds 
that I could not when they were originally recorded at the lower rate. 
Same equipment, same software. There could be some variable that I don't 
know about, and it may not be the case for everyone else. As they say, 
your mileage may vary. I'd recommend to anyone who is going to start an 
archiving project to experiment broadly. If you can't hear the 
difference, by all means save the space and extra time it would take to 
process 44.1 files and go with 22.05. If I still have the files I'll post 
them, but I kind of doubt it as it was simply experimental and I 
probably deleted them after we came to a determination and created our 
system for the project.

 In our case, the reel-to-reel tapes were not marked and we had no way of 
knowing what, if any, noise reduction system was used on them. Took some 
experimenting, too.

 

Keola Donaghy
University of Hawaii

13) An issue that hasn't yet been discussed in relation to digitising old
 tapes is that it should only be done once. This may sound strange, but 
I know of several projects where a 'trial' digitisation occurs, at low 
resolution. These files then become the basis for time-coded
 transcripts and then, later, the project decides it needs archival 
versions of the media and redigitises to international archival
 standards (96 kHz/24 bit. Yes, I know this is overkill but it is the
 standard). Of course, the time-coded transcripts no longer match the
 newer, higher resolution versions.

 Another motivation for doing it right the first time is that the tapes 
themselves may not survive more than one playback (although this is
 rarely the case).

 

Nick Thieberger
Project Manager, Pacific and Regional Archive for Digital Sources in Endangered
 Cultures (PARADISEC)

14) My final (I hope) contribution to this topic ;-)

 We should also consider the Nyquist-Shannon theorem, which states:

 If a function x(t) contains no frequencies higher than B hertz, it is 
completely determined by giving its ordinates at a series of points 
spaced 1/(2B) seconds apart.

 Essentially, you should sample at twice the rate of the highest 
frequency you will find in your recording. As Bill mentioned, if the 
highest frequency that human ears can hear is @ 10k, and we divide 22k 
by 2, 11k is the upper limit. Beyond that you get some aliasing. For 
human ears, probably not a problem. How about for as-yet unwritten 
computer software for transcription that can analyze such data, or 
better noise reduction algorithms than we have today? I dunno. I would 
still prefer to be safe and keep a copy at the higher rate. The 96k/24 
bit standard Nick cites may be overkill, but somewhere down the road our grandchildren may be grateful that it was done for reasons we don't yet 
comprehend.

 

Keola Donaghy
University of Hawaii
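
To make the arithmetic in response 14 concrete, the Python sketch below computes the upper frequency limit for a given sampling rate (half the rate) and the frequency at which a pure tone above that limit would appear if it were sampled with no filtering. Both function names are illustrative. In practice, analogue-to-digital converters normally apply a low-pass filter before sampling, so content above the limit is usually removed instead of being folded back; the second function shows the unfiltered case only.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2


def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling without filtering.

    Tones at or below the Nyquist limit come out unchanged; tones above it
    fold back (alias) into the range between 0 and the limit.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for tone in (5000, 10000, 12000, 15000):
    print(tone, "Hz sampled at 22050 Hz appears at",
          apparent_frequency(tone, 22050), "Hz")
```

At a 22,050 Hz sampling rate the limit is 11,025 Hz, so a 10,000 Hz component is kept as it is, while an unfiltered 12,000 Hz component would show up at 10,050 Hz.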

15) One last peep from me -

A recording engineer friend of mine has an axiom (don't know if it's his 
or he pirated it) that unless he has four copies of any digital file in 
four different locations, the file doesn't really exist. I could cite an 
example that involved Kenny Loggins but will refrain from name dropping 
;-)

 Food for thought when considering backup strategies. As Nick noted, 
tapes may not survive more than one playback, or may vaporize 20 years down 
the road, when a lack of sufficient backups bites someone in the behind. 

Keola Donaghy
University of Hawaii
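
The "four copies in four locations" rule in response 15 only helps if the copies are still identical to the master. The thread does not discuss how to check this; the Python sketch below shows one common approach, comparing SHA-256 checksums, as an assumption on our part. The function names sha256_of and copies_match are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(master_path, copy_paths):
    """Map each copy's path to True if its contents match the master file."""
    master_digest = sha256_of(master_path)
    return {path: sha256_of(path) == master_digest for path in copy_paths}
```

Any copy mapped to False differs from the master and should be replaced from a copy that still matches.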

16) 

On Thu, Feb 11, 2010 at 10:50 PM, William J Poser wrote: 
"If you use a rate of 22.05K, this gives you a savings of 50%
 at little or no cost in quality. This applies only to pure speech 
data. Some music may well contain higher frequency components of
 significance."

 I'm not sure that I agree with this point entirely: using a sample 
rate of 22,050 Hz should, as both Bill and Keola have pointed out, be 
able to reproduce essentially the entire primary frequency range for
 speech, from the low 'bass' fundamental frequencies to the upper end
 of high-frequency frication. In that sense, we're not losing anything by recording language samples for phonetic analysis at this sample 
rate, and can certainly save storage space by doing so; 22,050 Hz has 
been recommended for a long time in phonetics, even finding its way 
into popular textbooks on phonetic fieldwork (e.g. Ladefoged 2003, p.
26).

 This doesn't mean that samples recorded at this rate necessarily sound as
 good as recordings made at higher sampling rates, though, as Keola mentioned.
 Even though this sampling rate captures the essential frequency range 
for speech (i.e. up to around 11KHz), most people's hearing extends
 well beyond that into the 20KHz range. The lower sampling rate might 
not incur distortions that would affect phonetic analysis, but there
 is usually still an audible difference in quality between recordings
 digitized at 44.1KHz versus those digitized at 22.05KHz, and likewise 
for higher sampling rates. I'm not sure that Ladefoged's recommendation of a 22,050 Hz sampling
 rate was really made with reuse of recordings outside of phonetics in 
mind. For some other purposes, the 'aesthetic' sound quality of a
 recording may be fairly important, maybe particularly if recordings
 have some cultural, historical, or even just sentimental value. If
 22,050 Hz was all that was available, there'd certainly be no harm in
 choosing it -- but it would seem a shame to make recordings of a lower
 audio quality, just because they're all that's needed for instrumental
 phonetics!

 Anyway, that's just a thought. For what it's worth, NINCH (2003),
 Bartek & Kornbluh (2002), and the "Sound Directions" guide from 
Indiana University all appear to recommend 96Khz / 24-bit WAV for 
archival purposes.

Chris Cox
University of Alberta

17) Subject: Longevity of documentation
Has anyone looked into albums?

 I have enjoyed reading about the various issues that arise with
 digitizing cassettes. Many thanks!!

 The discussion has reminded me of a larger issue w.r.t. the long-term
 status of audio recordings. The question I have is whether anyone 
has looked into copying materials onto LPs - the old-fashioned analog
 album? The reason I ask is that I have been thinking that this might
 be one form of media that has the potential to retain its longevity 
more than others. Even 50 year old scratched, gummy old albums can
 be cleaned up and played and do not rely on whether this or that 0 or
 1 is at the beginning of the file, whether the compression algorithm
 changes the signal, the program changes, etc. etc. or whether the 
media is going to disintegrate after being played after 20 years.

 The album has been making a come-back lately and there are different types of presses that I am aware of, but want to know if anyone knows
 anything more about this. Ida Halpern (ethnomusicologist) used to 
take a "record-maker" (the picture I saw looked like a "record", not
 a wax cylinder) with her to document music of the Pacific Northwest
 and I wonder if there is anything like what she used that is 
available today.

Su Urbanczyk
University of Victoria

18) Hi,
 Thanks Suzanne for this point on longevity.
 Here's my two cents on archiving strategy, and the merciless fight against media loss: every time you get a new computer to replace the
 current one, don't just transfer all your files onto the new computer
 and get rid of the old one: take the old hard drive out of the computer
 and store it in a safe place. Like most users, I change computers every 2-5 
years on average, and my IT experience leads me to believe that the
 average hard drive longevity is significantly longer than 5 years
 (though I have no stats on that point). Over the past 25 years, I've
 always saved my old hard drives and kept them in different safe places.
 I've been able to successfully read files from them whenever I tried to.
 Due to the incremental nature of this strategy, that also means I have 6
 archives of those 25 year old files spread over different places, and 
I'll have 3 more in the next 10 years, etc. Not to mention the automatic 
daily + weekly + monthly backups of my current PC.
 Such an incremental archiving scheme would be way harder to implement
 with analog media. The cost? About 10 minutes of work to open that good
 old PC, unscrew the drive and unplug it.
 

Eric Poncet
NunaSoft