Sunday, October 18, 2009

What is the piano doing? (Posted in reverse order) 1

I'm posting the two halves of this explanation in reverse order because, when I view my posts, more recent ones come up above earlier ones. If your display is the other way around, I apologize!

The piano voice synthesizer, Part I

So, what is happening here?

First a detour.

There are a number of ways to synthesize speech, but most of them involve determining a set of characteristics of speech, then building a mechanism to reproduce that kind of sound. Experiments in this direction are not new: in 1779, one C. G. Kratzenstein at the Imperial Academy in St. Petersburg constructed a device which generated vowel sounds by blowing through a reed into chambers shaped like a vocal tract.

These approaches all worked from the standpoint of analyzing the existing system and building mechanical analogs. One such analog is on display at the Exploratorium in San Francisco, CA, and a write-up on it, with pictures, can be seen here: http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html

Samuel Morse is supposed to have been able to form vocal sounds with his hands, and used the ability to prank friends and adults. (I don't remember, honestly, if that story is supposed to be true, but it goes on to say that when he was suffering some rather painful dentistry, he used it to tell the doctor to lighten up a little!)

There are other valid approaches: synthesize part of the system, then let the remainder of the system provide its normal function. Witness the 'Talk Box'. Whether you are more familiar with the guitar antics of Peter Frampton (or piano antics of Stevie Wonder), or the animated Casey Junior, the engine that pulled the circus train in Disney's classic Dumbo, you've heard this: a sound source is captured and applied to the vocal tract, and the vocalist merely moves their mouth and oral cavity as they would for speaking. With the talk box, a tube leads the sound of an amplifier to the player's mouth; with the Sonovox (used for Casey Junior and numerous interesting commercials through the 1960s), a pair of audio transducers is pressed lightly against the neck of the performer. In either case, the effect is the same: the vocal cords are replaced in function by another sound source, and the oral cavity, lips, tongue and teeth are employed as for normal speech.

In each case, of course, the effort is to reproduce the physical action and the acoustical modifiers used in human vocal production.

Electronic efforts to reproduce vocal characteristics are more recent (as electronics is more recent than mechanics!). The easiest of these is the recorder-reproducer, where a human speaks and the sound pressure wave from their voice is recorded, whether on magnetic tape, in vinyl (and originally, recording to wax disks was totally mechanical), or as digital numbers, which themselves are recorded or stored. For playback, the recording is run through the opposite process, which takes the stored numbers or signals and turns them back into audible sound. In this case, the recording captures all the information and reproduces most of it, with some attendant noise. However, you can't record a woman saying "He saw the cat," and play back a man saying "It's a Rolex!" The playback is what was recorded. (This leaves out a whole branch of electronic music, where recorded sounds are distorted, reversed, stood on their heads and severely beaten, or simply chopped up and re-ordered. That's because the discussion of recording/reproducing is but a step to a discussion of synthesis of speech, so please, let's not get off the track!)

One of the earliest efforts to reproduce the human voice electronically was the Voder of Homer Dudley. This machine had multiple keys and foot pedals, each assigned to a certain aspect of the electronic vocal model. For instance, there were keys to produce the gutturals, fricatives and pops produced by the tongue and lips for hard consonants. There were hiss generators which provided the SH and S sounds, and which could be mixed into the previous consonant sounds, or into vowels for voiced consonants. And there was a set of "formant filters", which could be engaged by different amounts depending on the pressure of the performer's hands. And performer it was (and most often "she" was, since women were almost exclusively trained to operate the Voder). The Voder was used in a great hall at the 1939 World's Fair in NYC, and received rave reviews, but little came of it afterwards, probably because of the difficulty involved in operating it.

The formant filters are important. These are an electrical analog to the resonant characteristics of the human vocal tract in certain configurations. Generally, three formant filters are enough to make recognizable vowel sounds: the first is tuned above the excitation pitch, the second higher than the first, and the third higher still. By controlling how much sound they let through in those ranges, and using a sound source which is rich in content in those ranges, the formant filters do just what the vocal tract and sinuses do to the complex sound coming from the vocal cords: they carve it away until it sounds like... well... vowels!
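For the curious, here is a rough Python sketch of that carving. It is my own illustration, not the circuitry of the Voder or of any real device. It assumes a sawtooth source whose harmonic number n has relative strength 1/n, and it models each formant filter as a simple two-pole digital resonator, with the three filters connected in series. The names (resonator_gain, vowel_spectrum, AH_FORMANTS), the sample rate, and the formant frequencies and bandwidths for "ah" are all assumptions picked for illustration; the formant values are only ballpark figures.

```python
import cmath
import math

SAMPLE_RATE = 16000.0  # Hz; an arbitrary choice for this sketch


def resonator_gain(freq, center, bandwidth, fs=SAMPLE_RATE):
    """Magnitude response at `freq` of a two-pole resonator tuned to `center`.

    H(z) = A / (1 - B z^-1 - C z^-2), scaled so the gain at 0 Hz is 1.
    """
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius: narrower band, closer to 1
    theta = 2.0 * math.pi * center / fs      # pole angle sets the resonant pitch
    b = 2.0 * r * math.cos(theta)
    c = -r * r
    a = 1.0 - b - c                          # normalizes the response at 0 Hz
    z_inv = cmath.exp(-2j * math.pi * freq / fs)
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))


def vowel_spectrum(pitch, formants, n_harmonics=40, fs=SAMPLE_RATE):
    """Return (frequency, amplitude) pairs for a sawtooth sent through the filters in series."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        freq = n * pitch
        amplitude = 1.0 / n  # sawtooth: harmonic n has relative strength 1/n
        for center, bandwidth in formants:
            amplitude *= resonator_gain(freq, center, bandwidth, fs)
        spectrum.append((freq, amplitude))
    return spectrum


# Illustrative (center Hz, bandwidth Hz) pairs for an "ah"; assumed values.
AH_FORMANTS = [(700.0, 100.0), (1100.0, 100.0), (2450.0, 150.0)]

if __name__ == "__main__":
    # Crude text plot: one bar per harmonic of a 100 Hz sawtooth.
    for freq, amplitude in vowel_spectrum(100.0, AH_FORMANTS)[:30]:
        print(f"{freq:6.0f} Hz  {'#' * int(round(amplitude * 20))}")
```

Running it prints a crude bar chart: the harmonics that fall near a formant stand tall, and the ones in between are carved down. Changing the pitch argument moves the harmonics around, but the bumps stay where the filters put them.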

When I was in 7th grade, I found a Bell Labs kit in the classroom, and talked the teacher into letting me take it home and build it. My father helped me (a lot: he was a TV repair technician, and understood what the instructions said. It was a _real_ learning experience for me!) The result was an electrical circuit built on the back of a box, fed from a sound generator with a control voltage that made its pitch rise and fall. The generator turned out a sawtooth wave (very rich in harmonic content) and it fed through three formant filters, formed by capacitors and inductors. We could change the formant filter pitches by changing the capacitors, and change their strength by changing resistors, and we got it to say "ahhhh" easily, and "eeeeeee" and even long "o", but getting the long "u" (or "oo" really) was very difficult: the filters got so strong that we couldn't get enough sound out to hear it!
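Why would swapping capacitors move a filter's pitch, and swapping resistors change its strength? I no longer have the kit's schematic, so what follows is only the textbook behavior of a simple series resistor-inductor-capacitor circuit, which I am assuming is close enough to make the point. The function names and the component values are made up for the example.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency in Hz of an ideal LC pair: f = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def series_q(resistance_ohm, inductance_h, capacitance_f):
    """Q of a series RLC circuit: Q = sqrt(L / C) / R (assumed topology)."""
    return math.sqrt(inductance_h / capacitance_f) / resistance_ohm


if __name__ == "__main__":
    inductance = 0.1  # henries; hypothetical value, not from the kit
    for capacitance in (0.5e-6, 2.0e-6):  # farads; hypothetical values
        print("C =", capacitance, "F  ->", round(resonant_frequency(inductance, capacitance)), "Hz")
    for resistance in (50.0, 100.0):  # ohms; hypothetical values
        print("R =", resistance, "ohm ->  Q =", round(series_q(resistance, inductance, 0.5e-6), 1))
```

The square root is the thing to notice: to drop a filter's pitch by an octave you need four times the capacitance, not twice. In this assumed circuit, doubling the resistor halves the Q, which makes the resonance broader and weaker.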

I'm going to require that you retain this last paragraph's information for the next post when I get back to the piano voice synthesizer, so maybe you want to go back and re-read it: three formant filters, which could have their frequency (pitch) and strength (Q is the official term, but you could think upside down and use the term damping as easily) adjusted, and a sound source that provided lots of rich components (harmonics), and which could have its pitch varied to lend a sense of emphasis. There was no effort made at consonants in this box, just vowels. For all intents and purposes, it acted just like the artificial vocal tracts shown on the Exploratorium page above!

This is a good place to stop, until the next post.
