Tuesday, March 26, 2024

Minor setbacks, major helps

Minor setbacks. An understatement, I suppose. This morning (8/17/09), commuting to work, I dozed at the wheel. The result is that my car had to be winched off the tree I hit because I dozed off in the middle of a turn around a corner and never straightened the wheel. 

Particularly disturbing was waking up again about a half-second before the crash with the tree approaching me at speed. 

What happened? Narcolepsy. Why? Well, it appears that I have untreated sleep apnea, and narcolepsy is one of the side effects of having untreated apnea. Amazingly enough, there is a medication to treat it, which (in hindsight, since I'm finishing this up in 2023) works very well. The bad news is that it is extremely expensive, so if insurance covers it at all, it is usually with an enormous co-pay (on the order of $50/month), and if you don't have coverage, it can cost between $750 and $1300/month. (That was then, and surprisingly, that hasn't changed in all the intervening time: The French company that owns the patents agreed to make generics, but only if they owned the generic-making company, with a promise to release that to other companies in time...which they've done, but that didn't significantly change the price, either.)

What treats narcolepsy? 
Modafinil was the original medicine in this particular family. The odd fact is that it isn't very pure, for the same reason that not every racemic molecule can be trusted to stay the same.

Racemic??

A racemic drug is a mixture of the two forms of a chiral molecule, "chiral" meaning that the molecule has "handedness". If you have the left-handed and right-handed versions (and are either microscopic yourself or have really good microscopes), you can't turn one so that it is identical to the other, even though they use exactly the same atoms and, other than the handedness, the exact same structure. Probably the most famous racemic drug was Thalidomide, the drug which was, at one time, considered "God's gift to Pregnant women", and then, as the cause of horrific birth defects. Thalidomide consists of two structural units with a single bond between them, one unit being a 'glutarimide ring' and the other a phthalimide. The phthalimide portion is a dual-cycle structure, a five-sided carbon ring mated at one side with a six-sided carbon ring, with the extreme point of the five-sided ring having a nitrogen atom substituted for the carbon. On each of the two neighboring carbon atoms, an oxygen atom is held in a double-bond. The glutarimide ring is a six-sided carbon ring, which, similarly, has a substituted nitrogen atom with two double-bonded oxygen atoms on either side, but the nitrogen is not at the extreme from the bond between the two constituent units. It is offset from that point so that one of the oxygen atoms is at the extreme end. The bond between the two units goes from the nitrogen atom in the five-sided ring to a carbon in the six-sided ring. This one bond can hold the six-sided glutarimide ring above the plane of the phthalimide molecule or below it, and depending on which direction the glutarimide ring is held, the first carbon in the glutarimide ring rises above or falls below the plane of the rest of that ring. A picture is worth a thousand words, so here, and good luck with it (because sometimes linked images get lost):
https://en.wikipedia.org/wiki/File:Thalidomide-structures.png

Now as it was belatedly discovered, the difference in the angle between the two units (and associated lift or descent of the glutarimide ring) makes the difference between the drug acting as a suppressor of morning sickness in pregnant women and a birth-defect cause in the fetus, and the horror of Thalidomide is that the racemic can change between the two states (each called an 'enantiomer'), through a process which is still not well understood. In the case of Thalidomide, the chiral shift took place within the human body.

Now as it happens, Thalidomide could be racemically separated, so that only the enantiomer which treats morning sickness was given to pregnant women... but because of the spontaneous (or seemingly spontaneous) chiral shift, it is likely rather than merely possible that half of the dose would undergo the shift. So the drug was removed from the market in 1961. But it turns out that it was also an effective cancer treatment drug, and so it is back on the market, with treatments involving Thalidomide and other drugs requiring risk assessment, because it can still cause birth defects if the woman who is treated becomes pregnant (or if her partner is treated, since Thalidomide can be passed in semen).

Modafinil is a racemic structure, with a right-handed and left-handed form, but it can be racemically filtered. The right-handed form is an effective treatment for Narcolepsy, although the mechanism is not perfectly understood. The left-handed form is psychogenically inert, and its major action seems to be to make the user's urine smell really bad. To add to the already burdensome information I've provided, I'll add that the enantiomer forms which are considered right-handed are designated with an "r-" prefix, while the left-handed forms are prefixed with "s-" (see sidebar). So the purified version of modafinil, which contains only the right-handed enantiomer of modafinil, sells under the generic name Armodafinil (brand name Nuvigil, among others). And that's what I was prescribed.

Armodafinil is treated as a controlled substance (Schedule IV in the US), not because it demonstrates narcotic effect on humans, but because its effect falls loosely into a psychological categorization: it acts upon the brain in some way that causes negation of the drowsy effect of Narcolepsy (or, another way of looking at it would be that it "promotes wakefulness"). This has led to some really dumb efforts by people who just don't know any better or just don't care, to claim that its wakefulness-promoting effect has a side effect of promoting better memorization. I can tell you, from first-hand experience, that it doesn't. I'm 70, I've been taking it for 6 years, on and off, and by golly, my ability to memorize new information or recall stuff I've had in my head for decades is not even slightly improved, short- or long-term. If you meet me on the street, the odds I'll remember your face are high, but your name? feh! So don't take offense, please.

However, taking Nuvigil has allowed me to have something approaching a normal life. My father had Narcolepsy, and he'd drop out in the middle of a sentence (if he was sitting) and wake up anywhere from minutes to hours later, to continue the sentence. In my case, when I was off the drug during the COVID era, I did the same. My wife and kids coded it "unscheduled nap", and claimed it was cute, so I codify it as "bouts of unscheduled cuteness." 

Promised sidebar, from the pedant department:

Since chirality involves two identical structures of identical components which, because of out-of-plane angles between components, are significantly different, some way of identifying the difference was needed. There had already been a system in place for a long time, related to the glyceraldehyde molecule, where what we now consider right-handed was originally given a (+) prefix and the left-handed one a (-). This was actually pretty cool, because years later, in 1951, x-ray crystallography confirmed that there was a right-handed aspect to the (+) labelled form.

The two forms have since been labeled with a D/L system and an R/S system. In the latter, the letters are based on the Latin Rectus ("right", in the sense of straight or proper) and Sinister ("left"). The earlier D/L system used the Latin Dexter, "to the right", and Laevus, "to the left".

Wednesday, April 26, 2023

What's wrong with CFLs anyway?

OK, so what's the problem with CFLs?

First, they are fluorescent bulbs. That means that they produce light in a very different fashion from Thomas Edison's lil' bulbs: Incandescents make light from electrical currents heating a metal (tungsten alloy) filament to white-hot.

Instead, a Compact Fluorescent bulb consists of two stages: the opaque white ceramic base with the Edison mount contains lots of electronics, and the bulb contains gas, a phosphor coating, and two electrodes. The electronics part is actually a miniaturized ballasted AC-AC converter supply. The gas is harmless enough, but it has mercury, a required part of the process of making the light. This fact has ramifications which may be hard for the uninitiated to realize, so I'll make them plain here.

  • Phosphors are not good for you. They are more not good for you than Lead oxides.
  • Fluorescents produce light that your eyes don't like as much as light from incandescent filaments. Why is this so? Because the gas in the CFL tube emits invisible light, which is used to pump the phosphors. The phosphors emit numerous (5-7 or so) specific wavelengths of light, rather than a smooth spectrum, like the sun and heated metals (like the tungsten alloy used in incandescents.) Those lines "look" like white to our eyes, but they do it by beating on your eye's sensors with two or three wavelengths in the vicinity of each sensor (Cone), rather than a broader spectral set of wavelengths. And they also use the wall power almost directly, which means the generated light varies as the wall power does, at 60 or 120 cycles per second. Many people can detect light like that with their eyes, and it is very annoying.*
  • The way CFLs make light is a lot like how neon signs make light. The difference is mercury. The amount of mercury has been reduced from the original fluorescent tubes of our youth, but it's still there, and when you throw away a CFL or break it, there's mercury with the phosphors to be cleaned up.
  • Those electronic ballasts are more complicated than the ballasts for straight, large tubes: the lack of care that goes into CFL electronics is notorious.
  • In addition to marginal designs, the electronics in CFL bulbs are limited by cost, and failures are usually due to cheap parts.
  • CFLs with electronic failure are easily repaired, if you know the failed part and can get a replacement, although it's fiddly: the problem is that the designs are kept secret by the manufacturer, and no one wants to bother with the extra work required to fix them.

Specifically, a CFL works by two energy conversions, ingenious use of heat and pressure, and if you are lucky, ingenious and effective heat sinking.

To light a CFL, an electrical plasma is induced in a low-pressure gas (such as Neon) to produce photons (light). The light's wavelength is determined by the makeup of the gas in the tube. A beautiful and well-written explanation, accompanied by images of spectra and lamps using the gases, adorns the wikipedia.org entry for "Gas-Discharge lamp".

Fluorescent lamps start with a low-pressure noble gas: Neon, Argon, Krypton or Xenon. (Radon is not used, because it is rare, and also radioactive.) A small amount of mercury is used in the tube, and the starting process vaporizes it so it can mix with the gas. The plasma formed when the electrons flow through the gas excites the gas and mercury vapor to produce ultra-violet (and some green) light. The ultra-violet light is mostly invisible to your eyes, so to provide more visible light, rare-earth phosphors, which coat the inside of the glass, accept the UV photons and emit the energy in other wavelengths.

The base of the CFL contains a line-power (120VAC)-to-DC converter. This powers an oscillator which produces a large voltage which drives the plasma formation. Once the plasma is formed, the circuitry changes to simply maintain the arc. As long as the power is on, the arc continues to excite the plasma, providing those initial photons. The phosphors are "passive": they don't require anything more than the photons from the plasma in order to make light. That keeps the outside of the tube fairly cool... but the extra work required to make the plasma and maintain it creates a lot of heat: this is trapped in that small white base, and shortens the life of the electrical components. If those components are cheap, it shortens the life of the bulb, considerably, over a similar set up that was vented or actively cooled.

The photons that the plasma produces are "discrete" wavelengths. That means they are identifiable by their wavelength, and don't have a large spread of numbers. (Can I make that any more opaque? I thought not. OK, slight rabbit trail: The color of light relates to wavelength. A "pure" color, like green, may be a range of wavelengths, all happening at the same time. In such a case, we'd identify the wavelengths that make the color as something like "500-550nm" (a band of wavelengths which fill the space between 500 nanometer-long and 550 nanometer-long wavelengths). We might say "525nm green", but in reality, that particular wavelength might appear quite rarely. But from a mercury plasma, we can say "546.1nm" for the green line, and "435.8nm" for the blue line: they are pretty exact! And between those lines might be other lines, but not bands. Incandescence produces bands, while fluorescence produces lines. The sun's light is continuous, with all the spaces filled with active wavelengths, but a fluorescent bulb's light is not: there are specific lines with big gaps between.)

The phosphors are chosen to respond to the discrete wavelengths that the mercury vapor produces, and to emit new colors in tight lines which your eye will combine to make "white" or "cool white" or "warm white" (etc.) But it is important to remember that you are not looking at a spread of wavelengths like the sun produces, but a handful of discrete lines. You can see this by looking at the reflection of fluorescent light on a DVD, compared to the sun's light on the same surface: the tracks split the light, analogous to a prism, into component colors. Fluorescent reflections split into discrete colors, while the sun provides a wash of colors, gradually changing from one to another.

*That 60-cycle-per-second variation is also pronounced in a single LED: it's the nature of the beast. A diode passes current in one direction only, so only the positive half of the cycle produces light. Now it is possible to wire two LEDs back-to-back (side by side, facing opposite directions), and then each one produces light for its own half cycle, but there's a dead spot when the incoming voltage is between the turn-on voltage of one LED and the turn-on voltage of the other. That means that there are 120 very short periods of zero light output each second (one at every zero crossing), compared to the CFL's longer period of darkness, which is smoothed over by the phosphors' slow response...if they are formulated properly. And, in fact (and this footnote is being written in 2023), LED lights are producing much smoother (no dark-dropouts) output using phosphors themselves, and are producing much less UV output than they did when this article was first written.
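
To put a rough number on those dead spots, here is a minimal sketch. It assumes an idealized pair of back-to-back LEDs straight across a 120 V RMS, 60 Hz sine wave, with a made-up turn-on voltage of 3 V and no series resistor or driver electronics (a real lamp has both). The function names are mine, purely for illustration.

```python
import math

def dead_fraction(turn_on_volts, line_rms_volts=120.0):
    """Fraction of each half-cycle during which neither LED conducts.

    Assumes a pure sine wave and an ideal LED that is fully off below
    turn_on_volts and fully on above it (a simplification).
    """
    peak = line_rms_volts * math.sqrt(2.0)
    # |V(t)| stays below the turn-on voltage for asin(Vf/Vpeak) radians on
    # each side of a zero crossing, out of pi radians per half-cycle.
    return 2.0 * math.asin(turn_on_volts / peak) / math.pi

def dead_microseconds(turn_on_volts, line_rms_volts=120.0, line_hz=60.0):
    """Length of one dark gap around a zero crossing, in microseconds."""
    half_cycle_seconds = 1.0 / (2.0 * line_hz)
    return dead_fraction(turn_on_volts, line_rms_volts) * half_cycle_seconds * 1e6

if __name__ == "__main__":
    # 3.0 V is an assumed, illustrative turn-on voltage, not a measured one.
    print(f"dark fraction: {dead_fraction(3.0):.4f}")
    print(f"each gap lasts about {dead_microseconds(3.0):.0f} microseconds")
```

Under those assumptions the gaps add up to roughly one percent of the time, which is why they are so much shorter than the dark half-cycle of a single LED.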

Friday, July 15, 2011

Only the government can save us, part 1

OK, all you die-hard liberals might as well just skip this series. You won't agree with anything I say, it'll just give you ulcers, you don't need that. Go have a beer with barry and chill, and wait a while, and I'll be on to something maybe you can read without imploding.

For the rest of us:

Why the Government can't think its way out of a hole it dug itself.

Two days ago (as of writing) Congress failed to reverse an idiot decision. The decision was to outlaw the production and sale of 'inefficient' lighting sources, like Incandescent bulbs.

Instead, they want us to use 'Green' sources, like CFLs. In fact, when they're done, it's likely that the only light sources we can have will be LED-based and CFLs or long-tube fluorescents. Why is this bad? After all, everyone knows that CFLs are far more efficient than incandescents, have a longer lifetime, and will cost less once the government is done ensuring that they have no competition. And no one cares about LEDs, because they only put light out in a fairly directional cone, so any general-purpose light will have to have lots of LEDs in it, and that's gotta be expensive. (Actually, no sarcasm there.)

Well, it's all a scam, and it isn't hard to follow the money. CFLs are produced in significant numbers only in one country: China. And since every business in China belongs to the government (and because the only real discipline in China seems to be in the Red Army), guess who gets the money you spend on CFLs? But then, it has been the US's pleasure to give money to China, hand-over-fist, since the Clinton administration (when Billy Clinton handed Red China all the scientists and science at a major American satellite company, in what looked on the outside like a business deal to get work for the sat company, but really amounted to handing Red China the navigation and targeting technology developed in the US to use for ICBMs aimed at America. But I digress.)

So what's so wrong with CFLs?

First, they aren't as efficient as we're told, because the measurement system used to determine the efficiency of light bulbs is a) a carefully guarded secret, apparently even from the bulb testers, b) very selective in what is measured and calculated on, and c) hag-ridden by politics. The first means that you can look at seven boxes with Edison-base incandescent light bulbs and find seven different ways of telling you how much power they consume, and how much light you get from them, and never find a way to make a coherent comparison. At this point in time, almost all industry labeling for lightbulbs is meaningless drivel.

The second means that there is no level playing field at all. For instance, the power labeling on CFLs utterly ignores the heat generated from the base, while the incandescent has all of its functional bits in the glass envelope, so all the heat generated comes right from where it is measured. The obvious comparison is how much light you get, compared to how much heat you emit. Ignoring the heat generated by the converters and ballast in the base of CFLs makes them look fantastic: 7.5W for the same light as you'd get from a 60W bulb! (Of course, if you remember the first characteristic of bulb efficiency measurements, it also means that you can't actually prove that any of those 60-watt-equivalent CFLs produces 60W of anything!)

And of course, the third means that no matter how bad CFLs are, because our current government claims that they're better, you'd better accept that they're better, even if they aren't.

Stay tuned...

Saturday, February 6, 2010

Where did I go...

OK, it's been four months and I've been silent. What could have happened?

Well, on October 1, 2009, I fell and broke and dislocated both my elbows. The breakage wasn't the usual sort of thing, a break across the bone or a greenstick fracture. Rather, it was a breaching of the cup-wall where each ulna fit into its humerus, where the front-side was knocked off. That meant that it was hard to reduce the dislocations (in fact, the ER physician reset the right one three times, and it just redislocated again in the cast!) Finally, the Orthopedic Surgeon stuck me in an operating theatre and, using x-rays and the modern update of the fluoroscope, set both arms and got successful casts on them.

From there, a week saw me in articulated braces which prevented me from straightening my arms. With the threat of "hearing pop-pop and being dislocated again," I was in no danger of trying to straighten them anyway! Now, after 3 months, my bones are certified as healed, I'm out of braces, and there's nothing left but the soft-tissue recovery. This is tendons and ligaments which were stressed pretty badly. I'm told that I can expect them to be problematic for about a year, and painful for not-a-year. At the moment, I make a fine barometer who cannot turn doorknobs.

The best therapy turned out to be playing musical instruments, though, so my wife and I have been playing through all the bass-clef and bass/tenor-clef music in our library, she on cello and I on bass viol. I've done a few gigs on the six-string electric bass (which must be great exercise, 'cause it hurts for days after!), and I've actually practiced recorder a few times. So recovery proceeds apace.

My son and I finished our first semester, in my case with one withdrawal, one very bad grade (C- for advanced Visual Basic...is it really necessary to keep the available parameters of controls such a closely guarded secret?) and an A in Java. Now we're in our second semester, and have three classes together: Intermediate C++, Info Security and Comms and Networking.

And, with work, I'll carve part of my time out to post here on a more regular basis.

Sunday, October 18, 2009

What is the piano doing? (Posted in reverse order) 1

I'm posting the two halves of this explanation in reverse order because, when I view my posts, more recent ones come up above earlier ones. If your display is the other way around, I apologize!

The piano voice synthesizer, Part I

So, what is happening here?

First a detour.

There are a number of ways to synthesize speech, but most of them involve determining a set of characteristics of speech, then building a mechanism to reproduce that kind of sound. Experiments in this direction are not new: in 1779, one C.G. Kratzenstein at the Imperial Academy in St. Petersburg constructed a device which generated vowel sounds by blowing through a reed into chambers shaped like a vocal tract.

These approaches all worked from the standpoint of analyzing the existing system and building mechanical analogs. One such analog is on display at the Exploratorium in San Francisco, CA, and a write up on it, with pictures, can be seen here: http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html

Samuel Morse is supposed to have been able to form vocal sounds with his hands, and used the ability to prank friends and adults. (I don't remember, honestly, if that story is supposed to be true, but it goes on to say that when he was suffering some rather painful dentistry, he used it to tell the doctor to lighten up a little!)

There are other approaches which are valid: synthesize part of the system, then let the remainder of the system be used to provide its normal function. Witness the 'Talk Box'. Whether you are more familiar with the guitar antics of Peter Frampton (or piano antics of Stevie Wonder), or the animated Casey Junior, the engine that pulled the circus train in Disney's classic Dumbo, you've heard this: a sound source is captured and applied to the vocal tract, and the vocalist merely moves their mouth and oral cavity as they would for speaking. With the talk-box, a tube leads the sound of an amplifier to the player's mouth; with the Sonovox (used for Casey Junior and numerous interesting commercials through the '60s), a pair of audio transducers are pressed lightly against the neck of the performer. In either case, the effect is the same: the vocal cords are replaced in function by another sound source, and the oral cavity, lips, tongue and teeth are employed as for normal speech.

In each case, of course, the effort is to reproduce the physical action and the acoustical modifiers used in human vocal production.

Electronic efforts to reproduce vocal characteristics are more recent (as electronics is more recent than mechanics!) The easiest of these is the recorder-reproducer, where a human speaks and the sound pressure wave from their voice is recorded electronically, whether on magnetic tape, in vinyl (and originally, recording to wax disks was totally mechanical), or in digital numbers, which themselves are recorded or stored. For playback, the recording is processed through an opposite process which takes the stored numbers or signals and turns them back into audible sound. In this case, the recording captures all the information and reproduces most of it, with some attendant noise. However, you can't record a woman saying "He saw the cat," and play back a man saying "It's a Rolex!" The playback is what was recorded. (This leaves out a whole branch of electronic music, where recorded sounds are distorted, reversed, stood on their heads and severely beaten, or simply chopped up and re-ordered. That's because the discussion of recording/reproducing is but a step to a discussion of synthesis of speech, so please, let's not get off the track!)

One of the earliest efforts to reproduce the human voice electronically was the Voder of Homer Dudley. This machine had multiple keys and footpedals, each assigned to a certain aspect of the electronic vocal model. For instance, there were keys to produce the gutturals, fricatives and pops produced by the tongue and lips for hard consonants. There were hiss generators which provided the SH and S sound, and which could be mixed into the sound of the previous consonant sounds, or vowels for voiced consonantals. And there were a set of "formant filters", which could be engaged by different amounts depending on the pressure of the performer's hands. And performer it was (and most often "she" was, since women were almost exclusively trained to operate the Voder.) The Voder was used in a great hall at the 1939 World's Fair in NYC, and received rave reviews, but little came of it afterwards, probably because of the difficulty involved in operating it.

The Formant Filters are important. These are an electrical analog to the resonant characteristics of the human vocal tract in certain configurations. Generally, three formant filters are enough to make recognizable vowel sounds: they are tuned, one above the excitation pitch, the next tuned higher, and the third higher still. By controlling how much sound they let through in those ranges, and using a sound source which is rich in content in those ranges, the formant filters do just what the vocal tract and sinuses do to the complex sound coming from the vocal cords: they carve it away until it sounds like... well... vowels!

When I was in 7th grade, I found a Bell Labs kit in the classroom, and talked the teacher into letting me take it home and build it. My father helped me (a lot: he was a TV repair technician, and understood what the instructions said. It was a _real_ learning experience for me!) The result was an electrical circuit built on the back of a box, fed from a sound generator with a control voltage that made its pitch rise and fall. The generator turned out a sawtooth wave (very rich in harmonic content) and it fed through three formant filters, formed by capacitors and inductors. We could change the formant filter pitches by changing the capacitors, and change their strength by changing resistors, and we got it to say "ahhhh" easily, and "eeeeeee" and even long "o", but getting the long "u" (or "oo" really) was very difficult: the filters got so strong that we couldn't get enough sound out to hear it!

I'm going to require that you retain this last paragraph's information for the next post when I get back to the piano voice synthesizer, so maybe you want to go back and re-read it: three formant filters, which could have their frequency (pitch) and strength (Q is the official term, but you could think upside down and use the term damping as easily) adjusted, and a sound source that provided lots of rich components (harmonics), and which could have its pitch varied to lend a sense of emphasis. There was no effort made at consonants in this box, just vowels. For all intents and purposes, it acted just like the artificial vocal tracts shown on the Exploratorium page above!
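
For the programmers in the audience, here is a minimal software sketch of that box: a sawtooth source fed through three resonant filters whose outputs are summed. It is not the Bell Labs circuit; the two-pole digital resonator stands in for the capacitor-and-inductor filters, the function names are mine, and the formant frequencies and bandwidths in the demo are ballpark, assumed values for an "ah"-like sound.

```python
import math

def sawtooth(pitch_hz, n_samples, sample_rate=16000):
    """Naive sawtooth in [-1, 1): rich in harmonics, like the kit's generator."""
    return [2.0 * ((pitch_hz * n / sample_rate) % 1.0) - 1.0
            for n in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonant filter, a digital stand-in for one formant filter.

    A narrower bandwidth means a stronger (higher-Q) resonance.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius, < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    gain = 1.0 - r                                        # rough level control
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vowel(pitch_hz, formants, n_samples=4000, sample_rate=16000):
    """Sum the outputs of one resonator per (center_hz, bandwidth_hz) pair."""
    source = sawtooth(pitch_hz, n_samples, sample_rate)
    bands = [resonator(source, f, bw, sample_rate) for f, bw in formants]
    return [sum(samples) for samples in zip(*bands)]

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

if __name__ == "__main__":
    # Assumed, illustrative formant settings; tune by ear, as we did.
    ah = vowel(110, [(700, 80), (1100, 90), (2500, 120)])
    print(f"{len(ah)} samples, rms level {rms(ah):.3f}")
```

Changing a center frequency here plays the part of swapping a capacitor, and narrowing a bandwidth plays the part of swapping a resistor to make the filter stronger.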

This is a good place to stop, until the next post.

What is the piano doing? (Posted in reverse order)

I'm posting the two halves of this explanation in reverse order because, when I view my posts, more recent ones come up above earlier ones. If your display is the other way around, I apologize!

The piano voice synthesizer, Part II

Back to the background: there are two more things to consider before we get to the piano and what it is doing. They are the Vocoder and the Sonogram.

There is a special case of the formant-filter electronic voice synthesizer called the Vocoder. This interesting machine works in the conceptual gap between the formant synthesizer and the recorder-reproducer, and understanding it really is key to understanding the piano.

The vocoder is called vocoder because it both codes the voice and produces vocals. One side at a time: to record, the human voice is presented to the vocoder, which is a bank of fixed filters. Each filter carves off a part of the vocal signal, based on frequency. You've probably seen a graphic equalizer: rows of sliders with cryptic labels like "125hz" and "250hz" and "500hz" over each one, with which you can shape sound electronically. Lift a few of the sliders that are close together, and the sound in that part of the audio range is increased: do it to sliders to the right, and high frequency sounds increase (and hiss!); do it to sliders on the left, and bass sounds are increased (and maybe thumps and booms, too!) The fun of graphic equalizers is that they slice up the audio band from low pitches to high pitches and make control of those slices easier! The vocoder's input doesn't merely change the incoming sound, it records the amount of power in each slice. Then, the reproducer uses that information to control the strength of power allowed in each slice in another bank of filters. If you apply a sound source that is appropriately like the vocal cords, then the output sound is very much like the input sound.

Before we go on, let's review what we have: on the input side, we have a bank of filters which _analyze_ the sound into bands. Each band is associated with a slice of the frequency spectrum from low to high. The numbers that come out of each band tell how much power is in that band. If the power increases, the numbers get bigger. If it decreases, the numbers are smaller. These numbers, when applied to a similar bank of filters that can be controlled, will make the same amount of power 'be allowed' in each filter, and those filters will act on a rich source of sound to produce an output like the input. We call the numbers 'coefficients', which isn't really kind to the numbers or us, but sound people are like that.
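
Here is a rough sketch of that review for one short slice of time. It is not how a hardware vocoder is built: instead of real filter banks it uses a slow, plain DFT and simply groups the frequency bins into slices, which is a swapped-in shortcut. The function names, the band edges and the test signals are all my own assumptions for illustration.

```python
import cmath
import math

def spectrum(frame):
    """Naive DFT of one frame; returns bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]

def band_of(k, n, sample_rate, edges_hz):
    """Index of the slice containing bin k, or None if it is outside them all."""
    freq = k * sample_rate / n
    for i, (lo, hi) in enumerate(zip(edges_hz, edges_hz[1:])):
        if lo <= freq < hi:
            return i
    return None

def band_powers(bins, n, sample_rate, edges_hz):
    """The 'coefficients': total power found in each frequency slice."""
    powers = [0.0] * (len(edges_hz) - 1)
    for k, value in enumerate(bins):
        b = band_of(k, n, sample_rate, edges_hz)
        if b is not None:
            powers[b] += abs(value) ** 2
    return powers

def impose(coefficients, carrier_bins, n, sample_rate, edges_hz):
    """Scale each slice of the carrier so its power matches the coefficients."""
    carrier_powers = band_powers(carrier_bins, n, sample_rate, edges_hz)
    shaped = []
    for k, value in enumerate(carrier_bins):
        b = band_of(k, n, sample_rate, edges_hz)
        if b is None or carrier_powers[b] == 0.0:
            shaped.append(0j)   # nothing in the carrier to work with here
        else:
            shaped.append(value * math.sqrt(coefficients[b] / carrier_powers[b]))
    return shaped

if __name__ == "__main__":
    rate, size = 8000, 256
    edges = [0, 250, 750, 1500, 4001]          # assumed slice boundaries, in Hz
    # Stand-in "voice": two tones. Stand-in "rich source": a 125 Hz sawtooth.
    voice = [math.sin(2 * math.pi * 500 * t / rate)
             + 0.3 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(size)]
    buzz = [2.0 * ((125 * t / rate) % 1.0) - 1.0 for t in range(size)]
    coefficients = band_powers(spectrum(voice), size, rate, edges)
    shaped = impose(coefficients, spectrum(buzz), size, rate, edges)
    print([round(c, 1) for c in coefficients])
    print([round(p, 1) for p in band_powers(shaped, size, rate, edges)])
```

The two printed lists should agree: the buzz ends up with the voice's power in every slice, which is the whole trick. A real vocoder repeats this continuously and turns the result back into sound.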

Now. What if we put something different into the input? Use a guitar: It's just like a talk-box: the filters act like an acoustical resonator and carve away the sound until it sounds like the guitar is talking! Use a woman's voice, and shift the coefficients so they feed higher-frequency filters than originally, and she can go "eeeeee" and out come the words a man said at the input side! Substitute a musical synthesizer with a good, buzzy output, and make the man sing!

The nice thing about the vocoder is that you can connect the output of the analyzing filters to the analogous controllable output filters, and play music into the output filter while talking into the input filter, and the music comes out with words! Vocoders are used a lot in the entertainment business now, and digital versions of them are so sophisticated that they can be used to correct the pitch of a singer or add other voices in harmony with a singer, using the same enunciation and expression!

OK, now that the concepts of filters, coefficients and power-in-a-band (of frequencies) are established, one more thing:

The sonogram is a picture of sound. Sound is complex in many ways: First of all, it is dependent on volume, pitch (frequency), and time. Specifically, if you remember the vocoder: each filter band has a frequency that it is active for: it has no response to unrelated frequencies. When a sound enters the filter band that it responds to, the level of output depends on the volume of the incoming sound: if it increases, the signal telling how much power is in the band increases, etc. And this varying happens with time. The fact that we can analyze the voice with a set of filters like we do with the vocoder means that we can just as easily do it with more filters (or less: the first vocoder only used five filters on each end!)

The sonogram, conceptually, is a display of many filters, shown with low frequencies lower and high frequencies higher. Time is shown from left (history) to right (more recent history). And volume is shown by the color of each point on the sonogram. Where no sound happens, no color happens. If the sonogram is "black and white", then it'll be black where there is no sound and get lighter as sound at that frequency increases. If a sound starts very quiet and grows loud, then dies away, but stays at one frequency, the sonogram will show one horizontal line which starts very dark, rises to a level of whiteness, then dims back to darkness. If a sound (like a drum-stroke) produces many frequencies all at once, but not for a long time, the sonogram will have a single vertical line, probably very light. Usually "false color" is used, say, blue for very quiet sounds, green for middle-loud sounds, and red for very loud sounds. And, of course, the filters are not perfect, so sound from one band "leaks" into neighboring filters, so a person saying "bOOp" (which is very pure, i.e., may show up only at one frequency) may appear as a round spot with a vertical line at the start (for the b) and another at the end (for the p).
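A toy "black and white" sonogram can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sound is chopped into equal frames, each band is measured by correlating the frame with a sine and cosine at the band's frequency, loudness is mapped linearly (real sonograms usually use a logarithmic scale), and the names `band_power`, `sonogram` and `render` are made up here. Denser characters stand in for lighter pixels.

```python
import math

SHADES = " .:*#"   # silence on the left, loudest on the right

def band_power(frame, sample_rate, freq):
    """Power of one frame at one frequency (sine/cosine correlation)."""
    re = sum(x * math.cos(2.0 * math.pi * freq * i / sample_rate) for i, x in enumerate(frame))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / sample_rate) for i, x in enumerate(frame))
    return (re * re + im * im) / (len(frame) ** 2)

def sonogram(samples, sample_rate, band_freqs, frame_len):
    """Return columns[t][b]: the power in band b during frame t."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        columns.append([band_power(frame, sample_rate, f) for f in band_freqs])
    return columns

def render(columns, full_scale=0.25):
    """Draw time left to right, low bands at the bottom, high bands at the top."""
    rows = []
    for b in reversed(range(len(columns[0]))):
        row = ""
        for col in columns:
            level = min(col[b] / full_scale, 1.0)
            row += SHADES[round(level * (len(SHADES) - 1))]
        rows.append(row)
    return "\n".join(rows)

if __name__ == "__main__":
    rate, frame_len = 8000, 200
    # One 400 Hz tone that starts silent, swells, and dies away again.
    samples = []
    for amp in [0.0, 0.5, 1.0, 0.5, 0.0]:
        for _ in range(frame_len):
            i = len(samples)
            samples.append(amp * math.sin(2.0 * math.pi * 400.0 * i / rate))
    print(render(sonogram(samples, rate, [200.0, 400.0, 800.0], frame_len)))
```

The swelling tone shows up as a single horizontal streak in the middle row that brightens and then fades, with the rows above and below staying blank.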

The sonogram is a way to see sound. It has been used in various forms, both analog and digital, for a century to analyze speech and as a tool to train speakers, as well as a way to analyze musical and other sounds.

If each spot on a sonogram could be tied to the control of a filter tuned to exactly that frequency, it could be used like the Vocoder to impress recorded speech into other sounds. In this case, we're using filters to change the shape of the sound that goes through them, and providing a single complex sound for those filters to carve up.

So what is the piano doing?

The piano voice synthesizer

Remember that the sonogram is a map of the acoustic energy present at each point in time, showing its frequency by its vertical placement, and its power by its color. What is analogous to this?

The player piano is a device that uses vacuum to actuate the hammers of a piano. The actuators are controlled by a mechanism with a hole for each hammer's actuator, over which a roll of paper with holes is passed. Where the holes pass over a hole in the reader, the pressure drops, a hammer is launched against the strings, and a note happens. If you take the piano roll and hold it with the beginning to the left and unroll it a bit, you can actually see the notes of the introduction, now going "up" as higher strings are sounded, now going "down" as lower notes are played.

This is a very "binary" approach: a hammer is actuated, or it is not. Later player piano roll systems added additional rows of holes to control volume, and some even added rows for speed. All in all, the system is very understandable: the sonogram is very similar.
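As a small sketch of just how "binary" a basic roll is, here it is as a data structure in Python. The representation is my own choice for illustration (a set of `(step, key)` pairs marking where the paper is punched), as are the names `render_roll` and `notes_at`; it ignores the extra volume and speed rows.

```python
def render_roll(holes, n_keys, n_steps):
    """Draw the roll with time left to right and higher keys higher up."""
    rows = []
    for key in reversed(range(n_keys)):
        rows.append("".join("o" if (step, key) in holes else "."
                            for step in range(n_steps)))
    return "\n".join(rows)

def notes_at(holes, step):
    """Which hammers fire when this column of the roll passes the reader."""
    return sorted(key for (s, key) in holes if s == step)

if __name__ == "__main__":
    # A little tune that climbs two notes and then steps back down one.
    holes = {(0, 0), (1, 1), (2, 2), (3, 1)}
    print(render_roll(holes, n_keys=3, n_steps=4))
    print(notes_at(holes, 2))
```

Each position is either punched or not, which is exactly the on/off picture described above.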

But what does the sonogram _actually_ portray? Each spot, looked at vertically, indicates a frequency, around which some acoustic energy is present. And taken horizontally, each spot is a moment in time. The sonogram is a player piano roll for speech.

Conceptually, there is only one thing left to discuss: what goes on within each spot, and how closely must that be recreated to produce recognizable speech?

The answer may be amazing to you. In actual fact, if there are enough 'bins', which are the filter bands, and they change quickly enough, the actual sound in the band doesn't need to be terribly like the original sound in the band at all!

This is why the vocoder works so nicely: as long as the formants (the output filters) are tracking the analyzers (input filters), you can let a moose bellow into the mike, and get out speech! Or, you could play harpsichord chords (or, as Don Dorsey did for his post-1977 versions of the "Disney's Electrical MainStreet Parade", synthesizer fanfares) and out will come sung words in harmonies!

So. What if we use a sonogram with 88 bins? Each bin is tuned to the same frequency as one key of a piano. The piano has 88 keys, tuned to 88 individual pitches. If the bins are made moderately tight, then the amount of sound power in each bin represents the amount of power in frequencies close to the associated piano note. What, then, if the sonogram is treated like a piano roll, and the strength at each moment in time of each spot on the sonogram is used as the impulse energy for the associated key/hammer of the piano?
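Here is a hedged sketch of that idea in Python. I'm assuming textbook equal temperament with the A above middle C (key 49) at 440 Hz, which real pianos only approximate, and I'm assuming the hammers take a MIDI-style strength from 0 to 127 that scales linearly with the power in the bin. I don't know what the actual talking piano uses, and `key_frequency` and `hammer_strengths` are names I made up.

```python
def key_frequency(key):
    """Frequency in Hz of piano key 1..88, assuming equal temperament, A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((key - 49) / 12.0)

# One sonogram bin per piano key, lowest key first.
PIANO_FREQS = [key_frequency(k) for k in range(1, 89)]

def hammer_strengths(bin_powers, full_scale=1.0, threshold=0.02):
    """Turn one sonogram column (88 bin powers) into 88 hammer strengths, 0..127.

    Bins quieter than `threshold` (as a fraction of full scale) leave the
    key alone, so the piano isn't constantly tapping every string.
    """
    strengths = []
    for power in bin_powers:
        level = min(max(power / full_scale, 0.0), 1.0)
        strengths.append(0 if level < threshold else round(127 * level))
    return strengths

if __name__ == "__main__":
    print(round(PIANO_FREQS[0], 2), round(PIANO_FREQS[48], 2), round(PIANO_FREQS[87], 2))
    column = [0.0] * 88
    column[48] = 1.0     # lots of power near A 440
    column[40] = 0.25    # some power eight keys lower
    print(hammer_strengths(column)[40], hammer_strengths(column)[48])
```

Run that once per time slice of the sonogram and you have, in effect, a piano roll with a volume setting for every hole.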

And the answer is what you see in that video of the piano speaking!

There is no need for electronics: all they would do is vary the volume of the sound from the piano at specific frequencies to impress the voice shape on the piano sound. Instead, by controlling the strength with which each key is actuated, you get the same effect! Overall, the ear knits the result into one sound, which your brain can (at least after it gets the hint) interpret as speech.

And that's all there is to it!

Thursday, October 8, 2009

Wow... this leaves me speechless!

Ok, it doesn't leave me incapable of saying anything! Dig this:

Make.Blog post about talking piano

Now, what exactly are we seeing? I am fairly sure that the voiced sounds are 100% piano sounds, i.e. mechanically produced. No synthesizers, no electronic sound sources, no electronic sound modification like filters, ring modulators, ADSRs, etc.

The piano speaks.

I'll write another post describing what I think is happening and how this relates to the world of speech synthesis after my arms stop hurting, though.

For now, watch, listen, and maybe go "wow" like I did.