An Ice Storm or a Nice Storm?

March 5, 2015 HHF Intern

By Kathi Mestayer

When the speech we hear is fuzzy, as it often is in song lyrics, in noisy settings, or (if you have a hearing loss) almost everywhere, our brains can come up with some pretty silly interpretations, like “I led the pigeons to the Flag…” instead of “I pledge allegiance to the Flag…”

But even when we hear correctly, that same brain can play the words back to us in more than one way, with more than one meaning. Kind of like an optical illusion, but with sound.

A stretch of speech that can be carved into words in more than one way is called an oronym. Take “I scream, you scream, we all scream for ice cream.” You can hear “ice cream” or “I scream,” depending on the context, the way it is said, or the phase of the moon.

Why?

In his book “The Language Instinct,” the Harvard psychologist and linguist Steven Pinker tells us: “In the speech sound wave, one word runs into the next seamlessly; there are no little silences between spoken words the way there are white spaces between written words. We simply hallucinate word boundaries when we reach the edge of a stretch of sound that matches some entry in our mental dictionary.”

So, if our mental dictionary contains more than one match for what we hear, well, we can hear it both ways. In fact, the sound input doesn’t even have to be speech for our brains to have a crack at it. Pinker explains: “The brain can hear speech content in sounds that have only the remotest resemblance to speech.”

In other words, we will superimpose meaning onto just about anything, and if it doesn’t make sense, we just keep trying until something fits. Kind of like the time my brain heard baroque music coming from the vacuum cleaner, or the countless phrases I swear my parrot says (everything from “ashram” to “wiki,” with “kabuki,” “Nietzsche” and many more in between).

While our brains are busy riffing on what we hear, programmers and engineers are working hard to create devices that can interpret speech even passably well. One application is online captioning, or what I call “robo-captions.”

The results are not particularly impressive so far, but our brains are a tough act to follow. According to Pinker, “No human-made system can match a human in decoding speech.” That fact was brought home to me when my friend complained about the speech-activated calling system in her car. “It can only interpret the numbers if I read them without any pauses. If I pause, for just a second, the computer inserts the number eight,” she said.

Automatic online captions created by speech-recognition software are particularly bad at decoding speech (the real-time captions on TV are much better). You can go here to get a quick chuckle.

Staff writer Kathi Mestayer serves on advisory boards for the Virginia Department for the Deaf and Hard of Hearing and the Greater Richmond, Virginia, chapter of the Hearing Loss Association of America. This is adapted from her reader-sponsored work, “Be Hear Now.”