The Transient Theory of Voice Production

The study of voice acoustics arguably begins in 1727, when Leonhard Euler proposed the transient theory of voice production.^[1] Chen and Miller (2019) summarize this theory:

"To start, the vocal folds emit an impulse. The impulse triggers a transient response in the vocal tract to produce a decaying wave. A series of impulses produces a series of decaying waves. The superposition [overlap] of those decaying waves makes voiced sound."^[2]

In 1807, Joseph Fourier introduced the theory now known as Fourier analysis, which represents a complex periodic pattern as a sum of sinusoidal components, each with a specific frequency and amplitude.^[3] Building on this idea, Wheatstone proposed the steady-state (harmonic) theory of voice production in 1837. Chen and Miller continue the story:

"…Wheatstone proposed a simpler version of voice production theory. If the repetition rate of the impulses [from the closing of the vocal folds] is a constant, a fundamental frequency can be defined. Since the voice signal is then strictly periodic, Fourier analysis can be applied to compute the overtones. Timbre can be expressed as the intensity envelope of overtones. In 1865, H. Helmholtz systematically presented the steady-state theory.^[4] After the availability of digital computers in the 1960s, a version of the steady-state theory, the source-filter theory, was published by G. Fant,^[5] which enables a straightforward digital processing of speech signals on computers…^[6]"

Ninety-two years after Wheatstone’s proposal, Fletcher (1929) characterized these two theories of voice production in similar terms. His nearly century-old account deserves a place in any discussion of the models used to understand voice acoustics:

"According to …[Wheatstone’s 1837] theory, the vocal cords [sic.] generate a complex wave having a fundamental and a large number of harmonics. The component frequencies are all exact multiples of the fundamental… When these waves pass through the throat, the mouth, and the nasal cavities, those frequencies near the resonant frequencies of these cavities are radiated into the air very much magnified, the amount depending upon the damping constant of the cavity [how quickly the oscillating energy returns to baseline]. These reinforced frequency regions determine the vowel quality."^[7]

Fletcher continues:

"According to the inharmonic (transient) theory of Willis (1829) and Herman… the vocal cords [sic.] act only as an agent for exciting the transient frequencies which are characteristic of the vocal cavities."^[7]

According to the transient theory, every time the vocal folds close, an impulse of energy is introduced into the vocal tract. As the resulting pressure wave propagates through the air in the vocal tract, the resonances of the tract are excited, oscillate, and decay over time. Imagine popping a balloon in a cathedral. The pop is sudden, brief, and broadband (it contains energy across a wide range of frequencies); it corresponds to the contribution of the vocal folds in each glottal cycle. The reverberation of that pop in the cathedral is complex, longer lasting, decreasing over time, and characterized by just a few dominant frequency regions; it corresponds to the resonant response of the vocal tract in each glottal cycle. In this model, a resonance does not filter continuous harmonic energy. Instead, a resonance is the way a mass (here, the air in the vocal tract) continues to vibrate after a burst of energy is suddenly introduced. Note that Fletcher treats the puff of transglottal flow when the folds open as the sound source in each cycle, which was the prevailing theory at the time. He continues:

"A puff of air from the glottis [sic.] sets the air in these cavities into vibration. This vibration soon diminishes until it is started anew by a second puff [sic.]... An examination of the records of speech sounds shows that this is true. The different waves succeed each other quite regularly… [an argument for the harmonic theory]. On the other hand this examination also supports the view that these regular puffs do excite the transients of the mouth and throat cavities, for the amplitudes are large at the beginning of the wave and gradually die away toward the end… [an argument for the transient theory]. When the pitch is high, the natural vibrations do not have time to die down before another pulse sets them going again [emphasis added]."^[7]

This last sentence expresses one of the most important concepts the voice pedagogy community could consider incorporating into the way we introduce and discuss singing voice acoustics. It is obscured by the harmonic theory and illustrated by the transient theory. Because the resonant response of the vocal tract dies down over time, and the vocal folds close and open more or less frequently as pitch changes, the resonant response from one cycle can still be active when the next cycle begins, and it can then interact with pressure and flow dynamics above the vocal folds. This points to the idea that resonance in the vocal tract is a real, physical force consisting of high-amplitude oscillations between low- and high-pressure peaks. In fact, the power of that interaction may be a reasonable way to frame some of the most important ideas in singing voice acoustics (e.g., in what pitch ranges and for what vowels are interactions between vocal tract shape and phonation quality most relevant, and why is the acoustic landscape for treble voices different from that of non-treble voices?). What we currently identify as interactions between harmonics and resonances may instead be described in terms of the alignment of ongoing physical oscillations in the vocal tract with glottal closing events.
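
The reasoning above can be sketched numerically. The following is only an illustration under stated assumptions, not a method taken from Fletcher or from Chen and Miller: it models a single vocal tract resonance as an exponentially decaying sinusoid, and the sample rate, the 500 Hz resonance, the 80 Hz bandwidth, and the function names `residual_fraction`, `decaying_wave`, and `voiced_sound` are all hypothetical choices made for the example. It estimates how much of a resonance's amplitude is left when the next glottal closure arrives, and builds a voiced sound by superposing one decaying wave per closure.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for illustration)


def residual_fraction(pitch_hz, bandwidth_hz=80.0):
    """Fraction of a resonance's initial amplitude left after one glottal period.

    Assumes an envelope of exp(-pi * bandwidth * t), the usual relation for a
    single resonance with that -3 dB bandwidth. The 80 Hz default is an
    illustrative assumption, not a measured value.
    """
    return math.exp(-math.pi * bandwidth_hz / pitch_hz)


def decaying_wave(n_samples, resonance_hz=500.0, bandwidth_hz=80.0):
    """Transient response to one impulse: an exponentially decaying sinusoid."""
    wave = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        envelope = math.exp(-math.pi * bandwidth_hz * t)
        wave.append(envelope * math.sin(2 * math.pi * resonance_hz * t))
    return wave


def voiced_sound(pitch_hz, duration_s=0.05, response_s=0.02):
    """Superpose one decaying wave per glottal closure.

    Each response is truncated after response_s seconds to keep the sketch short.
    """
    total = round(duration_s * SAMPLE_RATE)
    period = round(SAMPLE_RATE / pitch_hz)  # samples between closures
    response = decaying_wave(round(response_s * SAMPLE_RATE))
    signal = [0.0] * total
    for start in range(0, total, period):
        for i, value in enumerate(response):
            if start + i < total:
                signal[start + i] += value
    return signal


for pitch in (100.0, 200.0, 400.0, 800.0):
    share = residual_fraction(pitch)
    print(f"{pitch:4.0f} Hz: {share:.0%} of the amplitude remains at the next closure")

low = voiced_sound(100.0)   # each response has largely died away before the next closure
high = voiced_sound(800.0)  # responses overlap heavily from one closure to the next
```

With these assumed values, roughly 8% of the amplitude remains at a pitch of 100 Hz and roughly 73% at 800 Hz, which is one way to see why the ongoing oscillation matters more as pitch rises. A real vocal tract has several resonances with different bandwidths, so the numbers indicate the trend rather than any measured voice.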

As voice pedagogy teachers consider how to introduce voice acoustics to their students, the history of the transient theory may provide a useful framework for understanding the link between observable acoustical events and the physical phenomenon of voice production.

Resources

3D view of diaphragm | by www.3dyoga.com | Video

[1] Euler, Leonhard (1727). Dissertatio physica de sono. Euler Archive.
[2] Chen, Julian (January 16, 2019). "Pitch-Synchronous Analysis of Human Voice". Journal of Voice.
[3] Fourier, Joseph B.J. (1807). Théorie de la propagation de la chaleur dans les solides. Manuscript.
[4] Helmholtz, H.L.F. (1863). On the Sensations of Tone as a Physiological Basis for the Theory of Music. Dover.
[5] Fant, Gunnar (1970). Acoustic Theory of Speech Production. Mouton De Gruyter.
[6] Chen. "Pitch-Synchronous Analysis": 1.
[7] Fletcher, Harvey (1929). Speech and Hearing. Van Nostrand. pp. 46–47.

Authored by: Ian Lauchlin Howell
