Spontaneous formant tracking: A point of logic, Part 1

Formant tracking is simply adjusting the vocal tract to a resonant state by attempting to match the frequency of one of the vowel formants to one of the harmonics of the sung pitch.

First, I will define the elements that we are trying to match.

The source: The sung pitch.

When we speak of pitch, we mean not only the fundamental frequency that we are trying to sing, but the fundamental together with its endless set of overtones. The overtones are whole-number multiples of the fundamental. For example, the pitch G2 (bass low G) is roughly 100 Hz, or 100 vibrations per second (about 98 Hz in standard A440 tuning, rounded here for convenience). Its overtones would then be 200, 300, 400, 500, 600 Hz, and so on.

The fundamental is labeled F0 traditionally. It is also called the first harmonic or H1. The overtone above it is H2, the next one is H3 and so on.
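To make the labeling concrete, here is a minimal Python sketch that lists the harmonics of a fundamental. The function name harmonic_series is my own for illustration, and the 100 Hz value is the rounded G2 from the example above.

```python
def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics (H1..Hcount) of a fundamental, in Hz."""
    # H1 is the fundamental itself; Hn is n times the fundamental.
    return [f0_hz * n for n in range(1, count + 1)]

# The rounded G2 example: H1 = 100 Hz, H2 = 200 Hz, and so on.
for n, freq in enumerate(harmonic_series(100, 6), start=1):
    print(f"H{n}: {freq} Hz")
```

Any sung pitch can be substituted for the 100 Hz fundamental; the spacing between harmonics is always equal to the fundamental itself.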

The filter: The vowel (vocal tract)

Vowels can be defined by their first two formants, which correspond to the vocal tract divided by the tongue:


I altered the picture above to have the formant separation reflected. The red area is the first formant space (F1) and the white is the second formant space (F2). A picture of all the vowels can be seen here. I copied the page and include the picture here below so that you do not have to navigate away from the blog page:


I also include here the formant frequency values. Directly below are what are referred to as formant centers. These are the frequency pairs that produce the most common version (some might say the purest form) of each vowel. They are color-coded by vowel. The smooth column represents the F1 values and the dotted column represents the F2 values.


However, there is wide latitude as to what the human ear will hear as a specific vowel in context. The following chart, found at the National Center for Voice and Speech, is a standard chart that has been used for years. It also shows where certain vowels intersect, that is, where a formant pair might be perceived as two different vowels depending on context.


What we are attempting to do is to alter the size of the formant spaces so that a formant matches the frequency of one of the harmonics of the sung pitch. This is the concept of vowel modification. The following chart, which I have developed over the last year (there is one for each vowel spectrum, e.g. [o to u], [i to e], [E to ae]), shows the male passaggio up to tenor high B, which corresponds to the female first passaggio and middle voice.


A larger version of this file can be found in PDF form here.

This chart tells us a lot. Before I go further, I must add that vowel modification (formant tracking) is most important where a precarious balance between the cricothyroid muscle and the vocalis muscle exists. Read the next blog entry which should appear in a couple of days for more details. For our purposes let us assume the muscular balance is not precarious.

The [a] vowel is colored dark blue on the chart. What is immediately noteworthy is that we do not see a lot of dark blue on the chart. That means that in the male passaggio and upper range and in the female first passaggio and middle range, a pure form of the [a] vowel is not the best choice. Keep in mind that neighboring vowels will sound like [a] in context when phonation is efficient (i.e. the quality of the vowel is strongly dependent upon a good phonation mode).

This range is more problematic for men than for women. We will discuss the female difficult range a bit later.

For men it is important to know that the range between C4-B4 is not only a problem of resonance (formant tracking) but one of muscular balance.
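To compare note names with formant values in Hz, the range can be converted to frequencies. The following is a minimal sketch that assumes 12-tone equal temperament with A4 = 440 Hz, which is my assumption and not something the charts specify; the frequencies quoted in this article are rounded and may differ by a few Hz. The function name note_to_hz is hypothetical, and it expects the accidental before the octave number (for example "C#4" rather than "C4#").

```python
# Semitone offsets of the natural notes within an octave, counted from C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, a4_hz=440.0):
    """Convert a note name such as 'C4', 'C#4' or 'Eb4' to Hz (equal temperament)."""
    letter, rest = name[0], name[1:]
    semitone = NOTE_OFFSETS[letter]
    # Apply any sharps (#) or flats (b) written after the letter.
    while rest and rest[0] in "#b":
        semitone += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    midi = 12 * (octave + 1) + semitone  # MIDI note number, A4 = 69
    return a4_hz * 2 ** ((midi - 69) / 12)

for note in ("C4", "G4", "B4"):
    print(f"{note}: {note_to_hz(note):.1f} Hz")
```

Under these assumptions the C4-B4 range spans roughly 262 to 494 Hz.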

Many professional singers develop the cricothyroid dominance that makes it possible to stretch the vocal folds for high notes regardless of resonance adjustments. Therefore, in the best-case scenario, resonance tracking is a point of refinement. However, because accurate resonance adjustment takes pressure off the vocal folds (see the previous post on Inertial Reactance), a singer who experiences a precarious muscular balance would benefit from exact formant tracking. Most certainly, a singer who has great muscular balance in general becomes a great singer when formant tracking is added to the mix. The voice would sound richer and more consistently resonant.

In the range between C4 and G4 the issues differ depending on whether one is a bass, baritone or tenor. This is because basses, baritones and tenors reach the muscular threshold at different points. The muscular change has usually been treated as occurring together with the acoustic (resonance) change. One of the points of this post is that formant changes sometimes occur before the muscular threshold. Let us take the different archetypal male voice parts one at a time:

1. Basso: Let us say that the muscular change (from vocalis dominance in the low and middle range to cricothyroid dominance in the upper range) occurs on C4#. A basso will begin to feel the muscular tension around B3b or even A3. The question is whether, at that point, the basso should try to access F1 tuning or F2. This brings us to how the singer alters the formant frequencies to tune to F1 or F2. The following rules come from the National Center for Voice and Speech and are scientifically proven and accepted as standard by the voice science community. More details on these rules can be found at the link directly above.

Four Rules for Modifying Vowels

1. All formant frequencies decrease uniformly as the length of the vocal tract increases.

The vocal tract length increases when the larynx lowers.

2. All formant frequencies decrease uniformly with lip rounding and increase with lip spreading.

Lip rounding and lip trumpeting have the same effect (see details on the NCVS page)

3. A mouth constriction lowers the first formant and raises the second formant.

This principally includes raising the tongue, as in going from the [a] vowel to the [i] vowel: the space below the tongue increases (larger spaces have lower pitch, so F1 falls) and the space above decreases (smaller spaces have higher pitch, so F2 rises).

4. A pharyngeal constriction raises the first formant and lowers the second formant.

The reverse of number 3.
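The four rules can be summarized as a small lookup of directions. This is only a qualitative sketch: it records which way each adjustment moves F1 and F2 (rules 1 and 2 apply to all formants, but only the first two matter here) and says nothing about how far they move. The adjustment names and the describe function are my own labels for illustration, not NCVS terminology.

```python
# Direction each adjustment moves (F1, F2): -1 = lowers, +1 = raises.
FORMANT_SHIFT = {
    "lengthen_tract": (-1, -1),           # rule 1, e.g. the larynx lowers
    "round_lips": (-1, -1),               # rule 2
    "spread_lips": (+1, +1),              # rule 2
    "mouth_constriction": (-1, +1),       # rule 3, e.g. [a] toward [i]
    "pharyngeal_constriction": (+1, -1),  # rule 4, the reverse of rule 3
}

def describe(adjustment):
    """Return a one-line description of what an adjustment does to F1 and F2."""
    words = {-1: "lowers", +1: "raises"}
    f1, f2 = FORMANT_SHIFT[adjustment]
    return f"{adjustment}: {words[f1]} F1, {words[f2]} F2"

for name in FORMANT_SHIFT:
    print(describe(name))
```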

In order to follow these rules, we must establish what the default position of the vocal tract should be. I proceed from the following:

The larynx cannot fall to its naturally low position without the jaw being released. The laryngeal position that produces accurately resonant notes in the speaking range (between 110 and 150 Hz for men and between 220 and 260 Hz for women) should be the default. Therefore:

1. The larynx should maintain that basic low position.
2. The jaw should always return to the [a] position and the tongue and lips should articulate for all changes (consonants and vowels).

If the jaw had to close for vowels and the larynx had to rise, the variables would be too many, and since both would narrow and shorten the vocal tract, the voice would take on a thinner quality.

Given the parameters that I have established, the rest is a matter of logic. Let us continue with our basso on the [a] vowel:

C4 is an interesting note. The choice is either to raise F1 (the [a] vowel is the closest) or to round to [ɔ] (as in “fort”) to access F2. F2 is the better choice for the basso, but he is still in his lower register muscularly, and it might feel more “natural” (more speech-like) to sing the [a] vowel, although the resonance might be imprecise and cause the tone to spread. C# fits the [a] vowel perfectly on F2. This is a moment where an F2 change might be better, as mentioned earlier, even though the muscular balance is borderline and the singer would still feel comfortable singing the more speech-like F1 resonance. The vowel of the word “up” fits nicely to continue the second formant tracking through D4 and Eb4. Remember that to keep first formant tracking, the lower space must become smaller. This means that the larynx may rise when tongue migration and lip spreading are not enough to accomplish the frequency rise. If the larynx is kept low during this change, it becomes difficult to match the formant with the harmonic in question (which is fixed for a given pitch). Some bassos are able to let the larynx rise and keep a first formant resonance, but they will lose the darker color, which is a basic characteristic of the basso voice type.

E4b is an interesting note. The same vowel (of the word “up”) matches both formants. Some basses I hear round the vowel slightly, which lowers both formants. This discourages F2, which needs to rise from its center (1180) to meet the 4th harmonic (1264), and lowers F1 from its center (640) to meet the 2nd harmonic (632). There are more options. It is possible that the singer might track the second formant of the schwa, [ə] (1450), and by rounding bring it closer to the 4th harmonic (1264). It might be less exact but would diminish the competition from F1, since the first formant of the [ə] (430) is out of range (too distant from the two relevant harmonics, H1 (316) and H2 (632)).
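The arithmetic in this paragraph can be checked with a short Python sketch. It uses only the values quoted above (a 316 Hz fundamental and the formant centers for the “up” vowel and the schwa); the function name offset_hz is my own. A positive result means the formant sits above the harmonic and would have to be lowered (for example by rounding) to meet it; a negative result means it would have to rise.

```python
F0 = 316  # Hz, the fundamental used in this paragraph

def offset_hz(formant_hz, harmonic_number, f0_hz=F0):
    """Return formant minus harmonic, in Hz, for the given harmonic number."""
    return formant_hz - harmonic_number * f0_hz

# (label, formant center in Hz, harmonic number it is compared against)
comparisons = [
    ("'up' F1 vs H2", 640, 2),
    ("'up' F2 vs H4", 1180, 4),
    ("schwa F2 vs H4", 1450, 4),
    ("schwa F1 vs H1", 430, 1),
    ("schwa F1 vs H2", 430, 2),
]

for label, formant, n in comparisons:
    print(f"{label}: {offset_hz(formant, n):+d} Hz")
```

On these numbers the “up” vowel's F1 is only 8 Hz above H2, while its F2 is 84 Hz below H4. The schwa's F1 is more than 100 Hz away from both H1 and H2, which is why it offers little competition to F2.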

In truth, either choice is possible. This is to say that while formant tracking makes for a more consistently resonant quality, second formant dominance in the upper range is not as crucial for basses as it would be for baritones or tenors, unless the basso is singing F4-G4. The following graph (large version here) shows the E4b (D4#) as sung by two great basses, Nicolai Ghiaurov and Jerome Hines, at “passar nelle tue tasche…” in the famous “coat aria” from Puccini’s La Bohème.


The top spectrogram (made with the Voce Vista software) is Ghiaurov and the lower is Hines. The green cursor passes through the dominant formant: F2 for Ghiaurov and F1 for Hines. Ghiaurov is clearly dominant on F2, which essentially means that most of the energy of the sound gathers on the 3rd harmonic (H3). This acoustic “focus” has the positive effect of clearly delineating the harmonics and helps increase the energy in them and in the singer’s formant range (between the two orange lines). The singer’s formant range is the most sensitive acoustic range for the human ear. Hines’ spectrogram in this instance is less efficient. The acoustic energy is not only spread between F1 and F2 (although they look nearly equal, F1 is slightly stronger), but F1 also lies between the fundamental (F0) and the second harmonic (H2), causing a further spread of energy between those two harmonics. In short, the energy is spread among the first three harmonics, which seems to weaken the energy in the singer’s formant range. The two YouTube clips in question are found below.

Both singers sound wonderful. However, it seems my analysis of the chart is borne out. Hines seems to favor an extremely low larynx. If the larynx were lower than natural overall, the first formant would be tracked on D4#, because the lowered larynx would lower both formants. That slightly depressed larynx, it seems, was enough to cause some balance problems between the two formants and to suppress the strength of the harmonics to the point where their exact frequency is difficult to ascertain at sight. In other words, the acoustic adjustment of the vocal tract is out of phase with the frequencies of the natural harmonic series of the sung pitch.

The main point of this article is that maintaining a naturally low larynx promotes a natural transition from F1 dominance to F2 dominance. The next article will continue this discussion with analysis of the same range relative to baritones, tenors and female voices. I will also discuss the next octave which will deal with the female top voice.

© 09/23/2008 (Date of publication)