The exact title of my PhD was:
Three-dimensional modelling of the speech organs from MRI images for the production of nasals - Articulatory-acoustic characterization of velum movements.
In the general framework of virtual talking head development, this work fits into the study of nasality in speech production. Nasal phonemes, present in around 99% of the world's languages, are produced by a lowering of the velum, which allows airflow and the acoustic wave to propagate out through the nasal tract. This simple gesture has complex acoustic effects.
In this context, the objectives of my work were:
1. to build a 3D data-based articulatory model of the nasal tract
2. to characterize acoustically the movements of the velum
3D articulatory-acoustic model of the nasal tract
The characteristics of the model were:
- Functional: In contrast to geometrical or biomechanical models, functional models aim to let the movements and shapes of the articulators emerge from the data. The drawback is a dependence on the recording task; the advantage is that no a priori movements or shapes are introduced into the model.
- Three-dimensional: In order to fully describe and understand the nasal tract and its gestures, the model has been developed in 3D.
- Organ-based: The organs, or articulators, are considered the unit elements of the vocal tract. For accurate modelling, each organ (more precisely, its shape), rather than the tract as a whole, is therefore modelled independently.
- Single subject: As a first attempt to develop such a model, only one subject has been considered in the study.
Briefly, the nasal tract can be anatomically divided into two regions:
1. the posterior region, from the connection point with the oral tract to the point where the tract separates into the two choanae, is composed of the velopharyngeal port and the cavum, and is surrounded by deformable tissues, namely the velum and the pharyngeal walls.
2. the anterior region, from the point where the tract separates into the two choanae to the nostril outlets, is composed of the two non-deformable nasal passages. Various paranasal sinuses are moreover connected to these passages.
The anterior region, i.e. the nasal passages, has been precisely described in 3D:
The posterior region, i.e. the velopharyngeal port and the cavum, has also been precisely described in 3D. More precisely, the articulatory parameters corresponding to the articulators surrounding the velopharyngeal port, i.e. the velum and the pharyngeal wall, have been uncovered from the data. Two parameters, corresponding to two degrees of freedom, are necessary to describe the movements of the velum. The first parameter, VL, alone explains up to 83% of the velum variance, while the second parameter, VS, explains a further 6%, bringing the total explained variance close to 90%. Interestingly, the first parameter VL also controls a small movement of the pharyngeal wall, leading to a combined velum-pharynx movement that presumably corresponds to Passavant's pad.
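As a rough illustration of how such parameters and their explained variance can emerge from mesh data, here is a PCA-style linear decomposition. All numbers, the corpus size, and the synthetic mesh data are invented stand-ins; the thesis's actual extraction procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the MRI corpus: 40 velum meshes, each
# flattened to a vector of 3D vertex coordinates (200 vertices -> 600
# values), generated from two latent degrees of freedom plus noise.
n_meshes, n_coords = 40, 600
latent = rng.normal(size=(n_meshes, 2))
basis = rng.normal(size=(2, n_coords)) * np.array([[10.0], [3.0]])
meshes = latent @ basis + 0.5 * rng.normal(size=(n_meshes, n_coords))

# Centre the data and decompose with an SVD (equivalent to PCA).
mean_shape = meshes.mean(axis=0)
centered = meshes - mean_shape
_, s, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of total shape variance explained by each component; in the
# thesis, the first two components play the role of VL and VS.
explained = s**2 / np.sum(s**2)
print(explained[:2])

# Any mesh is then approximated as the mean shape plus a weighted sum
# of the two retained component shapes.
params = centered @ vt[:2].T        # per-mesh (VL-like, VS-like) values
recon = mean_shape + params @ vt[:2]
```

With two dominant components, the model reduces each full 3D mesh to just two control values, which is exactly the economy of description the functional approach is after.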
Two videos, one for each of the two articulatory parameters of the velopharyngeal port, are shown below. The left figures show a 3/4 anterior view of the velum, the lower one showing only one half (a 3/4 posterior view for the upper-left figure of VS); the upper-right figure shows a midsagittal view (velum in blue, posterior pharyngeal wall in red); and the two lower-right figures display transverse views in the planes marked by the two black lines on the upper-right figure.
Video 1 (*.avi) = VL parameter
Video 2 (*.avi) = VS parameter
In order to model realistic movements, dynamic data (EMA recordings) have been used to control the articulatory model of the velopharyngeal port. Here is a video of the synthesized velum movement corresponding to real velum movements during the sequence [pε~pε~pε~bε~pε~mε~] (as in "pinpinpinbinpinmin" in French). The right figure shows a 3/4 anterior view of the velum; the left figure displays, in the midsagittal plane, the trajectory of the red point visible on the right figure, the background point cloud representing all the positions taken by this point during the recording of the dynamic data. Note that the sound is the real recorded sound, synchronized with the synthesized movement.
Video 3 (*.avi) = Synthesized velum movement
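Driving the model from EMA data amounts to inverting the articulatory model at the tracked fleshpoint: at each frame, find the parameter values that place the model's point at the measured position. A minimal sketch of this idea follows; the matrix values and frame data are invented, and the thesis's actual inversion procedure may differ.

```python
import numpy as np

# Hypothetical linear model of the EMA-tracked velum fleshpoint: its
# midsagittal (x, y) position is the mean position plus a weighted sum
# of the two parameter effects. All numbers here are invented.
mean_pt = np.array([1.0, 2.0])
C = np.array([[0.8, 0.2],
              [-0.1, 0.6]])   # columns: displacement per unit VL, VS

# Simulated "recording": the point's position over five frames,
# generated from known parameter values so recovery can be checked.
true_params = np.array([[0.0, 0.0],
                        [1.0, 0.0],
                        [0.5, 0.5],
                        [-0.3, 1.0],
                        [0.2, -0.4]])
ema = true_params @ C.T + mean_pt

# Frame-by-frame inversion: recover (VL, VS) by least squares; these
# trajectories then drive the full 3D model of the velopharyngeal port.
recovered = np.linalg.lstsq(C, (ema - mean_pt).T, rcond=None)[0].T
print(np.allclose(recovered, true_params))  # → True
```

Because the model is linear in its parameters, a single well-chosen fleshpoint is enough to recover both degrees of freedom at every frame.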
Acoustic characterization of the velum movements
These two parameters have been acoustically characterized in the low frequencies using an electric-analog planar-wave acoustic propagation model. It has been shown that the acoustic effect of VL can be ascribed to the joint variation of the geometries of the oral and nasal tracts (more precisely, within the nasal tract, of the velopharyngeal port geometry), while the acoustic effect of VS can be mainly ascribed to the variation of the geometry of the nasal tract alone (of the velopharyngeal port as well).
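The transmission-line idea behind such an electric analog can be sketched numerically: each slice of the tract's area function becomes a chain (ABCD) matrix, and the product of the matrices gives the tube's transfer function. The sketch below is a deliberately simplified single, lossless, uniform tube with an ideally open end, not the thesis's coupled oral-nasal model; the area function, section count, and sound speed are illustrative assumptions.

```python
import numpy as np

RHO, C = 1.2, 350.0  # air density (kg/m^3) and sound speed (m/s)

def chain_matrix(length, area, f):
    """Lossless plane-wave chain matrix of one cylindrical section,
    relating (pressure, volume velocity) at its input to its output."""
    k = 2 * np.pi * f / C
    zc = RHO * C / area  # characteristic acoustic impedance
    return np.array([[np.cos(k * length), 1j * zc * np.sin(k * length)],
                     [1j * np.sin(k * length) / zc, np.cos(k * length)]])

def transfer(areas, lengths, f):
    """Volume-velocity transfer function U_out/U_in of the whole tube,
    assuming an ideally open end (zero radiation impedance)."""
    K = np.eye(2, dtype=complex)
    for length, area in zip(lengths, areas):
        K = K @ chain_matrix(length, area, f)
    return 1.0 / K[1, 1]

# Uniform 17 cm tube, closed at the glottis and open at the lips:
# resonances expected near c/(4L) * (2n - 1), i.e. roughly 515, 1544,
# and 2574 Hz for these values.
areas = [4e-4] * 10          # ten 4 cm^2 cross-sections
lengths = [0.017] * 10       # ten sections of 1.7 cm
freqs = np.arange(50, 3000, 1.0)
gain = np.array([abs(transfer(areas, lengths, f)) for f in freqs])
peaks = freqs[1:-1][(gain[1:-1] > gain[:-2]) & (gain[1:-1] > gain[2:])]
print(peaks)
```

Coupling in a side branch such as the nasal tract amounts to combining the branch's input impedance with the main line at the junction, which is what introduces the pole-zero pairs characteristic of nasality.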
Moreover, in the context of my position as assistant lecturer at ICP in 2007, the 3D nasal tract model (3D mesh of the velopharyngeal port and of the nasal passages) has been integrated into a complete 3D mesh of the vocal tract (see figure below). A comparative study of 2D versus 3D acoustic propagation in the case of nasality has been started in collaboration with Hiroki Matsuzaki and Kunitoshi Motoki from Hokkai-Gakuen University (Japan).
Finally, again in the context of my position as assistant lecturer, the geometry and the acoustic effects of the sinus maxillaris and sinus piriformis have been assessed.
For more details...
For more details about this work:
- Find here abstracts and keywords of my PhD in English and in French
- Download here my PhD thesis manuscript (color PDF) [Warning: French only, 24.2 MB]
- Download here my PhD presentation (color PDF) in English [Warning: 6 MB]
- Download here my PhD presentation (color PDF) in French [Warning: 6 MB]
- See my publications page
You can also visit this page for a (quite old) description of the modelling approach.
In addition to my personal publications, here are some publications related to the virtual talking heads (into which this research fits) developed at the Department of Speech and Cognition (formerly ICP) of GIPSA, as well as a description and an example of application:
Pierre Badin, Gérard Bailly, Frédéric Elisei, and Matthias Odisio.
Virtual Talking Heads and audiovisual articulatory synthesis.
In M.-J. Solé, D. Recasens, and J. Romero, editors, Proceedings of the 15th International Congress of Phonetic Sciences, volume 1, pages 193-197, Barcelona, Spain, 2003.
Yuliya Tarabalka, Pierre Badin, Frédéric Elisei, and Gérard Bailly.
Can you "read tongue movements"? Evaluation of the contribution of tongue display to speech understanding.
In Actes de la 1ère Conférence internationale sur l'accessibilité et les systèmes de suppléance aux personnes en situation de handicaps (ASSISTH'2007), Toulouse, France, 2007.