Magnetic Resonance Imaging of the Vocal Tract - Techniques and Applications

S. R. Ventura, D. Freitas, J. Tavares
International Conference on Imaging Theory and Applications
DOI: https://doi.org/10.5220/0001792901050110
Citations: 4

Abstract

Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques, with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the study of the vocal tract. The extraction of contours of the air cavities allowed the set-up of a number of 3D reconstruction image stacks by combining orthogonally oriented sets of slices for each articulatory gesture, a new approach to address the expected spatial undersampling of the imaging process. As a result, these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measurements of the vocal tract shape in different situations. Potential uses can be found in medical and therapeutic applications as well as in acoustic articulatory speech modelling.

Improvements in magnetic resonance (MR) over the past decades have made vocal tract imaging possible, making it currently one of the most promising tools in speech research. Speech is the most important instrument of human communication and interaction. Nevertheless, knowledge about its production is far from complete, or even sufficient to describe the most relevant acoustic phenomena that are conditioned at the morphological and dynamic levels. The anatomic and physiologic aspects of the vocal tract are claimed to be essential for a better understanding of this process. The quality and resolution of soft-tissue imaging and the use of non-ionizing radiation are some of the most important advantages of MR imaging (Avila-García et al., 2004; Engwall, 2003).

Several approaches have been used up to now for the study of the vocal tract based on MR images. Since the first study proposed by Baer et al. (1991), many MR techniques have been used (from static to dynamic studies, and more recently even in real time), starting with studies of vowel production (Badin et al., 1998; Demolin et al., 2000), followed by consonant production (Engwall, 2000b; Narayanan et al., 2004), for different languages such as French (Demolin et al., 1996; Serrurier & Badin, 2006), German (Behrends et al., 2001; Mády et al., 2001), and Japanese (Kitamura et al., 2005; Takemoto et al., 2003).

The work presented in this paper, consisting essentially of the static description of the vocal tract shape during sustained vowels and consonants and of the dynamic description of some syllables, is the first to report the application of MR imaging to the characterization of European Portuguese (EP). This study started in 2004, with a first series of results published in 2006 (Rua & Freitas, 2006). Our approach can be seen as a contribution to the wide area of articulatory speech modelling, since it provides geometrical data to the acoustic modelling phase of research. In the articulatory speech research of EP, a few studies of nasal vowels have been carried out at the acoustic production and perceptual levels, based on acoustic analysis and electromagnetic articulography (Teixeira et al., 2001, 2002, 2003). More recently, another MR study of EP presented some results on oral and nasal vowels, exploring contour extraction from 2D images, articulatory measures and area functions (Martins et al., 2008).

In earlier studies, vocal tract modelling was limited to the midsagittal plane (Engwall, 2000a; Takemoto et al., 2003), but improvements in the capabilities of MR imaging equipment allowed this domain of research to expand and made three-dimensional (3D) modelling possible (Badin & Serrurier, 2006). The more realistic models of the vocal tract shape that can now be obtained are much needed in research towards improved speech synthesis algorithms and more efficient speech rehabilitation.

The main purpose of this paper is to present some 3D models of the vocal tract based on MR data of some relevant sustained articulations of EP in a static study. From the point of view of image processing, a new approach to 3D modelling is presented, which combines orthogonal stacks to describe the vocal tract shape in different articulatory positions. We also demonstrate an MR technique to capture useful image sequences during speech (dynamic study). In addition, some preliminary results of this dynamic study are presented.

The remainder of this paper is organized in four sections. The next section is dedicated to the methods and describes the equipment, corpus and subjects, as well as the procedures used for the speech study, namely for morphologic and dynamic imaging of the vocal tract. The results are presented in the following section, through some three-dimensional models built of the vocal tract and an image sequence obtained during speech. Finally, the conclusions of the work described are presented.
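
The idea of combining orthogonally oriented stacks to compensate for spatial undersampling can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the air-cavity contours have already been filled into binary masks and that all stacks are registered to one common integer voxel grid. The names `stack_to_voxels` and `combine_stacks`, and the toy masks, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index in a common grid (assumed)
Mask = List[List[int]]        # 2D binary mask, 1 = air cavity


def stack_to_voxels(slices: Dict[int, Mask], axis: int) -> Set[Voxel]:
    """Place each 2D mask at its slice position along `axis`.

    axis 0: sagittal stack (fixed x), mask indexed as [y][z]
    axis 1: coronal stack  (fixed y), mask indexed as [x][z]
    axis 2: axial stack    (fixed z), mask indexed as [x][y]
    """
    voxels: Set[Voxel] = set()
    for position, mask in slices.items():
        for row, line in enumerate(mask):
            for col, value in enumerate(line):
                if not value:
                    continue
                in_plane = [row, col]
                # Insert the slice position at the through-plane axis.
                coords = in_plane[:axis] + [position] + in_plane[axis:]
                voxels.add((coords[0], coords[1], coords[2]))
    return voxels


def combine_stacks(*stacks: Set[Voxel]) -> Set[Voxel]:
    """Union of the voxels seen by every stack."""
    combined: Set[Voxel] = set()
    for stack in stacks:
        combined |= stack
    return combined


# Toy data: the sagittal stack samples only x = 0, so x = 1 is not imaged.
sagittal = {0: [[0, 0, 0], [0, 1, 0], [0, 0, 0]]}
# A coronal slice at y = 0 crosses the gap left by the sagittal stack.
coronal = {0: [[0, 0, 0], [0, 1, 0], [0, 0, 0]]}

sagittal_voxels = stack_to_voxels(sagittal, axis=0)
coronal_voxels = stack_to_voxels(coronal, axis=1)
model = combine_stacks(sagittal_voxels, coronal_voxels)
```

In this toy case the voxel at x = 1 falls between sagittal slices and enters the model only through the coronal stack, which is the sense in which orthogonal stacks fill in each other's gaps. A real pipeline would also need registration, resampling to physical slice spacing, and surface reconstruction, none of which is shown here.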