A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese

Changwei Liang, Jiangping Kong, Xiyu Wu
Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing, published 2021-01-22
DOI: 10.1145/3448748.3448796 (https://doi.org/10.1145/3448748.3448796)

Abstract

This paper constructs a new speech-driven 3-D geometric tongue model. The 3-D tongue shape is controlled by control points on the 2-D midsagittal tongue curve, and speech-driven inverse estimation based on the model is evaluated against empirical data. X-ray videos of 2-D vocal tract motion are annotated for midsagittal tongue motion, and static 3-D vocal tract shapes for 20 phonemes are collected with MRI to provide realistic 3-D tongue shapes. Mel-frequency cepstral coefficients (MFCCs) are calculated from the videos as acoustic features and fed to an LSTM-RNN that predicts the movement of the tongue's control points. Three geometrically intuitive control points are selected, and the midsagittal line of the tongue is calculated from them through linear regression. Cross-sections along the central line of the tongue, whose height, width and angle are predicted from the midsagittal line, are reconstructed with geometric curves; each cross-section is then placed on the midsagittal line to obtain the overall predicted moving grid of the 3-D tongue. The model maps acoustic features directly to realistic tongue motion to preserve more articulatory detail, its control points are intuitive enough for non-experts to operate, and the predicted geometric tongue shapes are comparable with real tongue dynamics. The speech-driven prediction is evaluated with real data, and the results indicate that the proposed method is feasible.
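The final assembly step, placing one cross-section per point of the midsagittal line to form a 3-D grid, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not name the family of geometric curves, so a parabola is assumed here, and the function names `cross_section` and `tongue_grid`, the coordinate convention (x front-back, y lateral, z vertical), and the interpretation of the angle are all hypothetical.

```python
import math

def cross_section(point, height, width, angle, n=5):
    """Return n 3-D points of one parabolic cross-section (assumed shape).

    point  : (x, z) position on the midsagittal line, which lies in the y = 0 plane
    height : drop from the midline down to the lateral edges
    width  : full lateral extent of the section
    angle  : direction (radians) of the section within the x-z plane,
             measured from the +z axis toward +x (an assumed convention)
    """
    x0, z0 = point
    # Unit vector along which the section falls away from the midline.
    dx, dz = math.sin(angle), math.cos(angle)
    pts = []
    for i in range(n):
        t = -1.0 + 2.0 * i / (n - 1)   # lateral parameter in [-1, 1]
        drop = height * t * t          # parabola: 0 at the midline, height at the edges
        pts.append((x0 - drop * dx, 0.5 * width * t, z0 - drop * dz))
    return pts

def tongue_grid(midline, heights, widths, angles, n=5):
    """Stack one cross-section per midsagittal point into a grid of 3-D points."""
    return [
        cross_section(p, h, w, a, n)
        for p, h, w, a in zip(midline, heights, widths, angles)
    ]

# Toy input: two midsagittal points with made-up section parameters.
grid = tongue_grid(
    midline=[(0.0, 0.0), (1.0, 0.5)],
    heights=[0.2, 0.3],
    widths=[2.0, 2.4],
    angles=[0.0, 0.0],
)
```

In the pipeline the abstract describes, the per-section height, width and angle would themselves be predicted from the midsagittal line rather than supplied by hand as in this toy input.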