基于支持向量机的音素分类和唇形优化实现实时口型同步

Taeyoon Kim, Yongsung Kang, Hanseok Ko
{"title":"基于支持向量机的音素分类和唇形优化实现实时口型同步","authors":"Taeyoon Kim, Yongsung Kang, Hanseok Ko","doi":"10.1109/ICMI.2002.1167010","DOIUrl":null,"url":null,"abstract":"In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize \"real time\" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement\",\"authors\":\"Taeyoon Kim, Yongsung Kang, Hanseok Ko\",\"doi\":\"10.1109/ICMI.2002.1167010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize \\\"real time\\\" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.\",\"PeriodicalId\":208377,\"journal\":{\"name\":\"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMI.2002.1167010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMI.2002.1167010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

在本文中,我们开发了一个实时口型同步系统,该系统可以激活2D虚拟人物的嘴唇运动与输入的语音同步。为了实现系统的“实时”操作,我们通过调用一个合并和分割过程来执行从粗到精的音素分类,从而包含处理时间。在音素分类的每个阶段,我们使用支持向量机(SVM)来约束计算量,同时获得理想的准确率。从粗到精的音素分类是通过两个阶段的特征提取来完成的,其中每个语音帧首先使用MFCC作为特征对3类唇开进行声学分析,然后使用形成体信息对详细的唇形进行进一步的细化分类。我们用2D嘴唇动画实现了该系统,显示了所提出的两阶段过程完成实时口型同步任务的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement
In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize "real time" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信