基于支持向量机的音素分类和唇形优化实现实时口型同步

Proceedings. Fourth IEEE International Conference on Multimodal Interfaces Pub Date : 2002-10-14 DOI:10.1109/ICMI.2002.1167010

Taeyoon Kim, Yongsung Kang, Hanseok Ko

{"title":"基于支持向量机的音素分类和唇形优化实现实时口型同步","authors":"Taeyoon Kim, Yongsung Kang, Hanseok Ko","doi":"10.1109/ICMI.2002.1167010","DOIUrl":null,"url":null,"abstract":"In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize \"real time\" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.","PeriodicalId":208377,"journal":{"name":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement\",\"authors\":\"Taeyoon Kim, Yongsung Kang, Hanseok Ko\",\"doi\":\"10.1109/ICMI.2002.1167010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize \\\"real time\\\" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.\",\"PeriodicalId\":208377,\"journal\":{\"name\":\"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMI.2002.1167010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fourth IEEE International Conference on Multimodal Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMI.2002.1167010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

在本文中，我们开发了一个实时口型同步系统，该系统可以激活2D虚拟人物的嘴唇运动与输入的语音同步。为了实现系统的“实时”操作，我们通过调用一个合并和分割过程来执行从粗到精的音素分类，从而包含处理时间。在音素分类的每个阶段，我们使用支持向量机(SVM)来约束计算量，同时获得理想的准确率。从粗到精的音素分类是通过两个阶段的特征提取来完成的，其中每个语音帧首先使用MFCC作为特征对3类唇开进行声学分析，然后使用形成体信息对详细的唇形进行进一步的细化分类。我们用2D嘴唇动画实现了该系统，显示了所提出的两阶段过程完成实时口型同步任务的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement

In this paper, we develop a real time lip-synch system that activates a 2D avatar's lip motion in synch with incoming speech utterance. To realize "real time" operation of the system, we contain the processing time by invoking a merge and split procedure performing coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to constrain the computational load while attaining desirable accuracy. Coarse-to-fine phoneme classification is accomplished via 2 stages of feature extraction, where each speech frame is acoustically analyzed first for 3 classes of lip opening using MFCC as the feature and then a further refined classification for detailed lip shape using formant information. We implemented the system with 2D lip animation that shows the effectiveness of the proposed 2-stage procedure accomplishing the real-time lip-synch task.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. Fourth IEEE International Conference on Multimodal Interfaces

自引率

0.00%

发文量