2008 IEEE Spoken Language Technology Workshop: Latest Papers

Multilingual spoken-password based user authentication in emerging economies using cellular phone networks
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777826
A. Das, O. K. Manyam, Makarand Tapaswi, Veeresh Taranalli
Abstract: Mobile phones are playing an important role in changing the socio-economic landscapes of emerging economies like India. A proper voice-based user authentication will help in many new mobile based applications including mobile-commerce and banking. We present our exploration and evaluation of an experimental set-up for user authentication in remote Indian villages using mobile phones and user-selected multilingual spoken passwords. We also present an effective speaker recognition method using a set of novel features called Compressed Feature Dynamics (CFD) which capture the speaker-identity effectively from the speech dynamics contained in the spoken passwords. Early trials demonstrate the effectiveness of the proposed method in handling noisy cell-phone speech. Compared to conventional text-dependent speaker recognition methods, the proposed CFD method delivers competitive performance while significantly reducing storage and computational complexity - an advantage highly beneficial for cell-phone based deployment of such user authentication systems.
Citations: 18
Modeling vocal interaction for text-independent detection of involvement hotspots in multi-party meetings
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777845
K. Laskowski
Abstract: Indexing, retrieval, and summarization in recordings of meetings have, to date, focused largely on the propositional content of what participants say. Although objectively relevant, such content may not be the sole or even the main aim of potential system users. Instead, users may be interested in information bearing on conversation flow. We explore the automatic detection of one example of such information, namely that of hotspots defined in terms of participant involvement. Our proposed system relies exclusively on low-level vocal activity features, and yields a classification accuracy of 84%, representing a 39% reduction of error relative to a baseline which selects the majority class.
Citations: 14
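The accuracy figures quoted in the abstract above imply a majority-class baseline accuracy that the abstract itself does not state. As a rough check, and assuming "reduction of error relative to a baseline" means (baseline error - system error) / baseline error, the baseline can be back-computed. The function name below is illustrative and not taken from the paper.

```python
def baseline_accuracy(system_accuracy: float, relative_error_reduction: float) -> float:
    """Back out the baseline accuracy implied by a relative error reduction.

    Assumes reduction = (baseline_error - system_error) / baseline_error,
    which is one common reading of the phrase; the abstract does not define it.
    """
    system_error = 1.0 - system_accuracy
    baseline_error = system_error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


# Figures quoted in the abstract: 84% accuracy, 39% relative error reduction.
implied = baseline_accuracy(0.84, 0.39)
print(f"implied majority-class baseline accuracy: {implied:.3f}")
```

Under that assumption the majority class would account for roughly 74% of the evaluated instances.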
Incorporating discourse context in spoken language translation through dialog acts
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777892
V. Sridhar, Shrikanth S. Narayanan, S. Bangalore
Abstract: Current statistical speech translation approaches predominantly rely on just text transcripts and are limited in their use of rich contextual information such as prosody and discourse function. In this paper, we explore the role of discourse context characterized through dialog acts (DAs) in statistical translation. We present a bag-of-words (BOW) model that exploits DA tags in translation and contrast it with a phrase table interpolation approach presented in previous work. In addition to producing interpretable DA-annotated target language translations through our framework, we also obtain consistent improvements in terms of automatic evaluation metrics such as lexical selection accuracy and BLEU score using both the models. We also analyze the performance improvements per DA tag. Our experiments indicate that questions, acknowledgments, agreements and appreciations contribute to more improvement in comparison to statements.
Citations: 3
Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777831
Masato Onishi, Toru Takahashi, T. Irino, Hideki Kawahara
Abstract: New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.
Citations: 1
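The abstract above describes the frequency alignment function at a frame as a weighted average of per-vowel alignment functions, weighted by similarity to each vowel. A minimal sketch of that blending step follows. How similarity is measured, the normalization of the weights, and the sampling of each alignment function on a shared frequency grid are all assumptions made here for illustration; the abstract does not specify them.

```python
def frame_alignment(vowel_alignments, similarities):
    """Blend per-vowel frequency alignment functions for a single frame.

    vowel_alignments: dict mapping a vowel label to a list of warped
        frequencies (Hz), all sampled on the same frequency grid (assumed).
    similarities: dict mapping the same vowel labels to non-negative
        similarity scores for this frame (how these are computed is assumed
        to be handled elsewhere).
    """
    total = sum(similarities.values())
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    grid_size = len(next(iter(vowel_alignments.values())))
    blended = [0.0] * grid_size
    for vowel, warp in vowel_alignments.items():
        weight = similarities[vowel] / total  # weights sum to 1
        for i, freq in enumerate(warp):
            blended[i] += weight * freq
    return blended


# Hypothetical alignment functions for two vowels on a three-point grid.
alignments = {"a": [0.0, 1100.0, 2200.0], "i": [0.0, 900.0, 1800.0]}
sims = {"a": 3.0, "i": 1.0}
blended = frame_alignment(alignments, sims)
print(blended)
```

A frame that resembles only one vowel reduces to that vowel's alignment function, while an ambiguous frame lands between the candidates.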
Using hidden Markov models for topic segmentation of meeting transcripts
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777871
Melissa Sherman, Yang Liu
Abstract: In this paper, we present a hidden Markov model (HMM) approach to segment meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate that an HMM outperforms LCSeg, a state-of-the-art lexical chain based method for topic segmentation using the ICSI meeting corpus. We evaluate the effect of language model order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, using too many hidden states degrades topic segmentation performance, and that removing the stop words from the transcripts does not improve segmentation performance.
Citations: 26
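The abstract above evaluates segmentation with modified WinDiff and Pk metrics but does not describe the modifications. For orientation only, here is a sketch of the standard, unmodified Pk metric, which probes pairs of positions k units apart and counts how often the reference and hypothesis disagree on whether the pair falls in the same segment. The input representation (one segment id per utterance) and the default choice of k as half the mean reference segment length are conventional assumptions, not details from the paper.

```python
def pk(reference, hypothesis, k=None):
    """Standard Pk segmentation error (lower is better).

    reference, hypothesis: equal-length lists holding a segment id for each
        unit (for example, each utterance); adjacent units with the same id
        belong to the same segment.
    k: probe distance; defaults to half the mean reference segment length.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Count reference segments by counting label changes.
        segments = 1 + sum(reference[i] != reference[i - 1] for i in range(1, n))
        k = max(1, round(n / segments / 2))
    probes = n - k
    if probes <= 0:
        raise ValueError("sequence too short for probe distance k")
    errors = 0
    for i in range(probes):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / probes


# Toy example: two reference topics; the hypothesis finds no boundary at all.
ref = [0, 0, 0, 0, 1, 1, 1, 1]
hyp = [0, 0, 0, 0, 0, 0, 0, 0]
print(pk(ref, ref), pk(ref, hyp))
```

A perfect hypothesis scores 0, and a hypothesis that misses the only boundary is penalized once for every probe pair that straddles it.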
Global syllable set for building speech synthesis in Indian languages
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777837
E. V. Raghavendra, Srinivas Desai, B. Yegnanarayana, A. Black, K. Prahallad
Abstract: Indian languages are syllabic in nature, and many syllables are common across them. This motivates us to build a global syllable set by combining syllables from multiple languages, so that a synthesizer can borrow units from a different language when the required syllable is not found. Such a synthesizer makes use of speech databases in different languages spoken by different speakers. Its output is likely to pick units from multiple languages, and hence the synthesized utterance contains units spoken by multiple speakers, which would annoy the user. We intend to use a cross-lingual voice conversion framework using Artificial Neural Networks (ANN) to transform such an utterance to a single target speaker.
Citations: 24
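The borrowing idea in the abstract above, taking a unit from another language when the target language lacks the required syllable, can be sketched as a simple fallback lookup. The language labels, syllables, unit identifiers, and the first-match fallback order below are all hypothetical; the abstract does not say how a donor language is chosen.

```python
def select_unit(syllable, target_language, inventories):
    """Pick a recorded unit for a syllable, preferring the target language.

    inventories: dict mapping a language label to a dict of
        syllable -> unit identifier (all values here are made up).
    Returns (language, unit) or None when no language has the syllable.
    """
    own_units = inventories.get(target_language, {})
    if syllable in own_units:
        return target_language, own_units[syllable]
    # Fall back to the first other language that has the syllable (assumed policy).
    for language, units in inventories.items():
        if language != target_language and syllable in units:
            return language, units[syllable]
    return None


# Hypothetical two-language global syllable set.
inventories = {
    "lang_a": {"ka": "a_ka_001"},
    "lang_b": {"ka": "b_ka_007", "bha": "b_bha_003"},
}
print(select_unit("ka", "lang_a", inventories))
print(select_unit("bha", "lang_a", inventories))
```

Because borrowed units come from a different speaker, an utterance assembled this way mixes voices, which is the problem the abstract proposes to address with voice conversion.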
Latent semantic retrieval of spoken documents over position specific posterior lattices
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777896
Hung-lin Chang, Yi-Cheng Pan, Lin-Shan Lee
Abstract: This paper presents a new approach of latent semantic retrieval of spoken documents over Position Specific Posterior Lattices (PSPL). This approach performs concept matching instead of literal term matching during retrieval based on the Probabilistic Latent Semantic Analysis (PLSA), so as to solve the problem of term mismatch between the query and the desired spoken documents. This approach is performed over PSPL to consider the multiple hypotheses generated by ASR process, as well as the position information for these hypotheses, so as to alleviate the problem of relatively poor ASR accuracy. We establish a framework to evaluate semantic relevance between terms and the relevance score between a query and a PSPL, both based on the latent topic information from PLSA. Preliminary experiments on Chinese broadcast news segments showed significant improvements can be obtained with the proposed approach.
Citations: 12
Experience with developing and deploying an agricultural information system using spoken language technology in Kenya
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777829
R. Tucker, M. Gakuru
Abstract: We describe the progress of the Local Language Speech Technology Initiative in Kenya, where since starting in 2003, technology and expertise have been successfully transferred to the Kenyan partners, culminating in the launch of the National Farmers Information Service (NAFIS) in April 2008. NAFIS is primarily a voice service accessed over the phone and offers a wide range of information in Kiswahili or Kenyan English, supplementing the existing agricultural extension services.
Citations: 6
Modelling multimodal user ID in dialogue
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777853
H. Holzapfel, A. Waibel
Abstract: This paper presents an approach to model user ID in dialogue. A belief network is used to integrate ID classifiers, such as face ID and voice ID, and person related information, such as the first name and last name of a person from speech recognition or spelling. Different network structures are analyzed and compared with each other and with a rule-based user model. The approach is evaluated on dialogue data collected in a person identification scenario, which includes both identification of known persons and interactive learning of names and ID of unknown persons.
Citations: 4
Identifying salient utterances of online spoken documents using descriptive hypertext
2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777868
Xiao-Dan Zhu, Siavash Kazemian, Gerald Penn
Abstract: The Internet has become an important supply channel of spoken documents. Efficient ways of navigating their content are highly desirable. This paper aims to identify the most salient utterances from online spoken documents using relevant hypertext that encapsulates key information. Experimental results show that hypertext features are helpful when properly utilized and if the bit rates used to compress the spoken documents are reasonable.
Citations: 0