2012 IEEE Spoken Language Technology Workshop (SLT): Latest Publications

Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-02 DOI: 10.1109/SLT.2012.6424261
Luiza Orosanu, D. Jouvet, D. Fohr, I. Illina, A. Bonneau
Abstract: This article analyzes the detection of incorrect entries of non-native speech in the context of foreign language learning. The purpose is to detect and reject incorrect entries (i.e. those for which the speech signal does not correspond at all to the associated text) while being tolerant to the mispronunciations of non-native speech. The proposed approach exploits the comparison between two text-to-speech alignments: one constrained by the text which is being checked, with another one unconstrained, corresponding to a phonetic decoding. Several comparison criteria are described and combined via a logistic regression function. The article analyzes the influence of different settings, such as the impact of non-native pronunciation variants, the impact of learning the decision functions on native or on non-native speech, as well as the impact of combining various comparison criteria. The performance evaluations are conducted both on native and on non-native speech.
Citations: 1
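The abstract above states that several comparison criteria are combined via a logistic regression function. The following is a minimal sketch of that combination step only. The function names, the weights, the bias, and the 0.5 decision threshold are hypothetical and are not taken from the paper; in practice the weights would be learned from labelled correct and incorrect entries.

```python
import math


def combine_criteria(criteria, weights, bias):
    """Map a vector of comparison criteria to a score in (0, 1).

    All parameter values are illustrative assumptions, not the paper's.
    """
    if len(criteria) != len(weights):
        raise ValueError("criteria and weights must have the same length")
    # Weighted sum of the criteria followed by the logistic (sigmoid) function.
    z = bias + sum(w * x for w, x in zip(weights, criteria))
    return 1.0 / (1.0 + math.exp(-z))


def is_incorrect_entry(criteria, weights, bias, threshold=0.5):
    """Reject the entry when the combined score reaches the threshold."""
    return combine_criteria(criteria, weights, bias) >= threshold
```

Raising the threshold makes the detector more tolerant (fewer rejections), which is one way to trade false rejections of mispronounced but correct entries against missed incorrect ones.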
Crowdsourcing the acquisition of natural language corpora: Methods and observations
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424200
William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz
Abstract: We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.
Citations: 58
Exploiting the Semantic Web for unsupervised spoken language understanding
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424227
Larry Heck, Dilek Z. Hakkani-Tür
Abstract: This paper proposes an unsupervised training approach for SLU systems that leverages the structured semantic knowledge graphs of the emerging Semantic Web. The approach creates natural language surface forms of entity-relation-entity portions of knowledge graphs using a combination of web search retrieval and syntax-based dependency parsing. The new forms are used to train an SLU system in an unsupervised manner. This paper tests the approach on the problem of intent detection, and shows that the unsupervised training procedure matches the performance of supervised training over operating points important for commercial applications.
Citations: 56
An automatic pitch accent feedback system for English learners with adaptation of an English corpus spoken by Koreans
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424263
Sechun Kang, G. G. Lee, Ho-Young Lee, Byeongchang Kim
Abstract: To improve the English proficiency of Korean learners, we design a system for pitch accents, which consists of prediction, detection and feedback parts. The prediction and detection parts adopt Conditional Random Field models to achieve a prediction accuracy of 87.25%, which is based on the Boston University radio news corpus, and a detection accuracy of 81.21%, which is based on the Korean Learner's English Accentuation corpus. In the learner experiment with our system, learners' pitch accent proficiency, as assessed by English experts, was improved from 2.67 to 3.25 on a scale of 1-to-5, and the accuracy of not-wrong feedback was measured at 82.77%. The learners assessed the learning effectiveness of our system at 4.3 on a scale of 1-to-5.
Citations: 1
Recognition rate estimation based on word alignment network and discriminative error type classification
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424207
A. Ogawa, Takaaki Hori, Atsushi Nakamura
Abstract: Techniques for estimating recognition rates without using reference transcriptions are essential if we are to judge whether or not speech recognition technology is applicable to a new task. This paper proposes two recognition rate estimation methods for continuous speech recognition. The first is an easy-to-use method based on a word alignment network (WAN) obtained from a word confusion network through simple conversion procedures. A WAN contains the correct (C), substitution error (S), insertion error (I) and deletion error (D) probabilities word-by-word for a recognition result. By summing these CSID probabilities individually, the percent correct and word accuracy (WACC) can be estimated without using a reference transcription. The second more advanced method refines the CSID probabilities provided by a WAN based on discriminative error type classification (ETC) and estimates the recognition rates more accurately. In the experiments on the MIT lecture speech corpus, we obtained 0.97 of correlation coefficient between the true WACCs calculated by a scoring tool using reference transcriptions and the WACCs estimated from the discriminative ETC results.
Citations: 9
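The entry above describes estimating percent correct and word accuracy by summing per-word C, S, I and D probabilities. The sketch below illustrates that summation under stated assumptions: the input layout (one dict per WAN slot) and the function name are hypothetical, and the normalization uses the conventional scoring definitions, with the expected number of reference words taken as C + S + D. The paper's exact formulas may differ.

```python
def estimate_rates(slots):
    """Estimate (percent_correct, word_accuracy) from per-slot CSID probabilities.

    `slots` is a list of dicts with keys "C", "S", "I", "D" (assumed layout).
    """
    # Expected counts are the sums of the per-slot probabilities.
    c = sum(slot["C"] for slot in slots)
    s = sum(slot["S"] for slot in slots)
    i = sum(slot["I"] for slot in slots)
    d = sum(slot["D"] for slot in slots)

    # Insertions have no counterpart in the reference, so they are excluded here.
    n_ref = c + s + d
    if n_ref <= 0:
        raise ValueError("expected number of reference words must be positive")

    percent_correct = 100.0 * c / n_ref
    word_accuracy = 100.0 * (c - i) / n_ref
    return percent_correct, word_accuracy
```

Because insertions are subtracted in the numerator, the estimated word accuracy can be negative for very noisy hypotheses, just as with reference-based scoring.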
Generating grammar questions using corpus data in L2 learning
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424265
Kyusong Lee, Soo-Ok Kweon, Hongsuck Seo, G. G. Lee
Abstract: This paper examines how grammar questions are automatically generated for L2 learning by applying a sequential labeling technique to learner corpora. We developed a model that helps detect possible error positions and select the most appropriate form among choices. Discriminant models such as conditional random field and maximum entropy are used to generate the error identification question. Questions generated by the proposed method corresponded highly to questions that experts made. Our data-driven approach lends itself to any language without costing expensive expertise.
Citations: 3
Modeling intensity contours and the interaction of pitch and intensity to improve automatic prosodic event detection and classification
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424253
A. Rosenberg
Abstract: Prosody, or the way words are spoken, carries important information to understanding a speaker's communicative intention. Many studies on automatic prosodic analysis focus on parameterizing pitch content. In this work, we extend previous pitch contour modeling features to intensity contours, and develop a set of features based on the interaction of pitch and intensity. These new features improve the state-of-the-art on all prosodic event detection and classification tasks related to automatic ToBI labeling.
Citations: 15
Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424226
Timothy J. Hazen, Fred Richardson
Abstract: Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations that ignore word sequence information that can aid in semantic understanding. In this work we introduce a method for efficiently incorporating arbitrarily long word sequences into a topic modeling approach. This method iteratively constructs a constrained set of phrase trees in an unsupervised fashion from a document collection using weighted pointwise mutual information statistics to guide the process. In experiments on the Fisher Corpus of conversational speech, the incorporation of learned phrases into a latent topic model yielded significant improvements in the unsupervised discovery of the known topics present within the data.
Citations: 5
Simultaneous feature selection and parameter optimization for training of dialog policy by reinforcement learning
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424160
Teruhisa Misu, H. Kashioka
Abstract: This paper addresses the problem of feature selection in the reinforcement learning (RL) of the dialog policies of spoken dialog systems. A statistical dialog manager selects the system actions the system should take based on the features derived from the current dialog state and/or the system's belief state. When defining the features used by the system for training the dialog policy, however, finding a set of actually effective features from potentially useful ones is not obvious. In addition, the selection should be done simultaneously with the optimization of the dialog policy. In this paper, we propose an incremental feature selection method for the optimization of a dialog policy by RL, in which improvement of the dialog policy and the feature selection are conducted simultaneously. Experiments in dialog policy optimization by RL with a user simulator demonstrated the following: 1) that the proposed method can find a better dialog policy with fewer policy iterations and 2) the learning speed is comparable with the case where feature selection is conducted in advance.
Citations: 5
On the use of phone log-likelihood ratios as features in spoken language recognition
2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424235
M. Díez, A. Varona, M. Peñagarikano, Luis Javier Rodriguez-Fuentes, Germán Bordel
Abstract: This paper presents an alternative feature set to the traditional MFCC-SDC used in acoustic approaches to Spoken Language Recognition: the log-likelihood ratios of phone posterior probabilities, hereafter Phone Log-Likelihood Ratios (PLLR), produced by a phone recognizer. In this work, an iVector system trained on this set of features (plus dynamic coefficients) is evaluated and compared to (1) an acoustic iVector system (trained on the MFCC-SDC feature set) and (2) a phonotactic (Phone-lattice-SVM) system, using two different benchmarks: the NIST 2007 and 2009 LRE datasets. iVector systems trained on PLLR features proved to be competitive, reaching or even outperforming the MFCC-SDC-based iVector and the phonotactic systems. The fusion of the proposed approach with the acoustic and phonotactic systems provided even more significant improvements, outperforming state-of-the-art systems on both benchmarks.
Citations: 55
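The entry above defines its features as log-likelihood ratios of phone posterior probabilities. As a rough illustration, the sketch below converts one frame of phone posteriors into per-phone log-odds, log(p / (1 - p)). This logit form is an assumption made for illustration; the paper may scale the denominator differently. The function name and the `eps` clipping value are hypothetical.

```python
import math


def phone_llrs(posteriors, eps=1e-10):
    """Convert one frame of phone posteriors into per-phone log-odds.

    The logit form and the `eps` clipping are illustrative assumptions.
    """
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    llrs = []
    for p in posteriors:
        # Renormalize, then clip away from 0 and 1 so the logarithm stays finite.
        p = min(max(p / total, eps), 1.0 - eps)
        llrs.append(math.log(p / (1.0 - p)))
    return llrs
```

Unlike raw posteriors, which are confined to [0, 1] and heavily skewed toward 0, the log-odds values are unbounded, which tends to suit Gaussian-based back ends better.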