SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology: Latest Articles

STYLETTS-VC: ONE-SHOT VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM STYLE-BASED TTS MODELS.
Yinghao Aaron Li, Cong Han, Nima Mesgarani
{"title":"STYLETTS-VC: ONE-SHOT VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM STYLE-BASED TTS MODELS.","authors":"Yinghao Aaron Li,&nbsp;Cong Han,&nbsp;Nima Mesgarani","doi":"10.1109/slt54892.2023.10022498","DOIUrl":"https://doi.org/10.1109/slt54892.2023.10022498","url":null,"abstract":"<p><p>One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that still remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through a teacher-student knowledge transfer and novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.</p>","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"2022 ","pages":"920-927"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10417535/pdf/nihms-1919646.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9990482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
COMPUTATIONAL ANALYSIS OF TRAJECTORIES OF LINGUISTIC DEVELOPMENT IN AUTISM.
Emily Prud'hommeaux, Eric Morley, Masoud Rouhizadeh, Laura Silverman, Jan van Santen, Brian Roark, Richard Sproat, Sarah Kauper, Rachel DeLaHunta
{"title":"COMPUTATIONAL ANALYSIS OF TRAJECTORIES OF LINGUISTIC DEVELOPMENT IN AUTISM.","authors":"Emily Prud'hommeaux,&nbsp;Eric Morley,&nbsp;Masoud Rouhizadeh,&nbsp;Laura Silverman,&nbsp;Jan van Santen,&nbsp;Brian Roark,&nbsp;Richard Sproat,&nbsp;Sarah Kauper,&nbsp;Rachel DeLaHunta","doi":"10.1109/SLT.2014.7078585","DOIUrl":"https://doi.org/10.1109/SLT.2014.7078585","url":null,"abstract":"<p><p>Deficits in semantic and pragmatic expression are among the hallmark linguistic features of autism. Recent work in deriving computational correlates of clinical spoken language measures has demonstrated the utility of automated linguistic analysis for characterizing the language of children with autism. Most of this research, however, has focused either on young children still acquiring language or on small populations covering a wide age range. In this paper, we extract numerous linguistic features from narratives produced by two groups of children with and without autism from two narrow age ranges. We find that although many differences between diagnostic groups remain constant with age, certain pragmatic measures, particularly the ability to remain on topic and avoid digressions, seem to improve. These results confirm findings reported in the psychology literature while underscoring the need for careful consideration of the age range of the population under investigation when performing clinically oriented computational analysis of spoken language.</p>","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"2014 ","pages":"266-271"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/SLT.2014.7078585","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35532885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
ROBUST DETECTION OF VOICED SEGMENTS IN SAMPLES OF EVERYDAY CONVERSATIONS USING UNSUPERVISED HMMS.
Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk
{"title":"ROBUST DETECTION OF VOICED SEGMENTS IN SAMPLES OF EVERYDAY CONVERSATIONS USING UNSUPERVISED HMMS.","authors":"Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk","doi":"10.1109/slt.2012.6424264","DOIUrl":"10.1109/slt.2012.6424264","url":null,"abstract":"<p><p>We investigate methods for detecting voiced segments in everyday conversations from ambient recordings. Such recordings contain high diversity of background noise, making it difficult or infeasible to collect representative labelled samples for estimating noise-specific HMM models. The popular utility <i>get-f0</i> and its derivatives compute normalized cross-correlation for detecting voiced segments, which unfortunately is sensitive to different types of noise. Exploiting the fact that voiced speech is not just periodic but also rich in harmonic, we model voiced segments by adopting harmonic models, which have recently gained considerable attention. In previous work, the parameters of the model were estimated independently for each frame using maximum likelihood criterion. However, since the distribution of harmonic coefficients depend on articulators of speakers, we estimate the model parameters more robustly using a maximum <i>a posteriori</i> criterion. We use the likelihood of voicing, computed from the harmonic model, as an observation probability of an HMM and detect speech using this unsupervised HMM. The one caveat of the harmonic model is that they fail to distinguish speech from other stationary harmonic noise. We rectify this weakness by taking advantage of the non-stationary property of speech. We evaluate our models empirically on a task of detecting speech on a large corpora of everyday speech and demonstrate that these models perform significantly better than standard voice detection algorithm employed in popular tools.</p>","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"2012 ","pages":"438-442"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7909075/pdf/nihms-1670854.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25414977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation
Satoshi Kobashikawa, Takaaki Hori, Y. Yamaguchi, Taichi Asami, H. Masataki, Satoshi Takahashi
{"title":"Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation","authors":"Satoshi Kobashikawa, Takaaki Hori, Y. Yamaguchi, Taichi Asami, H. Masataki, Satoshi Takahashi","doi":"10.1109/SLT.2012.6424209","DOIUrl":"https://doi.org/10.1109/SLT.2012.6424209","url":null,"abstract":"","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology","volume":"214 1","pages":"125-130"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72783333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Speech Technology Opportunities and Challenges
D. Nahamoo
{"title":"Speech Technology Opportunities and Challenges","authors":"D. Nahamoo","doi":"10.1109/SLT.2006.326778","DOIUrl":"https://doi.org/10.1109/SLT.2006.326778","url":null,"abstract":"Summary form only given. Two forces are in pursuit of discovering the possibilities of speech technology automation. First is the global research and development community which has been hard at work for improving the performance and usability of the technology. Second is the business community which constantly evaluates the performance of the technology against the expectation of the user community for delivering solutions such as a spoken car navigation system. While the performance improvement has been on a constant positive progress curve, the market opportunity has been on a much more uncertain curve. For example, the early vision of delivering a dictation solution has been on hold in recent years while it enjoyed enormous interest in the 90 s. At the same time, some industry experts predict that this vision will be fulfilled soon because of the usability needs of billions of mobile devices in use today. Analogies can be drawn for the use of speech technologies for call centers self service interaction. While we have seen a much bigger market success, some industry experts predict that web self services will slow down the use of speech self service. So, where does the truth lie? What are those market opportunities that are clear winners? What opportunities will open up in future and what are their technical challenges? In this talk, we will address some of these questions.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"34 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82726673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
No More Strings, please
Kevin Knight
{"title":"No More Strings, please","authors":"Kevin Knight","doi":"10.1109/SLT.2006.326779","DOIUrl":"https://doi.org/10.1109/SLT.2006.326779","url":null,"abstract":"Summary form only given. In natural language research, many (grammar) trees were felled in 1992, to make room for the highly successful string-based HMM industry. A small literature survived on parsing (putting a tree on a string) and syntactic language modeling (putting a weight on a string). However, trees are making a comeback. Tree transformations are turning out to be very useful in large-scale machine translation (MT), and we will cover recent developments in this area. Most of the tree techniques used in MT turn out to be generic, leading to tools and software for manipulating tree automata in general. Tree acceptors and transducers generalize HMM techniques to the world of trees, raising many interesting theoretical and practical problems.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology","volume":"60 1","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82633077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Information Extraction from speech
J. Makhoul
{"title":"Information Extraction from speech","authors":"J. Makhoul","doi":"10.1109/SLT.2006.326780","DOIUrl":"https://doi.org/10.1109/SLT.2006.326780","url":null,"abstract":"Summary form only given. The state of the art in automatic speech recognition has reached the point that searching for and extracting information from large speech repositories or streaming audio has become a growing reality. This paper summarizes the technologies that have been instrumental in making audio as searchable as text, including speech recognition, speaker clustering, segmentation, and identification; topic classification; and story segmentation. Once speech is turned into text, information extraction methods can then be applied, such as named entity extraction, finding relationships between named entities, and resolution of anaphoric references. Examples of deployed systems for information extraction from speech, which incorporate some of the aforementioned technologies, will be given.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology","volume":"38 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80964566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Widening the NLP Pipeline for spoken Language Processing
S. Bangalore
{"title":"Widening the NLP Pipeline for spoken Language Processing","authors":"S. Bangalore","doi":"10.1109/SLT.2006.326787","DOIUrl":"https://doi.org/10.1109/SLT.2006.326787","url":null,"abstract":"Summary form only given. A typical text-based natural language application (eg. machine translation, summarization, information extraction) consists of a pipeline of preprocessing steps such as tokenization, stemming, part-of-speech tagging, named entity detection, chunking, parsing. Information flows downstream through the preprocessing steps along a narrow pipe: each step transforms a single input string into a single best solution string. However, this narrow pipe is limiting for two reasons: First, since each of the preprocessing steps are erroneous, producing a single best solution could magnify the error propogation down the pipeline. Second, the preprocessing steps are forced to resolve genuine ambiguity prematurely. While the widening of the pipeline can potentially benefit text-based language applications, it is imperative for spoken language processing where the output from the speech recognizer is typically a word lattice/graph. In this talk, we review how such a goal has been accomplished in tasks such as spoken language understanding, speech translation and multimodal language processing. We will also sketch methods that encode the preprocessing steps as finite-state transductions in order to exploit composition of finite-state transducers as a general constraint propogation method.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"48 1","pages":"15"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85810428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Graph-Based Methods for Language Processing and Information Retrieval
Dragomir R. Radev
{"title":"Graph-Based Methods for Language Processing and Information Retrieval","authors":"Dragomir R. Radev","doi":"10.1109/SLT.2006.326781","DOIUrl":"https://doi.org/10.1109/SLT.2006.326781","url":null,"abstract":"Summary form only given. A number of problems in information retrieval and natural language processing can be approached using graph theory. Some representative examples in IR include Brin and Page's Pagerank and Kleinberg's HITS for document ranking using graph-based random walk models. In NLP, one could mention Pang and Lee's work on sentiment analysis using graph min- cuts, Mihalcea's work on word sense disambiguation, Zhu et al.'s label propagation algorithms, Toutanova et al.'s prepositional attachment algorithm, and McDonald et al.'s dependency parsing algorithm using minimum spanning trees. In this talk I will quickly summarize three graph-based algorithms developed recently at the University of Michigan: (a) lexrank, a method for multidocument summarization based on random walks on lexical centrality graphs, (b) TUMBL, a generic method using bipartite graphs for semi-supervised learning, and (c) biased lexrank, a semi-supervised technique for passage ranking for information retrieval and discuss the applicability of such techniques to other problems in Natural Language Processing and Information Retrieval.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. 
IEEE Workshop on Spoken Language Technology","volume":"6 1","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2006-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89351600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Model Adaptation for Dialog Act Tagging
Gökhan Tür, Ümit Güz, Dilek Z. Hakkani-Tür
{"title":"Model Adaptation for Dialog Act Tagging","authors":"Gökhan Tür, Ümit Güz, Dilek Z. Hakkani-Tür","doi":"10.1109/SLT.2006.326825","DOIUrl":"https://doi.org/10.1109/SLT.2006.326825","url":null,"abstract":"In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology","volume":"204 1","pages":"94-97"},"PeriodicalIF":0.0,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77023227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8