ACM Trans. Speech Lang. Process.: Latest Articles

Distributed speech translation technologies for multiparty multilingual communication
ACM Trans. Speech Lang. Process. Pub Date : 2012-07-01 DOI: 10.1145/2287710.2287712
S. Sakti, Michael Paul, A. Finch, Xinhui Hu, Jinfu Ni, Noriyuki Kimura, Shigeki Matsuda, Chiori Hori, Yutaka Ashikari, H. Kawai, H. Kashioka, E. Sumita, Satoshi Nakamura
Abstract: Developing a multilingual speech translation system requires efforts in constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the numerous ASR, MT, and TTS systems for different language pairs developed independently in different parts of the world could be connected, multilingual speech translation systems for a multitude of language pairs could be achieved. Yet, there is currently no common, flexible framework that can provide an entire speech translation process by bringing together heterogeneous speech translation components. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components are provided on distributed servers and cooperate over a network. This framework can facilitate the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for multilingual ASR, MT, and TTS components, and then describe how to combine those systems into the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges among client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server is provided for simultaneously distributing the speech translation results from one user to multiple users. Field testing shows that the system is capable of realizing multiparty multilingual speech translation for real-time and location-independent communication.
Citations: 4
Perspective-oriented generation of football match summaries: Old tasks, new challenges
ACM Trans. Speech Lang. Process. Pub Date : 2012-07-01 DOI: 10.1145/2287710.2287711
Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, L. Wanner
Abstract: Team sports commentaries call for techniques that are able to select content and generate wordings to reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size to whose structures then different ways of realization are explicitly assigned, or they work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general purpose ontologies, but which still take user preferences into account. We take the best of both worlds in that we use a two-layer ontology. The first layer is composed of raw domain data modelled in an application-independent base OWL ontology. The second layer contains a rich perspective generation-motivated domain communication knowledge ontology, inferred from the base ontology. The two-layer ontology allows us to take into account user perspective-oriented criteria at different stages of generation to generate perspective-oriented commentaries. We show how content selection, discourse structuring, information structure determination, and lexicalization are driven by these criteria and how stage after stage a truly user perspective-tailored summary is generated. The viability of our proposal has been evaluated for the generation of football match summaries of the First Spanish Football League. The reported outcome of the evaluation demonstrates that we are on the right track.
Citations: 35
Unsupervised similarity-based word sense disambiguation using context vectors and sentential word importance
ACM Trans. Speech Lang. Process. Pub Date : 2012-05-01 DOI: 10.1145/2168748.2168750
Khaled Abdalgader, A. Skabar
Abstract: The process of identifying the actual meanings of words in a given text fragment has a long history in the field of computational linguistics. Due to its importance in understanding the semantics of natural language, it is considered one of the most challenging problems facing this field. In this article we propose a new unsupervised similarity-based word sense disambiguation (WSD) algorithm that operates by computing the semantic similarity between glosses of the target word and a context vector. The sense of the target word is determined as that for which the similarity between gloss and context vector is greatest. Thus, whereas conventional unsupervised WSD methods are based on measuring pairwise similarity between words, our approach is based on measuring semantic similarity between sentences. This enables it to utilize a higher degree of semantic information, and is more consistent with the way that human beings disambiguate; that is, by considering the greater context in which the word appears. We also show how performance can be further improved by incorporating a preliminary step in which the relative importance of words within the original text fragment is estimated, thereby providing an ordering that can be used to determine the sequence in which words should be disambiguated. We provide empirical results that show that our method performs favorably against the state-of-the-art unsupervised word sense disambiguation methods, as evaluated on several benchmark datasets through different models of evaluation.
Citations: 29
Optimizing the turn-taking behavior of task-oriented spoken dialog systems
ACM Trans. Speech Lang. Process. Pub Date : 2012-05-01 DOI: 10.1145/2168748.2168749
Antoine Raux, M. Eskénazi
Abstract: Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. Based on an analysis of human-human and human-computer turn-taking in naturally occurring task-oriented dialogs, we define a set of features that can be automatically extracted and show that they can be used to inform efficient end-of-turn detection. We then frame turn-taking as decision making under uncertainty and describe the Finite-State Turn-Taking Machine (FSTTM), a decision-theoretic model that combines data-driven machine learning methods and a cost structure derived from Conversation Analysis to control the turn-taking behavior of dialog systems. Evaluation results on CMU Let's Go, a publicly deployed bus information system, confirm that the FSTTM significantly improves the responsiveness of the system compared to a standard threshold-based approach, as well as previous data-driven methods.
Citations: 48
Active learning with semi-automatic annotation for extractive speech summarization
ACM Trans. Speech Lang. Process. Pub Date : 2012-02-01 DOI: 10.1145/2093153.2093155
J. Zhang, Pascale Fung
Abstract: We propose using active learning for extractive speech summarization in order to reduce human effort in generating reference summaries. Active learning chooses a selective set of samples to be labeled. We propose a combination of informativeness and representativeness criteria for selection. We further propose a semi-automatic method to generate reference summaries for presentation speech by using Relaxed Dynamic Time Warping (RDTW) alignment between presentation speech and its accompanying slides. Our summarization results show that the amount of labeled data needed for a given summarization accuracy can be reduced by more than 23% compared to random sampling.
Citations: 11
Uncertainty-based active learning with instability estimation for text classification
ACM Trans. Speech Lang. Process. Pub Date : 2012-02-01 DOI: 10.1145/2093153.2093154
Jingbo Zhu, Matthew Y. Ma
Abstract: This article deals with pool-based active learning with uncertainty sampling. While existing uncertainty sampling methods emphasize selection of instances near the decision boundary to increase the likelihood of selecting informative examples, our position is that this heuristic is a surrogate for selecting examples which the current learning algorithm iteration is likely to misclassify. To more directly model this intuition, this article augments such uncertainty sampling methods and proposes a simple instability-based selective sampling approach to improving uncertainty-based active learning, in which the instability degree of each unlabeled example is estimated during the learning process. Experiments on seven evaluation datasets show that instability-based sampling methods can achieve significant improvements over the traditional uncertainty sampling method. In terms of the average percentage of actively selected examples required for the learner to achieve 99% of its performance when training on the entire dataset, instability sampling and sampling by instability and density methods achieve better effectiveness in annotation cost reduction than random sampling and traditional entropy-based uncertainty sampling. Our experimental results have also shown that instability-based methods yield no significant improvement for active learning with SVMs when a popular sigmoidal function is used to transform SVM outputs to posterior probabilities.
Citations: 23
Spatial role labeling: Towards extraction of spatial relations from natural language
ACM Trans. Speech Lang. Process. Pub Date : 2011-12-01 DOI: 10.1145/2050104.2050105
Parisa Kordjamshidi, M. V. Otterlo, Marie-Francine Moens
Abstract: This article reports on the novel task of spatial role labeling in natural language text. It proposes machine learning methods to extract spatial roles and their relations. This work experiments with both a step-wise approach, where spatial prepositions are found and the related trajectors and landmarks are then extracted, and a joint learning approach, where a spatial relation and its composing indicator, trajector, and landmark are classified collectively. Context-dependent learning techniques, such as a skip-chain conditional random field, yield good results on the GUM-evaluation (Maptask) data and CLEF-IAPR TC-12 Image Benchmark. An extensive error analysis, including feature assessment, and a cross-domain evaluation pinpoint the main bottlenecks and avenues for future research.
Citations: 144
Semantic relations in bilingual lexicons
ACM Trans. Speech Lang. Process. Pub Date : 2011-11-01 DOI: 10.1145/2050100.2050102
Yves Peirsman, Sebastian Padó
Abstract: Bilingual lexicons, essential to many NLP applications, can be constructed automatically on the basis of parallel or comparable corpora. In this article, we make two contributions to their induction from comparable corpora. The first one concerns the creation of these lexicons. We show that seed lexicons can be improved by adding a bootstrapping procedure that uses cross-lingual distributional similarity. The second contribution concerns the evaluation of bilingual lexicons. It is generally based on translation lexicons, which corresponds to the implicit assumption that (cross-lingual) synonymy is the semantic relation of primary interest, even though other semantic relations like (cross-lingual) hyponymy or cohyponymy make up a considerable portion of translation pair candidates proposed by distributional methods. We argue that the focus on synonymy is an oversimplification and that many applications can profit from the inclusion of other semantic relations. We study what effect these semantic relations have on two cross-lingual tasks: the cross-lingual projection of polarity scores and the cross-lingual modeling of selectional preferences. We find that the presence of non-synonymous semantic relations may negatively affect the former of these tasks, but benefit the latter.
Citations: 17
Speech retrieval from unsegmented Finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval
ACM Trans. Speech Lang. Process. Pub Date : 2011-10-01 DOI: 10.1145/2036916.2036917
V. Turunen, M. Kurimo
Abstract: This article examines the use of statistically discovered morpheme-like units for Spoken Document Retrieval (SDR). The morpheme-like units (morphs) are used both for language modeling in speech recognition and as index terms. Traditional word-based methods suffer from out-of-vocabulary words. If a word is not in the recognizer vocabulary, any occurrence of the word in speech will be missing from the transcripts. The problem is especially severe for languages with a high number of distinct word forms such as Finnish. With the morph language model, even previously unseen words can be recognized by identifying their component morphs. Similarly in information retrieval queries, complex word forms, even unseen ones, can be matched to data after segmenting them to morphs. Retrieval performance can be further improved by expanding the transcripts with alternative recognition results from confusion networks. In this article, a novel retrieval evaluation corpus consisting of unsegmented Finnish radio programs, 25 queries and corresponding human relevance assessments was constructed. Previous results on using morphs and confusion networks for Finnish SDR are confirmed and extended to the unsegmented case. As previously, using morphs or base forms as index terms yields about equal performance but combination methods, including a new one, are found to work better than either alone. Using alternative morph segmentations of the query words is found to further improve the results. Lexical similarity-based story segmentation was applied and performance using morphs, base forms, and their combinations was compared for the first time.
Citations: 24
Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario
ACM Trans. Speech Lang. Process. Pub Date : 2011-08-01 DOI: 10.1145/1998384.1998386
M. Wöllmer, Björn Schuller, A. Batliner, S. Steidl, Dino Seppi
Abstract: In this article, we focus on keyword detection in children's speech as it is needed in voice command systems. We use the FAU Aibo Emotion Corpus which contains emotionally colored spontaneous children's speech recorded in a child-robot interaction scenario and investigate various recent keyword spotting techniques. As the principle of bidirectional Long Short-Term Memory (BLSTM) is known to be well-suited for context-sensitive phoneme prediction, we incorporate a BLSTM network into a Tandem model for flexible coarticulation modeling in children's speech. Our experiments reveal that the Tandem model prevails over a triphone-based Hidden Markov Model approach.
Citations: 15