Distributed speech translation technologies for multiparty multilingual communication
S. Sakti, Michael Paul, A. Finch, Xinhui Hu, Jinfu Ni, Noriyuki Kimura, Shigeki Matsuda, Chiori Hori, Yutaka Ashikari, H. Kawai, H. Kashioka, E. Sumita, Satoshi Nakamura
ACM Trans. Speech Lang. Process., July 2012. DOI: https://doi.org/10.1145/2287710.2287712

Abstract: Developing a multilingual speech translation system requires efforts in constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the numerous ASR, MT, and TTS systems for different language pairs developed independently in different parts of the world could be connected, multilingual speech translation systems for a multitude of language pairs could be achieved. Yet, there is currently no common, flexible framework that can provide an entire speech translation process by bringing together heterogeneous speech translation components. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components are provided on distributed servers and cooperate over a network. This framework can facilitate the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for multilingual ASR, MT, and TTS components, and then describe how to combine those systems into the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges among client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server is provided for simultaneously distributing the speech translation results from one user to multiple users. Field testing shows that the system is capable of realizing multiparty multilingual speech translation for real-time and location-independent communication.
Perspective-oriented generation of football match summaries: Old tasks, new challenges
Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, L. Wanner
ACM Trans. Speech Lang. Process., July 2012. DOI: https://doi.org/10.1145/2287710.2287711

Abstract: Team sports commentaries call for techniques that are able to select content and generate wordings that reflect the affinity of the targeted reader for one of the teams. Existing work tends either to start from knowledge sources of limited size, to whose structures different realizations are explicitly assigned, or to work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies, this is no longer satisfactory: techniques are needed that are applicable to general-purpose ontologies but still take user preferences into account. We take the best of both worlds by using a two-layer ontology. The first layer is composed of raw domain data modelled in an application-independent base OWL ontology. The second layer contains a rich domain communication knowledge ontology, motivated by perspective-oriented generation and inferred from the base ontology. The two-layer ontology allows us to take user perspective-oriented criteria into account at different stages of generation in order to produce perspective-oriented commentaries. We show how content selection, discourse structuring, information structure determination, and lexicalization are driven by these criteria, and how, stage by stage, a summary tailored to the user's perspective is generated. The viability of our proposal has been evaluated on the generation of match summaries for the First Spanish Football League. The reported outcome of the evaluation demonstrates that we are on the right track.
{"title":"Unsupervised similarity-based word sense disambiguation using context vectors and sentential word importance","authors":"Khaled Abdalgader, A. Skabar","doi":"10.1145/2168748.2168750","DOIUrl":"https://doi.org/10.1145/2168748.2168750","url":null,"abstract":"The process of identifying the actual meanings of words in a given text fragment has a long history in the field of computational linguistics. Due to its importance in understanding the semantics of natural language, it is considered one of the most challenging problems facing this field. In this article we propose a new unsupervised similarity-based word sense disambiguation (WSD) algorithm that operates by computing the semantic similarity between glosses of the target word and a context vector. The sense of the target word is determined as that for which the similarity between gloss and context vector is greatest. Thus, whereas conventional unsupervised WSD methods are based on measuring pairwise similarity between words, our approach is based on measuring semantic similarity between sentences. This enables it to utilize a higher degree of semantic information, and is more consistent with the way that human beings disambiguate; that is, by considering the greater context in which the word appears. We also show how performance can be further improved by incorporating a preliminary step in which the relative importance of words within the original text fragment is estimated, thereby providing an ordering that can be used to determine the sequence in which words should be disambiguated. We provide empirical results that show that our method performs favorably against the state-of-the-art unsupervised word sense disambiguation methods, as evaluated on several benchmark datasets through different models of evaluation.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121181030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the turn-taking behavior of task-oriented spoken dialog systems","authors":"Antoine Raux, M. Eskénazi","doi":"10.1145/2168748.2168749","DOIUrl":"https://doi.org/10.1145/2168748.2168749","url":null,"abstract":"Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. Based on an analysis of human-human and human-computer turn-taking in naturally occurring task-oriented dialogs, we define a set of features that can be automatically extracted and show that they can be used to inform efficient end-of-turn detection. We then frame turn-taking as decision making under uncertainty and describe the Finite-State Turn-Taking Machine (FSTTM), a decision-theoretic model that combines data-driven machine learning methods and a cost structure derived from Conversation Analysis to control the turn-taking behavior of dialog systems. Evaluation results on CMU Let's Go, a publicly deployed bus information system, confirm that the FSTTM significantly improves the responsiveness of the system compared to a standard threshold-based approach, as well as previous data-driven methods.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125006621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning with semi-automatic annotation for extractive speech summarization","authors":"J. Zhang, Pascale Fung","doi":"10.1145/2093153.2093155","DOIUrl":"https://doi.org/10.1145/2093153.2093155","url":null,"abstract":"We propose using active learning for extractive speech summarization in order to reduce human effort in generating reference summaries. Active learning chooses a selective set of samples to be labeled. We propose a combination of informativeness and representativeness criteria for selection. We further propose a semi-automatic method to generate reference summaries for presentation speech by using Relaxed Dynamic Time Warping (RDTW) alignment between presentation speech and its accompanied slides. Our summarization results show that the amount of labeled data needed for a given summarization accuracy can be reduced by more than 23% compared to random sampling.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125990687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty-based active learning with instability estimation for text classification","authors":"Jingbo Zhu, Matthew Y. Ma","doi":"10.1145/2093153.2093154","DOIUrl":"https://doi.org/10.1145/2093153.2093154","url":null,"abstract":"This article deals with pool-based active learning with uncertainty sampling. While existing uncertainty sampling methods emphasize selection of instances near the decision boundary to increase the likelihood of selecting informative examples, our position is that this heuristic is a surrogate for selecting examples for which the current learning algorithm iteration is likely to misclassify. To more directly model this intuition, this article augments such uncertainty sampling methods and proposes a simple instability-based selective sampling approach to improving uncertainty-based active learning, in which the instability degree of each unlabeled example is estimated during the learning process. Experiments on seven evaluation datasets show that instability-based sampling methods can achieve significant improvements over the traditional uncertainty sampling method. In terms of the average percentage of actively selected examples required for the learner to achieve 99% of its performance when training on the entire dataset, instability sampling and sampling by instability and density methods achieve better effectiveness in annotation cost reduction than random sampling and traditional entropy-based uncertainty sampling. Our experimental results have also shown that instability-based methods yield no significant improvement for active learning with SVMs when a popular sigmoidal function is used to transform SVM outputs to posterior probabilities.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134519868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial role labeling: Towards extraction of spatial relations from natural language
Parisa Kordjamshidi, M. V. Otterlo, Marie-Francine Moens
ACM Trans. Speech Lang. Process., December 2011. DOI: https://doi.org/10.1145/2050104.2050105

Abstract: This article reports on the novel task of spatial role labeling in natural language text. It proposes machine learning methods to extract spatial roles and their relations. The work experiments with both a step-wise approach, in which spatial prepositions are found first and the related trajectors and landmarks are then extracted, and a joint learning approach, in which a spatial relation and its composing indicator, trajector, and landmark are classified collectively. Context-dependent learning techniques, such as a skip-chain conditional random field, yield good results on the GUM evaluation (Maptask) data and the CLEF-IAPR TC-12 Image Benchmark. An extensive error analysis, including feature assessment, and a cross-domain evaluation pinpoint the main bottlenecks and avenues for future research.
{"title":"Semantic relations in bilingual lexicons","authors":"Yves Peirsman, Sebastian Padó","doi":"10.1145/2050100.2050102","DOIUrl":"https://doi.org/10.1145/2050100.2050102","url":null,"abstract":"Bilingual lexicons, essential to many NLP applications, can be constructed automatically on the basis of parallel or comparable corpora. In this article, we make two contributions to their induction from comparable corpora. The first one concerns the creation of these lexicons. We show that seed lexicons can be improved by adding a bootstrapping procedure that uses cross-lingual distributional similarity. The second contribution concerns the evaluation of bilingual lexicons. It is generally based on translation lexicons, which corresponds to the implicit assumption that (cross-lingual) synonymy is the semantic relation of primary interest, even though other semantic relations like (cross-lingual) hyponymy or cohyponymy make up a considerable portion of translation pair candidates proposed by distributional methods.\u0000 We argue that the focus on synonymy is an oversimplification and that many applications can profit from the inclusion of other semantic relations. We study what effect these semantic relations have on two cross-lingual tasks: the cross-lingual projection of polarity scores and the cross-lingual modeling of selectional preferences. We find that the presence of non-synonymous semantic relations may negatively affect the former of these tasks, but benefit the latter.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128690028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech retrieval from unsegmented finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval","authors":"V. Turunen, M. Kurimo","doi":"10.1145/2036916.2036917","DOIUrl":"https://doi.org/10.1145/2036916.2036917","url":null,"abstract":"This article examines the use of statistically discovered morpheme-like units for Spoken Document Retrieval (SDR). The morpheme-like units (morphs) are used both for language modeling in speech recognition and as index terms. Traditional word-based methods suffer from out-of-vocabulary words. If a word is not in the recognizer vocabulary, any occurrence of the word in speech will be missing from the transcripts. The problem is especially severe for languages with a high number of distinct word forms such as Finnish. With the morph language model, even previously unseen words can be recognized by identifying its component morphs. Similarly in information retrieval queries, complex word forms, even unseen ones, can be matched to data after segmenting them to morphs. Retrieval performance can be further improved by expanding the transcripts with alternative recognition results from confusion networks. In this article, a novel retrieval evaluation corpus consisting of unsegmented Finnish radio programs, 25 queries and corresponding human relevance assessments was constructed. Previous results on using morphs and confusion networks for Finnish SDR are confirmed and extended to the unsegmented case. As previously, using morphs or base forms as index terms yields about equal performance but combination methods, including a new one, are found to work better than either alone. Using alternative morph segmentations of the query words is found to further improve the results. Lexical similarity-based story segmentation was applied and performance using morphs, base forms, and their combinations was compared for the first time.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125215904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario
M. Wöllmer, Björn Schuller, A. Batliner, S. Steidl, Dino Seppi
ACM Trans. Speech Lang. Process., August 2011. DOI: https://doi.org/10.1145/1998384.1998386

Abstract: In this article, we focus on keyword detection in children's speech as it is needed in voice command systems. We use the FAU Aibo Emotion Corpus, which contains emotionally colored spontaneous children's speech recorded in a child-robot interaction scenario, and investigate various recent keyword spotting techniques. As the principle of bidirectional Long Short-Term Memory (BLSTM) is known to be well-suited for context-sensitive phoneme prediction, we incorporate a BLSTM network into a Tandem model for flexible coarticulation modeling in children's speech. Our experiments reveal that the Tandem model prevails over a triphone-based Hidden Markov Model approach.