Summarization system evaluation revisited: N-gram graphs
George Giannakopoulos, V. Karkaletsis, G. Vouros, Panagiotis Stamatopoulos
ACM Trans. Speech Lang. Process., October 2008. https://doi.org/10.1145/1410358.1410359
Abstract: This article presents a novel automatic method (AutoSummENG) for evaluating summarization systems, based on comparing the character n-gram graph representations of extracted summaries against a set of model summaries. The approach is language neutral due to its statistical nature, and its evaluation performance matches and even exceeds that of other contemporary evaluation methods. Within this study, we measure the effectiveness of different representation methods (word and character n-gram graphs and histograms), different n-gram neighborhood indication methods, and different comparison methods between the supplied representations. A theory for the a priori determination of the methods' parameters, along with supporting experiments, concludes the study, providing a complete alternative to existing methods for automatic evaluation of summarization systems.

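The character n-gram graph representation can be sketched roughly as follows. This is a simplified illustration, not the paper's exact AutoSummENG formulation: the co-occurrence window, the edge weighting, and the similarity normalization used here are assumptions made for the sketch.

```python
from collections import defaultdict

def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph: nodes are n-grams, and a weighted
    edge counts how often one n-gram is followed by another within the
    given window of positions."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    graph = defaultdict(int)
    for i, g in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            graph[(g, grams[j])] += 1
    return graph

def value_similarity(g1, g2):
    """Size-normalized overlap of two n-gram graphs: edges common to both
    contribute the ratio of their smaller to larger weight."""
    common = set(g1) & set(g2)
    if not common:
        return 0.0
    vs = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in common)
    return vs / max(len(g1), len(g2))
```

A summary would be scored by its average graph similarity to each model summary; identical texts yield similarity 1.0, and disjoint texts yield 0.0.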
A game-theoretic model of referential coherence and its empirical verification using large Japanese and English corpora
Shun Shiramatsu, Kazunori Komatani, K. Hasida, T. Ogata, Hiroshi G. Okuno
ACM Trans. Speech Lang. Process., October 2008. https://doi.org/10.1145/1410358.1410360
Abstract: Referential coherence represents the smoothness of discourse resulting from topic continuity and pronominalization. Rational individuals prefer a referentially coherent discourse structure when selecting a language expression and its interpretation; this is a preference for cooperation in communication. By what principle do they share coherent expressions and interpretations? Centering theory is the standard theory of referential coherence [Grosz et al. 1995]. Although it is well designed on the basis of first-order inference rules [Joshi and Kuhn 1979], it does not embody a behavioral principle for the cooperation evident in communication. Hasida [1996] proposed a game-theoretic hypothesis addressing this issue. We aim to verify Hasida's hypothesis empirically using corpora of multiple languages. We statistically estimate language-dependent parameters from a corpus of each target language; this statistical design lets us absorb language-specific differences objectively and test the universality of the hypothesis across corpora. Empirical verification of our model on large Japanese and English corpora supports the language universality of the hypothesis.

Extrinsic summarization evaluation: A decision audit task
Gabriel Murray, Thomas Kleinbauer, P. Poller, Tilman Becker, S. Renals, J. Kilgour
ACM Trans. Speech Lang. Process., September 2008. https://doi.org/10.1145/1596517.1596518
Abstract: In this work we describe a large-scale extrinsic evaluation of automatic speech summarization technologies for meeting speech. The particular task is a decision audit, wherein a user must satisfy a complex information need, navigating several meetings in order to gain an understanding of how and why a given decision was made. We compare the usefulness of extractive and abstractive technologies in satisfying this information need, and assess the impact of automatic speech recognition (ASR) errors on user performance. We employ several evaluation methods for participant performance, including post-questionnaire data, human subjective and objective judgments, and a detailed analysis of participant browsing behavior. We find that while ASR errors affect user satisfaction on an information retrieval task, users can adapt their browsing behavior to complete the task satisfactorily. Results also indicate that users consider extractive summaries to be intuitive and useful tools for browsing multimodal meeting data. We discuss areas in which automatic summarization techniques can be improved in comparison with gold-standard meeting abstracts.

Chinese word segmentation and statistical machine translation
Ruiqiang Zhang, K. Yasuda, E. Sumita
ACM Trans. Speech Lang. Process., May 2008. https://doi.org/10.1145/1363108.1363109
Abstract: Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT), and its performance has an impact on the results of SMT. However, there are many choices involved in creating a CWS system, such as the segmentation specification and the CWS method. Each combination of choices yields a new CWS scheme, but whether a given scheme produces superior or inferior translations has remained unknown to date. This article examines the relationship between CWS and SMT. The effects of CWS on SMT were investigated using different specifications and CWS methods. Four specifications were selected for investigation: Beijing University (PKU), City University of Hong Kong (CITYU), Microsoft Research (MSR), and Academia Sinica (AS). We created 16 CWS schemes under different settings to examine the relationship between CWS and SMT. Our experimental results showed that the MSR specification produced the lowest-quality translations. In examining the effects of CWS methods, we tested dictionary-based and CRF-based approaches and found no significant difference between the two in the quality of the resulting translations. We also found that the correlation between CWS F-score and SMT BLEU score was very weak. We analyzed CWS errors and their effect on SMT by evaluating systems trained with and without these errors. This article also proposes two methods for combining the advantages of different specifications: a simple concatenation of training data, and a feature interpolation approach in which the same types of features from translation models built on various CWS schemes are linearly interpolated. We found these approaches very effective in improving translation quality.

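The abstract contrasts dictionary-based and CRF-based segmenters. A minimal dictionary-based baseline is forward maximum matching, sketched below; this is a generic illustration of the technique, not the authors' system, and the toy lexicon is an assumption.

```python
def segment_fmm(sentence, dictionary, max_len=4):
    """Forward maximum matching: at each position, greedily take the longest
    dictionary word (up to max_len characters); back off to a single
    character when nothing matches."""
    words, i = [], 0
    while i < len(sentence):
        for L in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + L]
            if L == 1 or cand in dictionary:
                words.append(cand)
                i += L
                break
    return words
```

With a lexicon containing both 北京大学 and its parts 北京 / 大学, the greedy longest match keeps the four-character word whole, which is exactly the kind of specification-dependent decision the article studies.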
Web resources for language modeling in conversational speech recognition
I. Bulyko, Mari Ostendorf, M. Siu, Tim Ng, A. Stolcke, Ö. Çetin
ACM Trans. Speech Lang. Process., December 2007. https://doi.org/10.1145/1322391.1322392
Abstract: This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech, where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.

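The simplest way to combine n-gram counts from several sources is static linear interpolation of the component models. The sketch below is illustrative only: the toy probabilities and weights are invented, and real systems would use smoothed backoff models with interpolation weights tuned on held-out perplexity rather than fixed by hand.

```python
# Each component "model" is represented as a dict mapping an n-gram
# (a tuple of words) to a probability estimate.
def mixture_prob(ngram, models, lambdas):
    """Static linear interpolation: p(w|h) = sum_k lambda_k * p_k(w|h)."""
    assert abs(sum(lambdas) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(lam * m.get(ngram, 0.0) for m, lam in zip(models, lambdas))

# Hypothetical in-domain (transcribed speech) and Web-crawled estimates.
in_domain = {("you", "know"): 0.30, ("i", "mean"): 0.20}
web_text  = {("you", "know"): 0.10, ("web", "page"): 0.15}

p = mixture_prob(("you", "know"), [in_domain, web_text], [0.7, 0.3])
# 0.7 * 0.30 + 0.3 * 0.10 = 0.24
```

Weighting the small in-domain model more heavily preserves conversational style while the large Web component fills coverage gaps.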
Morph-based speech recognition and modeling of out-of-vocabulary words across languages
Mathias Creutz, Teemu Hirsimäki, M. Kurimo, Antti Puurula, Janne Pylkkönen, Vesa Siivola, Matti Varjokallio, E. Arisoy, M. Saraçlar, A. Stolcke
ACM Trans. Speech Lang. Process., December 2007. https://doi.org/10.1145/1322391.1322394
Abstract: We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment constitutes the only exception, since there the standard word model outperforms the morph model. Differences in the datasets and the amount of data are discussed as a plausible explanation.

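The OOV-coverage argument can be made concrete with a toy segmenter. Note the hedge: Morfessor learns its morph lexicon in an unsupervised, MDL-style way; the sketch below assumes the lexicon is already given and only shows how greedy longest-match segmentation lets known morphs cover an unseen word form.

```python
def segment_morphs(word, morph_lexicon):
    """Greedy longest-match split of a word into known morphs, backing off
    to single characters. A word form absent from the lexicon is still
    representable if its component morphs are known."""
    morphs, i = [], 0
    while i < len(word):
        for L in range(len(word) - i, 0, -1):
            piece = word[i:i + L]
            if piece in morph_lexicon or L == 1:
                morphs.append(piece)
                i += L
                break
    return morphs
```

For example, a Finnish-style lexicon containing the stem "talo" (house) and the case ending "ssa" covers the inflected form "talossa" even though that full form never occurred in training; an n-gram model over morph sequences can then score it.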
Relation extraction and the influence of automatic named-entity recognition
C. Giuliano, A. Lavelli, Lorenza Romano
ACM Trans. Speech Lang. Process., December 2007. https://doi.org/10.1145/1322391.1322393
Abstract: We present an approach for extracting relations between named entities from natural language documents. The approach is based solely on shallow linguistic processing, such as tokenization, sentence splitting, part-of-speech tagging, and lemmatization. It uses a combination of kernel functions to integrate two different information sources: (i) the whole sentence where the relation appears, and (ii) the local contexts around the interacting entities. We present the results of experiments on extracting five different types of relations from a dataset of newswire documents and show that each information source provides a useful contribution to the recognition task. Usually the combined kernel significantly increases the precision with respect to the basic kernels, sometimes at the cost of a slightly lower recall. Moreover, we performed a set of experiments to assess the influence of the accuracy of named-entity recognition on the performance of the relation-extraction algorithm. Such experiments were performed using both the correct named entities (i.e., those manually annotated in the corpus) and the noisy named entities (i.e., those produced by a machine learning-based named-entity recognizer). The results show that our approach significantly improves the previous results obtained on the same dataset.

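Combining kernels over two views of an instance typically means summing a kernel on each view, since a sum of valid kernels is itself a valid kernel. The sketch below is a generic illustration under assumed sparse bag-of-features views (global sentence vs. local context), not the paper's specific kernel definitions.

```python
import math

def dot(a, b):
    """Dot product of sparse feature dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def normalized_kernel(a, b):
    """Cosine-normalized linear kernel on one feature view."""
    d = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / d if d else 0.0

def combined_kernel(x, y):
    """Sum of normalized kernels over two views of an instance:
    x = (global sentence features, local context features)."""
    return normalized_kernel(x[0], y[0]) + normalized_kernel(x[1], y[1])
```

Normalizing each view before summing keeps one view (e.g., the much larger sentence representation) from dominating the combined score; an SVM can then use `combined_kernel` directly.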
An unsupervised method for learning generation dictionaries for spoken dialogue systems by mining user reviews
Ryuichiro Higashinaka, M. Walker, R. Prasad
ACM Trans. Speech Lang. Process., October 2007. https://doi.org/10.1145/1289600.1289601
Abstract: Spoken language generation for dialogue systems requires a dictionary of mappings between the semantic representations of concepts that the system wants to express and the realizations of those concepts. Dictionary creation is a costly process; it is currently done by hand for each dialogue domain. We propose a novel unsupervised method for learning such mappings from user reviews in the target domain and test it in the restaurant and hotel domains. Experimental results show that the acquired mappings achieve high consistency between the semantic representation and the realization, and that the naturalness of the realization is significantly higher than the baseline.

Adaptive text correction with Web-crawled domain-dependent dictionaries
Christoph Ringlstetter, K. Schulz, S. Mihov
ACM Trans. Speech Lang. Process., October 2007. https://doi.org/10.1145/1289600.1289602
Abstract: For the success of lexical text correction, high coverage of the underlying background dictionary is crucial. Still, most correction tools are built on top of static dictionaries that represent fixed collections of expressions of a given language. When treating texts from specific domains and areas, a significant part of the vocabulary is often missed. In this situation, both automated and interactive correction systems produce suboptimal results. In this article, we describe strategies for crawling Web pages that fit the thematic domain of the given input text. Special filtering techniques are introduced to avoid pages with many orthographic errors. By collecting the vocabulary of filtered pages that matches the vocabulary of the input text, dynamic dictionaries of modest size are obtained that reach excellent coverage. A tool has been developed that crawls such dictionaries automatically. Our correction experiments with crawled dictionaries, addressing English and German document collections from a variety of thematic fields, show that these dictionaries allow completely automated correction methods to reduce the error rate even of highly accurate texts. For interactive text correction, more sensible candidate sets for correcting erroneous words are obtained, and the manual effort is reduced significantly. To complete the picture, we study the effect of using word trigram models for correction. Again, trigram models from crawled corpora outperform those obtained from static corpora.

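The "candidate sets" step of lexical correction can be sketched as ranking dictionary entries by edit distance to the erroneous word. This is a generic sketch: the crawled domain dictionary simply supplies the `dictionary` argument, the toy medical words are invented, and the article's trigram reranking is omitted.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def candidates(word, dictionary, max_dist=2):
    """Rank dictionary entries within max_dist edits of word, nearest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_dist]
```

A domain term like "hematology" only becomes a correction candidate for the typo "hematolgy" if the (crawled) dictionary actually contains it, which is exactly the coverage argument the article makes.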
A block bigram prediction model for statistical machine translation
C. Tillmann, Tong Zhang
ACM Trans. Speech Lang. Process., July 2007. https://doi.org/10.1145/1255171.1255172
Abstract: In this article, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase reordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g., a language model score) as well as binary features based on the block identities themselves (e.g., block bigram features). The model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when viewing SMT as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. Our novel approach is successfully tested on a standard Arabic-English translation task using two different phrase reordering models: a block orientation model and a phrase-distortion model.