ACM Trans. Speech Lang. Process.: Latest Articles

A new benchmark dataset with production methodology for short text semantic similarity algorithms

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2537046

J. O'Shea, Z. Bandar, Keeley A. Crockett

Abstract: This research presents a new benchmark dataset for evaluating Short Text Semantic Similarity (STSS) measurement algorithms and the methodology used for its creation. The power of the dataset is evaluated by using it to compare two established algorithms, STASIS and Latent Semantic Analysis. This dataset focuses on measures for use in Conversational Agents; other potential applications include email processing and data mining of social networks. Such applications involve integrating the STSS algorithm in a complex system, but STSS algorithms must be evaluated in their own right and compared with others for their effectiveness before systems integration. Semantic similarity is an artifact of human perception; therefore its evaluation is inherently empirical and requires benchmark datasets derived from human similarity ratings. The new dataset of 64 sentence pairs, STSS-131, has been designed to meet these requirements drawing on a range of resources from traditional grammar to cognitive neuroscience. The human ratings are obtained from a set of trials using new and improved experimental methods, with validated measures and statistics. The results illustrate the increased challenge and the potential longevity of the STSS-131 dataset as the Gold Standard for future STSS algorithm evaluation.

Citations: 24

Combining co-clustering with noise detection for theme-based summarization

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2513563

Xiaoyan Cai, Wenjie Li, Renxian Zhang

Citations: 9

Lattice BLEU oracles in machine translation

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2513147

Artem Sokolov, Guillaume Wisniewski, François Yvon

Citations: 9

Composition of semantic relations: Theoretical framework and case study

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2513146

Eduardo Blanco, D. Moldovan

Citations: 10

Learning to control listening-oriented dialogue using partially observable Markov decision processes

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2513145

Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka

Citations: 21

Cognitive canonicalization of natural language queries using semantic strata

ACM Trans. Speech Lang. Process. Pub Date : 2013-12-01 DOI: 10.1145/2539053

S. Roy, W. Zeng

Abstract: Natural language search relies strongly on perceiving semantics in a query sentence. Semantics is captured by the relationship among the query words, represented as a network (graph). Such a network of words can be fed into larger ontologies, like DBpedia or Google Knowledge Graph, where they appear as subgraphs, fashioning the name subnetworks (subnets). Thus, subnet is a canonical form for interfacing a natural language query to a graph database and is an integral step for graph-based searching. In this article, we present a novel standalone NLP technique that leverages the cognitive psychology notion of semantic strata for semantic subnetwork extraction from natural language queries. The cognitive model describes some of the fundamental structures employed by the human cognition to construct semantic information in the brain, called semantic strata. We propose a computational model based on conditional random fields to capture the cognitive abstraction provided by semantic strata, facilitating cognitive canonicalization of the query. Our results, conducted on approximately 5000 queries, suggest that the cognitive canonicals based on semantic strata are capable of significantly improving parsing and role labeling performance beyond pure lexical approaches, such as parts-of-speech based techniques. We also find that cognitive canonicalized subnets are more semantically coherent compared to syntax trees when explored in graph ontologies like DBpedia and improve ranking of retrieved documents.

Citations: 3

Semantic interpretation of noun compounds using verbal and other paraphrases

ACM Trans. Speech Lang. Process. Pub Date : 2013-07-01 DOI: 10.1145/2483969.2483975

Preslav Nakov, Marti A. Hearst

Citations: 46

Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields

ACM Trans. Speech Lang. Process. Pub Date : 2013-07-01 DOI: 10.1145/2483969.2483970

M. Constant, Joseph Le Roux, Anthony Sigogne

Citations: 23

On collocations and topic models

ACM Trans. Speech Lang. Process. Pub Date : 2013-07-01 DOI: 10.1145/2483969.2483972

Jey Han Lau, Timothy Baldwin, D. Newman

Abstract: We investigate the impact of preextracting and tokenizing bigram collocations on topic models. Using extensive experiments on four different corpora, we show that incorporating bigram collocations in the document representation creates more parsimonious models and improves topic coherence. We point out some problems in interpreting test likelihood and test perplexity to compare model fit, and suggest an alternate measure that penalizes model complexity. We show how the Akaike information criterion is a more appropriate measure, which suggests that using a modest number (up to 1000) of top-ranked bigrams is the optimal topic modelling configuration. Using these 1000 bigrams also results in improved topic quality over unigram tokenization. Further increases in topic quality can be achieved by using up to 10,000 bigrams, but this is at the cost of a more complex model. We also show that multiword (bigram and longer) named entities give consistent results, indicating that they should be represented as single tokens. This is the first work to explicitly study the effect of n-gram tokenization on LDA topic models, and the first work to make empirical recommendations to topic modelling practitioners, challenging the standard practice of unigram-based tokenization.

Citations: 63

Sentiment profiles of multiword expressions in test-taker essays: The case of noun-noun compounds

ACM Trans. Speech Lang. Process. Pub Date : 2013-07-01 DOI: 10.1145/2483969.2483974

Beata Beigman Klebanov, J. Burstein, Nitin Madnani

Citations: 14