A new benchmark dataset with production methodology for short text semantic similarity algorithms

ACM Trans. Speech Lang. Process. | Pub Date: 2013-12-01 | DOI: 10.1145/2537046

J. O'Shea, Z. Bandar, Keeley A. Crockett


Citations: 24


A new benchmark dataset with production methodology for short text semantic similarity algorithms

This research presents a new benchmark dataset for evaluating Short Text Semantic Similarity (STSS) measurement algorithms and the methodology used for its creation. The power of the dataset is evaluated by using it to compare two established algorithms, STASIS and Latent Semantic Analysis. This dataset focuses on measures for use in Conversational Agents; other potential applications include email processing and data mining of social networks. Such applications involve integrating the STSS algorithm in a complex system, but STSS algorithms must be evaluated in their own right and compared with others for their effectiveness before systems integration. Semantic similarity is an artifact of human perception; therefore its evaluation is inherently empirical and requires benchmark datasets derived from human similarity ratings. The new dataset of 64 sentence pairs, STSS-131, has been designed to meet these requirements drawing on a range of resources from traditional grammar to cognitive neuroscience. The human ratings are obtained from a set of trials using new and improved experimental methods, with validated measures and statistics. The results illustrate the increased challenge and the potential longevity of the STSS-131 dataset as the Gold Standard for future STSS algorithm evaluation.
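As a minimal sketch of how a benchmark of human similarity ratings can be used to compare algorithms, the snippet below correlates one algorithm's scores with mean human ratings for the same sentence pairs. It assumes Pearson correlation as the agreement statistic, which the abstract does not specify, and the rating and score values are hypothetical placeholders, not data from STSS-131.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, one entry per sentence pair.
# These are NOT taken from STSS-131 or from the paper's results.
human_ratings = [0.1, 0.9, 2.3, 3.6]         # mean human similarity rating
algorithm_scores = [0.05, 0.30, 0.55, 0.95]  # scores from the algorithm under test

r = pearson(human_ratings, algorithm_scores)
print(f"Agreement with human ratings (Pearson r): {r:.3f}")
```

Running the same comparison for each candidate algorithm on the same sentence pairs gives a like-for-like ranking; a rank-based statistic could be substituted if the ratings are treated as ordinal.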
