语义文本相似度中的人类集体意见

IF 4.2 1区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Transactions of the Association for Computational Linguistics Pub Date : 2023-08-01 DOI:10.1162/tacl_a_00584

Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, K. Verspoor

{"title":"语义文本相似度中的人类集体意见","authors":"Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, K. Verspoor","doi":"10.1162/tacl_a_00584","DOIUrl":null,"url":null,"abstract":"Abstract Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"997-1013"},"PeriodicalIF":4.2000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Collective Human Opinions in Semantic Textual Similarity\",\"authors\":\"Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, K. Verspoor\",\"doi\":\"10.1162/tacl_a_00584\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.\",\"PeriodicalId\":33559,\"journal\":{\"name\":\"Transactions of the Association for Computational Linguistics\",\"volume\":\"11 1\",\"pages\":\"997-1013\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transactions of the Association for Computational Linguistics\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1162/tacl_a_00584\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00584","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 2

摘要

尽管语义文本相似度(STS)的主观性和STS注释中普遍存在的分歧，但现有的基准都使用平均人类评分作为金标准。平均掩盖了人类对低一致性示例的真实意见分布，并阻止模型捕获单个评级所代表的语义模糊性。在这项工作中，我们引入了USTS，这是第一个具有不确定性感知的STS数据集，包含约15,000个中文句子对和150,000个标签，用于研究STS中的集体人类意见。分析表明，标量和单个高斯分布都不能充分拟合一组观察到的判断。我们进一步表明，目前的STS模型不能捕捉到人类在单个实例上的分歧所引起的方差，而是反映了对总体数据集的预测置信度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Collective Human Opinions in Semantic Textual Similarity

Abstract Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Transactions of the Association for Computational Linguistics Multiple-

CiteScore

32.60

自引率

4.60%

发文量

审稿时长

8 weeks

期刊介绍： The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.