Curing the SICK and Other NLI Maladies

IF 5.3 | Zone 2: Computer Science | JCR Q2: Computer Science, Artificial Intelligence

Computational Linguistics Pub Date : 2022-10-12 DOI:10.1162/coli_a_00465

A. Kalouli, Hai Hu, Alexander F. Webb, Larry Moss, Valeria C V de Paiva

Computational Linguistics, volume 49, issue 1, pages 199-243. Journal Article.

Citations: 3

Abstract

Against the backdrop of the ever-improving Natural Language Inference (NLI) models, recent efforts have focused on the suitability of the current NLI datasets and on the feasibility of the NLI task as it is currently approached. Many of the recent studies have exposed the inherent human disagreements of the inference task and have proposed a shift from categorical labels to human subjective probability assessments, capturing human uncertainty. In this work, we show how neither the current task formulation nor the proposed uncertainty gradient are entirely suitable for solving the NLI challenges. Instead, we propose an ordered sense space annotation, which distinguishes between logical and common-sense inference. One end of the space captures non-sensical inferences, while the other end represents strictly logical scenarios. In the middle of the space, we find a continuum of common-sense, namely, the subjective and graded opinion of a “person on the street.” To arrive at the proposed annotation scheme, we perform a careful investigation of the SICK corpus and we create a taxonomy of annotation issues and guidelines. We re-annotate the corpus with the proposed annotation scheme, utilizing four symbolic inference systems, and then perform a thorough evaluation of the scheme by fine-tuning and testing commonly used pre-trained language models on the re-annotated SICK within various settings. We also pioneer a crowd annotation of a small portion of the MultiNLI corpus, showcasing that it is possible to adapt our scheme for annotation by non-experts on another NLI corpus. Our work shows the efficiency and benefits of the proposed mechanism and opens the way for a careful NLI task refinement.


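Source journal

Computational Linguistics (Engineering and Technology - Computer Science: Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: >12 weeks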

About the journal: Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.