A hybrid approach to Natural Language Inference for the SICK dataset

IF 3.1 · Zone 3, Computer Science · JCR Q2, Computer Science, Artificial Intelligence

Computer Speech and Language · Pub Date: 2024-10-10 · DOI: 10.1016/j.csl.2024.101736

Rodrigo Souza, Marcos Lopes

Volume 90, Article 101736. https://www.sciencedirect.com/science/article/pii/S0885230824001190

Citations: 0

Abstract

Natural Language Inference (NLI) can be described as the task of deciding whether a short text called the Hypothesis (H) can be inferred from another text called the Premise (P) (Poliak, 2020; Dagan et al., 2013). Affirmative answers are regarded as semantic entailments; negative ones are either contradictions or semantically “neutral” statements. Over the last three decades, many Natural Language Processing (NLP) methods have been applied to this task. As with almost every other NLP task, Deep Learning (DL) techniques in general, and Transformer neural networks in particular, have achieved the best results in recent years, progressively improving on classical, symbolic Knowledge Representation models for NLI.

Nevertheless, however successful DL models are on measurable results such as accuracy and F-score, their outcomes are far from explainable. This is especially undesirable in a task such as NLI, which is meant to deal with language understanding together with the rational reasoning inherent in entailment and contradiction judgements. It is therefore tempting to evaluate how more explainable models perform in NLI and to compare their performance with that of DL models.

This paper puts forth a pipeline that we call IsoLex, which provides explainable, transparent NLP models for NLI. It has been tested on a partial version of the SICK corpus (Marelli, 2014) called SICK-CE, which contains only the contradiction and entailment pairs (4245 in total). The neutral pairs are left aside in an attempt to concentrate on unambiguous semantic relationships, which arguably favor the intelligibility of the results.
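
The construction of a contradiction-and-entailment subset can be illustrated with a minimal sketch. The records, field names (`premise`, `hypothesis`, `label`) and label strings below are assumptions made for illustration; they are not rows copied from SICK, and this is not necessarily how the authors built SICK-CE.

```python
from collections import Counter

# Toy records in the spirit of SICK. The sentences, field names and label
# strings are illustrative assumptions, not data taken from the corpus.
toy_pairs = [
    {"premise": "A man is playing a guitar",
     "hypothesis": "A person is playing an instrument",
     "label": "ENTAILMENT"},
    {"premise": "A dog is running",
     "hypothesis": "No dog is running",
     "label": "CONTRADICTION"},
    {"premise": "A woman is cooking",
     "hypothesis": "A child is sleeping",
     "label": "NEUTRAL"},
]

# Labels retained in the reduced subset; neutral pairs are dropped.
KEPT_LABELS = {"ENTAILMENT", "CONTRADICTION"}


def keep_contradiction_and_entailment(pairs):
    """Return only the pairs labelled as entailment or contradiction."""
    return [pair for pair in pairs if pair["label"] in KEPT_LABELS]


subset = keep_contradiction_and_entailment(toy_pairs)
label_counts = Counter(pair["label"] for pair in subset)
```

Counting the labels after filtering is a cheap sanity check that no neutral pair survived.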

The pipeline consists of three commonly used NLP models applied in series. First, an Isolation Forest module filters out highly dissimilar Premise-Hypothesis pairs. Second, a WordNet-based Lexical Relations module checks whether the textual contents of the Premise and the Hypothesis are related in terms of synonymy, hyperonymy, or holonymy. Finally, the similarity between Premise and Hypothesis texts is evaluated by a simple cosine similarity function based on Word2Vec embeddings.
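
The second and third stages can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the IsoLex implementation: `TOY_RELATIONS` is a hypothetical hand-written lookup standing in for WordNet, `TOY_VECTORS` holds invented 3-dimensional vectors standing in for Word2Vec embeddings, sentence vectors are formed by simple averaging (an assumption), and the Isolation Forest stage is omitted.

```python
import math

# Hypothetical stand-in for WordNet: each word maps to the words it is
# linked to by synonymy, hyperonymy, or holonymy.
TOY_RELATIONS = {
    "guitar": {"instrument"},  # hyperonym
    "man": {"person"},         # hyperonym
    "kid": {"child"},          # synonym
}

# Invented 3-dimensional vectors standing in for Word2Vec embeddings.
TOY_VECTORS = {
    "man": [0.9, 0.1, 0.0],
    "person": [0.7, 0.3, 0.1],
    "guitar": [0.1, 0.9, 0.1],
    "instrument": [0.2, 0.8, 0.1],
    "plays": [0.0, 0.3, 0.9],
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def lexical_links(premise, hypothesis):
    """List (premise_word, hypothesis_word) pairs linked in TOY_RELATIONS."""
    return [
        (p_word, h_word)
        for p_word in tokenize(premise)
        for h_word in tokenize(hypothesis)
        if h_word in TOY_RELATIONS.get(p_word, set())
    ]


def mean_vector(text):
    """Average the vectors of known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in tokenize(text) if w in TOY_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


premise = "man plays guitar"
hypothesis = "person plays instrument"
links = lexical_links(premise, hypothesis)
similarity = cosine_similarity(mean_vector(premise), mean_vector(hypothesis))
```

Because both intermediate values (`links` and `similarity`) are plain, inspectable data, a sketch like this shows how each stage's contribution to a decision could be examined separately.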

IsoLex achieved 92% accuracy and 94% F-1 on SICK-CE. This is close to state-of-the-art (SOTA) models for this kind of task, such as RoBERTa, which reaches 98% accuracy and 99% F-1 on the same dataset.
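
For readers less familiar with the two metrics, the sketch below computes accuracy and F-1 from confusion-matrix counts. The counts are invented for illustration and are not the paper's results; which class is treated as positive is also an assumption here.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all pairs classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented toy counts (NOT taken from the paper): true positives,
# false positives, false negatives, true negatives.
tp, fp, fn, tn = 6, 1, 1, 2

toy_accuracy = accuracy(tp, fp, fn, tn)
toy_f1 = f1_score(tp, fp, fn)
```

F-1 ignores true negatives, so it can come out higher than accuracy when the positive class dominates, as in these toy counts. Whether that explains the reported figures is not stated in the abstract.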

The small performance gap between IsoLex and SOTA DL models is largely compensated for by the intelligibility of every step of the proposed pipeline. At any time it is possible to evaluate the role of similarity, lexical relatedness, and so forth in the overall inference process.




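Source journal

Computer Speech and Language (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.30
Self-citation rate: 4.70%
Publication count: not listed
Review time: 22.9 weeks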

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.