{"title":"The geometry of meaning: evaluating sentence embeddings from diverse transformer-based models for natural language inference.","authors":"Mohammed Alsuhaibani","doi":"10.7717/peerj-cs.2957","DOIUrl":null,"url":null,"abstract":"<p><p>Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a simple and straightforward approach to evaluate the effectiveness of various transformer-based models such as bidirectional encoder representations from transformers (BERT), Generative Pre-trained Transformer (GPT), robustly optimized BERT approach (RoBERTa), and XLNet in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the embeddings using different norms across multiple layers of each model. Our results demonstrate that the choice of pooling strategy, norm, and model layer significantly impacts the performance of NLI, with the best results achieved using max pooling and the L2 norm across specific model layers. On the Stanford Natural Language Inference (SNLI) dataset, the model reached 90% accuracy and 86% F1-score, while on the MedNLI dataset, the highest F1-score recorded was 84%. This article provides insights into how different models and evaluation strategies can be effectively combined to improve the understanding and classification of sentence relationships in NLI tasks.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2957"},"PeriodicalIF":3.5000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12193426/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2957","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a simple, straightforward approach to evaluating how effectively various transformer-based models, such as bidirectional encoder representations from transformers (BERT), the Generative Pre-trained Transformer (GPT), the robustly optimized BERT approach (RoBERTa), and XLNet, generate sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the embeddings using different norms across multiple layers of each model. Our results demonstrate that the choice of pooling strategy, norm, and model layer significantly affects NLI performance, with the best results achieved using max pooling and the L2 norm at specific model layers. On the Stanford Natural Language Inference (SNLI) dataset, the model reached 90% accuracy and an 86% F1-score, while on the MedNLI dataset the highest F1-score recorded was 84%. This article provides insights into how different models and evaluation strategies can be effectively combined to improve the understanding and classification of sentence relationships in NLI tasks.
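To make the described pipeline concrete, here is a minimal sketch of extracting per-layer sentence embeddings with a configurable pooling strategy and L2 normalization. It is not the paper's exact implementation: it assumes the Hugging Face transformers library with the bert-base-uncased checkpoint, and the function name, layer choice, and example sentences are illustrative.

```python
# Minimal sketch: per-layer sentence embeddings with configurable pooling,
# L2-normalized as in the best-performing setup reported in the abstract.
# Assumes the Hugging Face `transformers` library and bert-base-uncased.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def sentence_embedding(sentence: str, layer: int = -1, pooling: str = "max") -> torch.Tensor:
    """Pool one hidden layer's token embeddings into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple of (1, seq_len, hidden_dim) tensors,
    # one per layer (plus the initial embedding layer).
    tokens = outputs.hidden_states[layer].squeeze(0)  # (seq_len, hidden_dim)
    if pooling == "max":
        emb = tokens.max(dim=0).values
    elif pooling == "mean":
        emb = tokens.mean(dim=0)
    elif pooling == "cls":
        emb = tokens[0]
    else:
        raise ValueError(f"unknown pooling: {pooling}")
    return emb / emb.norm(p=2)  # L2 normalization

# Illustrative premise/hypothesis pair (not from the paper's data):
p = sentence_embedding("A man is playing a guitar.", layer=-2, pooling="max")
h = sentence_embedding("A person is making music.", layer=-2, pooling="max")
# A common NLI pair representation fed to a downstream classifier.
features = torch.cat([p, h, torch.abs(p - h)])
```

The concatenated feature vector [p, h, |p - h|] is a widely used pair representation for NLI classifiers; whether the paper uses this exact combination is an assumption here, but it illustrates how pooled, normalized embeddings from a chosen layer feed into the classification step.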
Journal Introduction:
PeerJ Computer Science is an open access journal covering all subject areas in computer science, backed by a prestigious advisory board and more than 300 academic editors.