The geometry of meaning: evaluating sentence embeddings from diverse transformer-based models for natural language inference.

IF 3.5 · CAS Tier 4, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
PeerJ Computer Science · Pub Date: 2025-06-16 · eCollection Date: 2025-01-01 · DOI: 10.7717/peerj-cs.2957
Mohammed Alsuhaibani
Volume 11, e2957 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12193426/pdf/
Citations: 0

Abstract

Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a simple and straightforward approach to evaluate the effectiveness of various transformer-based models such as bidirectional encoder representations from transformers (BERT), Generative Pre-trained Transformer (GPT), robustly optimized BERT approach (RoBERTa), and XLNet in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the embeddings using different norms across multiple layers of each model. Our results demonstrate that the choice of pooling strategy, norm, and model layer significantly impacts the performance of NLI, with the best results achieved using max pooling and the L2 norm across specific model layers. On the Stanford Natural Language Inference (SNLI) dataset, the model reached 90% accuracy and 86% F1-score, while on the MedNLI dataset, the highest F1-score recorded was 84%. This article provides insights into how different models and evaluation strategies can be effectively combined to improve the understanding and classification of sentence relationships in NLI tasks.
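The abstract's best-performing recipe (max pooling over a layer's token embeddings, then L2 normalization) can be sketched as follows. This is a minimal illustration using random arrays as stand-ins for a transformer layer's hidden states, not the paper's actual pipeline; the function names and the toy shapes are assumptions for demonstration only.

```python
import numpy as np

def max_pool(hidden_states, attention_mask):
    """Max-pool token embeddings into a single sentence vector,
    ignoring padded positions (mask == 0)."""
    # hidden_states: (seq_len, dim); attention_mask: (seq_len,)
    masked = np.where(attention_mask[:, None] == 1, hidden_states, -np.inf)
    return masked.max(axis=0)

def l2_normalize(vec, eps=1e-12):
    """Scale a sentence embedding to unit L2 norm."""
    return vec / max(float(np.linalg.norm(vec)), eps)

# Toy stand-in for one layer's hidden states (5 tokens, 8 dims),
# with the last two positions treated as padding.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))
mask = np.array([1, 1, 1, 0, 0])

sentence_vec = l2_normalize(max_pool(hidden, mask))
print(round(float(np.linalg.norm(sentence_vec)), 6))  # prints 1.0
```

In a real setup, `hidden` would be one layer of a model's `output_hidden_states`, and the resulting unit-norm vectors for a premise/hypothesis pair would feed a downstream NLI classifier; the layer index and pooling choice are exactly the knobs the paper's experiments vary.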

Source journal: PeerJ Computer Science (Computer Science, General)
CiteScore: 6.10
Self-citation rate: 5.30%
Articles per year: 332
Review time: 10 weeks
Journal description: PeerJ Computer Science is an open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.