Comparative Analysis of Traditional and Modern NLP Techniques on the CoLA Dataset: From POS Tagging to Large Language Models

Abdessamad Benlahbib, Achraf Boumhidi, Anass Fahfouh, Hamza Alami
{"title":"Comparative Analysis of Traditional and Modern NLP Techniques on the CoLA Dataset: From POS Tagging to Large Language Models","authors":"Abdessamad Benlahbib;Achraf Boumhidi;Anass Fahfouh;Hamza Alami","doi":"10.1109/OJCS.2025.3526712","DOIUrl":null,"url":null,"abstract":"The task of classifying linguistic acceptability, exemplified by the CoLA (Corpus of Linguistic Acceptability) dataset, poses unique challenges for natural language processing (NLP) models. These challenges include distinguishing between subtle grammatical errors, understanding complex syntactic structures, and detecting semantic inconsistencies, all of which make the task difficult even for human annotators. In this article, we compare a range of techniques, from traditional methods such as Part-of-Speech (POS) tagging and feature extraction methods like CountVectorizer with Term Frequency-Inverse Document Frequency (TF-IDF) and N-grams, to modern embeddings such as FastText and Embeddings from Language Models (ELMo), as well as deep learning architectures like transformers and Large Language Models (LLMs). Our experiments show a clear improvement in performance as models evolve from traditional to more advanced approaches. Notably, state-of-the-art (SOTA) results were obtained by fine-tuning GPT-4o with extensive hyperparameter tuning, including experimenting with various epochs and batch sizes. This comparative analysis provides valuable insights into the relative strengths of each technique for identifying morphological, syntactic, and semantic violations, highlighting the effectiveness of LLMs in these tasks.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"248-260"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10829978","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10829978/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The task of classifying linguistic acceptability, exemplified by the CoLA (Corpus of Linguistic Acceptability) dataset, poses unique challenges for natural language processing (NLP) models. These challenges include detecting subtle grammatical errors, understanding complex syntactic structures, and identifying semantic inconsistencies, all of which make the task difficult even for human annotators. In this article, we compare a range of techniques, from traditional methods such as Part-of-Speech (POS) tagging and feature extraction methods like CountVectorizer with Term Frequency-Inverse Document Frequency (TF-IDF) and N-grams, to modern embeddings such as FastText and Embeddings from Language Models (ELMo), as well as deep learning architectures like transformers and Large Language Models (LLMs). Our experiments show a clear improvement in performance as models evolve from traditional to more advanced approaches. Notably, state-of-the-art (SOTA) results were obtained by fine-tuning GPT-4o with extensive hyperparameter tuning, including experimenting with various epoch counts and batch sizes. This comparative analysis provides valuable insights into the relative strengths of each technique for identifying morphological, syntactic, and semantic violations, highlighting the effectiveness of LLMs in these tasks.
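To make the traditional end of this spectrum concrete, the sketch below trains a TF-IDF n-gram baseline on CoLA. It is an illustrative example, not the authors' exact pipeline: the use of scikit-learn's TfidfVectorizer with unigrams and bigrams, the logistic-regression classifier, and settings such as `min_df=2` are assumptions made for the sketch. Matthews correlation coefficient (MCC) is CoLA's standard evaluation metric.

```python
# Illustrative baseline only: a TF-IDF + n-gram linear classifier for CoLA.
# This is a minimal sketch, not the paper's exact pipeline; its preprocessing
# and hyperparameters may differ.
from datasets import load_dataset  # pip install datasets scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef

# CoLA ships as part of GLUE: a "sentence" column and a binary "label"
# (1 = acceptable, 0 = unacceptable). Test labels are hidden, so we
# evaluate on the validation split.
cola = load_dataset("glue", "cola")
train, dev = cola["train"], cola["validation"]

# Word-level unigrams + bigrams weighted by TF-IDF, mirroring the
# CountVectorizer / TF-IDF / N-gram features the paper compares.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X_train = vectorizer.fit_transform(train["sentence"])
X_dev = vectorizer.transform(dev["sentence"])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train["label"])

preds = clf.predict(X_dev)
# MCC is the standard CoLA metric; accuracy is reported for reference.
print("MCC:", matthews_corrcoef(dev["label"], preds))
print("Accuracy:", accuracy_score(dev["label"], preds))
```

In practice, sparse n-gram baselines of this kind trail fine-tuned transformers and LLMs on CoLA by a wide MCC margin, which is precisely the performance gap the paper's comparison traces from traditional to modern techniques.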