Text-to-text generative approach for enhanced complex word identification
Neurocomputing, published 2024-09-06. DOI: 10.1016/j.neucom.2024.128501
Impact factor: 5.5. JCR: Q1 (Computer Science, Artificial Intelligence).
URL: https://www.sciencedirect.com/science/article/pii/S0925231224012724
Citations: 0
Abstract
This paper presents a novel approach to the Complex Word Identification (CWI) task using a text-to-text generative model. CWI, the task of identifying complex words in text, is a challenging Natural Language Processing problem. To our knowledge, this is the first attempt to cast the CWI problem in a text-to-text setting. We propose a new methodology that uses a Transformer model to evaluate the complexity of words in both binary and probabilistic settings. We also introduce a novel CWI dataset consisting of 62,200 phrases, both complex and simple. We train and fine-tune the proposed model on this dataset and evaluate its performance on separate test sets across three different domains. Our experimental results demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
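The abstract does not specify how CWI instances are serialized for the text-to-text model. As a rough illustration only, the sketch below shows one way a CWI example could be turned into a source/target string pair for the binary and probabilistic settings. The prompt template, the `complex`/`simple` label strings, the two-decimal score format, and all names (`CwiExample`, `to_text_pair`, `parse_output`) are assumptions made for this example, not the authors' format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class CwiExample:
    """One CWI instance (hypothetical schema, not the paper's)."""
    sentence: str                        # context containing the target
    target: str                          # word or phrase being judged
    is_complex: Optional[bool] = None    # binary label, if available
    complexity: Optional[float] = None   # score in [0, 1], if available


def to_text_pair(ex: CwiExample, mode: str) -> Tuple[str, str]:
    """Serialize an example as (source text, target text).

    The template below is an assumed format chosen for illustration.
    """
    source = f"cwi {mode}: target: {ex.target} context: {ex.sentence}"
    if mode == "binary":
        if ex.is_complex is None:
            raise ValueError("binary mode requires is_complex")
        return source, "complex" if ex.is_complex else "simple"
    if mode == "probabilistic":
        if ex.complexity is None or not 0.0 <= ex.complexity <= 1.0:
            raise ValueError("probabilistic mode requires complexity in [0, 1]")
        # The score is emitted as text so the model can generate it.
        return source, f"{ex.complexity:.2f}"
    raise ValueError(f"unknown mode: {mode}")


def parse_output(text: str, mode: str) -> Union[bool, float]:
    """Map generated text back to a boolean label or a clamped score."""
    text = text.strip().lower()
    if mode == "binary":
        if text not in ("complex", "simple"):
            raise ValueError(f"unexpected label: {text!r}")
        return text == "complex"
    return min(1.0, max(0.0, float(text)))


example = CwiExample(
    sentence="The committee reached a unanimous verdict.",
    target="verdict",
    is_complex=True,
    complexity=0.4,
)
binary_pair = to_text_pair(example, "binary")
prob_pair = to_text_pair(example, "probabilistic")
```

Framing both settings as string generation lets a single sequence-to-sequence model serve both tasks, with only the task prefix and the output parser changing between them.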
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.