Automated text classification of opinion vs. news French press articles. A comparison of transformer and feature-based approaches
Language & Communication, published 2024-10-16. DOI: 10.1016/j.langcom.2024.09.004
https://www.sciencedirect.com/science/article/pii/S0271530924000624
Citations: 0
Abstract
This study explores Natural Language Processing (NLP) methods for distinguishing between press articles belonging to the journalistic genres of ‘objective’ news and ‘subjective’ opinion. Two classification models are compared: CamemBERT, a French transformer model fine-tuned for the task, and a machine learning model using 32 linguistic features. Both models are trained on 8000 Belgian French articles and evaluated on 1000 Canadian French articles. The results show CamemBERT’s superiority, but they also highlight the potential of hybrid approaches and emphasize the need for robust and transparent methods in NLP. The research contributes to understanding NLP’s role in journalism by addressing the challenges of point-of-view detection in press discourse.
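The feature-based, cross-corpus setup described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not list the 32 linguistic features or name the learning algorithm, so the two surface features, the nearest-centroid classifier, and the toy sentences below are all hypothetical stand-ins and not the study's method or data.

```python
import re

# Hypothetical feature inventory: the study's 32 features are not given in the abstract.
FIRST_PERSON = {"je", "j", "nous", "me", "moi", "mon", "ma", "mes", "notre", "nos"}

def extract_features(text):
    """Map a text to two illustrative surface features."""
    tokens = re.findall(r"\w+", text.lower())
    n = max(len(tokens), 1)
    first_person_ratio = sum(t in FIRST_PERSON for t in tokens) / n
    expressive_punct = (text.count("!") + text.count("?")) / n
    return [first_person_ratio, expressive_punct]

def fit_centroids(texts, labels):
    """Average the feature vectors of each class (nearest-centroid, an assumed classifier)."""
    rows_by_label = {}
    for text, label in zip(texts, labels):
        rows_by_label.setdefault(label, []).append(extract_features(text))
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = extract_features(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def accuracy(centroids, texts, labels):
    hits = sum(predict(centroids, t) == y for t, y in zip(texts, labels))
    return hits / len(texts)

# Invented toy sentences standing in for the training corpus (not real articles).
train_texts = [
    "Le conseil communal a voté le budget mardi soir.",
    "La police a arrêté deux suspects à Liège.",
    "Je pense que cette réforme est une erreur !",
    "Nous devons refuser ce projet, et pourquoi attendre ?",
]
train_labels = ["news", "news", "opinion", "opinion"]

# Invented toy sentences standing in for the held-out corpus from another region.
test_texts = [
    "Le gouvernement a annoncé un nouveau programme lundi.",
    "À mon avis, nous faisons fausse route !",
]
test_labels = ["news", "opinion"]

centroids = fit_centroids(train_texts, train_labels)
print(accuracy(centroids, test_texts, test_labels))
```

Fitting on one set and scoring on a separate set from a different source mirrors the train-on-Belgian, test-on-Canadian design: the score then reflects how well the features transfer across corpora rather than how well the model memorised its training data.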
About the journal:
This journal is unique in that it provides a forum devoted to the interdisciplinary study of language and communication. The investigation of language and its communicational functions is treated as a concern shared by those working in applied linguistics, child development, cultural studies, discourse analysis, intellectual history, legal studies, language evolution, linguistic anthropology, linguistics, philosophy, the politics of language, pragmatics, psychology, rhetoric, semiotics, and sociolinguistics. The journal invites contributions which explore the implications of current research for establishing common theoretical frameworks within which findings from different areas of study may be accommodated and interrelated. By focusing attention on the many ways in which language is integrated with other forms of communicational activity and interactional behaviour, the journal aims to encourage approaches to the study of language and communication which are not restricted by existing disciplinary boundaries.