Automated text classification of opinion vs. news French press articles. A comparison of transformer and feature-based approaches

Impact factor 1.3 · Zone 2, Literature · JCR Q2, Communication
Journal: Language & Communication
DOI: 10.1016/j.langcom.2024.09.004
Publication date: 2024-10-16
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S0271530924000624
Citations: 0

Abstract

This study explores Natural Language Processing (NLP) methods for distinguishing between press articles belonging to the journalistic genres of ‘objective’ news and ‘subjective’ opinion. Two classification models are compared: CamemBERT, a French transformer model fine-tuned for the task, and a machine learning model using 32 linguistic features. Both models are trained on 8000 Belgian French articles and evaluated on 1000 Canadian French articles. Results show CamemBERT’s superiority but highlight the potential of hybrid approaches and emphasize the need for robust and transparent methods in NLP. The research contributes to understanding NLP’s role in journalism by addressing the challenges of point-of-view detection in press discourse.
Source journal
CiteScore: 3.40
Self-citation rate: 6.70%
Articles published: 67
About the journal: This journal is unique in that it provides a forum devoted to the interdisciplinary study of language and communication. The investigation of language and its communicational functions is treated as a concern shared in common by those working in applied linguistics, child development, cultural studies, discourse analysis, intellectual history, legal studies, language evolution, linguistic anthropology, linguistics, philosophy, the politics of language, pragmatics, psychology, rhetoric, semiotics, and sociolinguistics. The journal invites contributions which explore the implications of current research for establishing common theoretical frameworks within which findings from different areas of study may be accommodated and interrelated. By focusing attention on the many ways in which language is integrated with other forms of communicational activity and interactional behaviour, it is intended to encourage approaches to the study of language and communication which are not restricted by existing disciplinary boundaries.