Authors: Mark Nicolas Gruensteidl, Sabrina Kirrane
Journal: Information Fusion, volume 127, Article 103806 (impact factor 15.5; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-10-04
DOI: 10.1016/j.inffus.2025.103806
URL: https://www.sciencedirect.com/science/article/pii/S1566253525008681
A Comparison and Critical Reflection of Information Disorder Detection Techniques: Performing a Cross-Data and Cross-Model Evaluation
Information disorders, such as dis-, mis-, and malinformation, can lead to societal and/or economic harm. They spread rapidly, are extensively consumed on the web, and represent a threat to democracy. AI-based detection models can identify information disorders to some extent. However, the changing characteristics of news and concept drift remain major issues. The generalization ability of a model, that is, its robustness when applied to unseen data, is therefore an important requirement. The aim of this work is to better understand the state of the art regarding information disorder detection approaches by conducting a reproducibility study and a cross-data and cross-model comparative analysis that leads to: (i) insights with respect to the effectiveness of binary information disorder classification; (ii) performance results on seen and unseen data; and (iii) new mixed European datasets named MENA. We conduct an evaluation of a fine-tuned BERT-based model applied to European data, which has received limited attention to date. The best-performing models in our experiments are RoBERTa and Longformer. The evaluation gives insights into potential dataset biases, which can be used to improve a model’s generalization ability. We also show that using domain-specific datasets for fine-tuning contributes to the robustness of models. Finally, we provide takeaways concerning reproducibility and stress the need for more transparent AI-based detection techniques.
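A cross-data evaluation of the kind the abstract describes can be pictured as a grid: train on one dataset, then test on every dataset, including the ones the model never saw. The sketch below is illustrative only and is not the authors' pipeline. The keyword-count classifier is a toy stand-in for a fine-tuned transformer, and the dataset names, example texts, and function names are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (text, label) pairs; label 1 = information disorder, 0 = reliable (assumed encoding)
Example = Tuple[str, int]


def train_keyword_model(train: List[Example]) -> Callable[[str], int]:
    """Toy stand-in for fine-tuning: count how often each word appears per class."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in train:
        counts[label].update(text.lower().split())

    def predict(text: str) -> int:
        # Positive score means the words were seen more often in class 1.
        score = sum(counts[1][w] - counts[0][w] for w in text.lower().split())
        return 1 if score > 0 else 0

    return predict


def accuracy(predict: Callable[[str], int], data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(predict(text) == label for text, label in data) / len(data)


def cross_data_matrix(
    datasets: Dict[str, Dict[str, List[Example]]]
) -> Dict[Tuple[str, str], float]:
    """Train on each dataset's train split, test on every dataset's test split."""
    results = {}
    for train_name, train_splits in datasets.items():
        model = train_keyword_model(train_splits["train"])
        for test_name, test_splits in datasets.items():
            results[(train_name, test_name)] = accuracy(model, test_splits["test"])
    return results


# Hypothetical toy datasets with disjoint vocabularies, made up for illustration.
TOY_DATASETS = {
    "A": {
        "train": [("shocking miracle cure revealed", 1), ("council approves budget", 0)],
        "test": [("miracle cure shocking", 1), ("budget approved by council", 0)],
    },
    "B": {
        "train": [("secret plot exposed", 1), ("parliament passes law", 0)],
        "test": [("secret plot", 1), ("parliament law", 0)],
    },
}

results = cross_data_matrix(TOY_DATASETS)
for (train_name, test_name), acc in sorted(results.items()):
    setting = "seen" if train_name == test_name else "unseen"
    print(f"train={train_name} test={test_name} ({setting}): accuracy={acc:.2f}")
```

The diagonal of the grid (same dataset for training and testing) corresponds to performance on seen data, and the off-diagonal cells to performance on unseen data. A cross-model comparison would repeat the same grid once per model.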
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.