Enhancing cross-lingual text classification through linguistic and interpretability-guided attack strategies

IF 3.4 | Zone 2 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-01-27 DOI:10.1016/j.is.2025.102526

Abdelmounaim Kerkri, Mohamed Amine Madani, Aya Qeraouch, Kaoutar Zouin

Information Systems, Volume 131, Article 102526. Journal Article. https://www.sciencedirect.com/science/article/pii/S0306437925000110


Abstract

While adversarial attacks on natural language processing systems have been extensively studied in English, their impact on morphologically complex languages remains poorly understood. We investigate how text classification systems respond to adversarial attacks across Arabic, English, and French — languages chosen for their distinct linguistic properties. Building on the DeepWordBug framework, we develop multilingual attack strategies that combine random perturbations with targeted modifications guided by model interpretability. We also introduce novel attack methods that exploit language-specific features like orthographic variations and syntactic patterns. Testing these approaches on a diverse dataset of news articles (9,030 Arabic, 14,501 English) and movie reviews (200,000 French), we find that interpretability-guided attacks are particularly effective, achieving misclassification rates of 58%–62% across languages. Language-specific perturbations also proved potent, degrading model performance to F1-scores between 0.38 and 0.63. However, incorporating adversarial examples during training markedly improved model robustness, with F1-scores recovering to above 0.82 across all test conditions. Beyond the immediate findings, this work reveals how adversarial vulnerability manifests differently across languages with varying morphological complexity, offering key insights for building more resilient multilingual NLP systems.
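The abstract describes character-level perturbations (in the style of DeepWordBug) that are aimed at the tokens a model relies on most. The following is a minimal sketch of that general idea, not the authors' implementation. Everything named here is an assumption made for illustration: the toy lexicon classifier `toy_confidence`, the leave-one-out scorer `occlusion_importance` (a simple stand-in for whatever interpretability method the paper uses), the `budget` parameter, and the Latin-only alphabet, which would need to be replaced for Arabic.

```python
import random
import string

# Hypothetical toy "model": confidence is the sum of lexicon weights.
# A real attack would query an actual classifier here.
TOY_LEXICON = {"great": 1.0, "good": 0.6, "bad": 0.8, "awful": 1.0}


def toy_confidence(tokens):
    """Return a stand-in confidence score for a list of tokens."""
    return sum(TOY_LEXICON.get(t.lower(), 0.0) for t in tokens)


def occlusion_importance(tokens, i):
    """Leave-one-out importance: confidence change when token i is removed."""
    without = tokens[:i] + tokens[i + 1:]
    return abs(toy_confidence(tokens) - toy_confidence(without))


def swap(word, rng):
    """Swap two adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def substitute(word, rng):
    """Replace one character with a different lowercase Latin letter."""
    if not word:
        return word
    i = rng.randrange(len(word))
    choices = [c for c in string.ascii_lowercase if c != word[i]]
    return word[:i] + rng.choice(choices) + word[i + 1:]


def delete(word, rng):
    """Delete one character, keeping at least one character in the word."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def insert(word, rng):
    """Insert one random lowercase Latin letter."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def perturb_text(text, score_fn, budget=2, seed=0):
    """Perturb the `budget` highest-scoring tokens with a random edit each."""
    rng = random.Random(seed)
    tokens = text.split()
    # Rank token positions from most to least important.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: score_fn(tokens, i), reverse=True)
    for i in ranked[:budget]:
        op = rng.choice([swap, substitute, delete, insert])
        tokens[i] = op(tokens[i], rng)
    return " ".join(tokens)


if __name__ == "__main__":
    print(perturb_text("the film was great", occlusion_importance, budget=1))
```

Passing a scorer that returns a constant would reduce this to an unguided perturbation of the first `budget` tokens, which is one way to contrast guided and unguided attacks on the same input.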


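Source journal

Information Systems (Engineering & Technology; Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days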

Journal description: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.