NGU CNLP at WANLP 2022 Shared Task: Propaganda Detection in Arabic

Workshop on Arabic Natural Language Processing (WANLP 2022) DOI: 10.18653/v1/2022.wanlp-1.66

A. S. Hussein, Abu Bakr Soliman Mohammad, Mohamed Ibrahim, Laila H. Afify, S. El-Beltagy


Citations: 1

Abstract

This paper presents the system developed by the NGU_CNLP team for the shared task on Propaganda Detection in Arabic at WANLP 2022. The team participated in both of the shared task's subtasks: 1) propaganda technique identification in text and 2) propaganda technique span identification. In subtask 1, the goal is to detect every propaganda technique used in a given piece of text, out of 17 possible techniques, or to detect that no propaganda technique is used. This makes subtask 1 a multi-label classification problem over a pool of 18 possible labels. Subtask 2 extends subtask 1 by requiring identification of the exact text span in which each propaganda technique is employed, making it a sequence labeling problem. For subtask 1, the classification model combined a data augmentation strategy with a transformer-based model; it ranked first among the 14 systems participating in this subtask. For subtask 2, a transfer learning model was adopted; it ranked third among the 3 systems that participated in this subtask.
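To make the two problem formulations in the abstract concrete, the following is a minimal sketch of how the targets for each subtask could be represented. It is not the NGU_CNLP team's implementation. The label names (`technique_1` ... `technique_17`, `no_technique`), the 0.5 threshold, the token-index span format, and the BIO tagging scheme are all assumptions made for illustration; the abstract states only the label counts and the problem types.

```python
from typing import List, Sequence, Tuple

# Hypothetical label names: the abstract gives only the counts
# (17 techniques plus one "no technique" label), not the names.
TECHNIQUES = [f"technique_{i}" for i in range(1, 18)]
NO_TECHNIQUE = "no_technique"
LABELS = TECHNIQUES + [NO_TECHNIQUE]  # pool of 18 labels


def encode(labels: Sequence[str]) -> List[int]:
    """Subtask 1 target: a multi-hot vector over the 18 labels.

    An empty label set is mapped to the single "no technique" label.
    """
    active = set(labels) or {NO_TECHNIQUE}
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def decode(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Turn 18 per-label scores into a label set.

    Only the 17 technique scores are thresholded; the score of the
    "no technique" label is ignored and that label is used as a fallback
    when no technique passes. The threshold value is an assumption.
    """
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    picked = [name for name, s in zip(TECHNIQUES, scores) if s >= threshold]
    return picked or [NO_TECHNIQUE]


def spans_to_bio(num_tokens: int, spans: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Subtask 2 target: one tag per token (BIO scheme, assumed here).

    Each span is (start_token, end_token_exclusive, technique). A single
    BIO layer cannot represent overlapping spans; later spans overwrite
    earlier ones in this sketch.
    """
    tags = ["O"] * num_tokens
    for start, end, technique in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start] = f"B-{technique}"
        for i in range(start + 1, end):
            tags[i] = f"I-{technique}"
    return tags
```

In this representation a text can carry several technique labels at once in subtask 1, while subtask 2 assigns a label to every token so that the exact span of each technique can be recovered.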

