Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence

A. Weichselbraun, Sandro Hörler, Christian Hauser, Anina Havelka
{"title":"Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence","authors":"A. Weichselbraun, Sandro Hörler, Christian Hauser, Anina Havelka","doi":"10.1145/3405962.3405988","DOIUrl":null,"url":null,"abstract":"A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to english and german-speaking documents for the task of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the art Deep Learning models perform sufficiently well in the project's context.","PeriodicalId":247414,"journal":{"name":"Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3405962.3405988","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to english and german-speaking documents for the task of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the art Deep Learning models perform sufficiently well in the project's context.
基于深度学习和网络智能的腐败风险管理新闻报道分类
相当数量的国际公司受到腐败的影响。本文介绍的研究介绍了诚信风险监视器,这是一个分析仪表板,将网络智能和深度学习应用于英语和德语文件,用于以下任务:(i)跟踪和可视化过去的腐败管理差距及其各自的影响,(ii)理解现在和过去的诚信问题,(iii)支持公司分析新闻媒体以识别和减轻诚信风险。随后,我们讨论了分类组件的设计、实施、培训和评估,这些组件能够识别涉及腐败诚信主题的英文文件。领域专家根据英美媒体对腐败案件的报道,创建了一个黄金标准数据集,用于训练和评估分类器。评估分类器的实验利用了用于文本分类的流行算法,如Naïve贝叶斯、支持向量机(SVM)和深度学习架构(LSTM、BiLSTM、CNN),这些算法利用了不同的词嵌入和文档表示。他们还证明,尽管经典的机器学习方法(如Naïve Bayes)难以应对媒体对腐败报道的多样性,但最先进的深度学习模型在项目背景下表现得足够好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信