Data & Knowledge Engineering: Latest Articles

Blockchain-based ontology driven reference framework for security risk management
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-12-04 | DOI: 10.1016/j.datak.2023.102257
Mubashar Iqbal, Aleksandr Kormiltsyn, Vimal Dwivedi, Raimundas Matulevičius
Security risk management (SRM) is crucial for protecting valuable assets from malicious harm. While blockchain technology has been proposed to mitigate security threats in traditional applications, it is not a perfect solution, and its own security threats must also be managed. This paper addresses the research problem that no unified, formal knowledge models exist to support either the SRM of traditional applications that use blockchain or the SRM of blockchain-based applications. Accordingly, we present a blockchain-based reference model (BbRM) and an ontology-driven reference framework (OntReF) for the SRM of traditional and blockchain-based applications. The BbRM consolidates the security threats of traditional and blockchain-based applications, structured according to the SRM domain model, and offers guidance for creating OntReF from that domain model. OntReF is grounded in the Unified Foundational Ontology (UFO); it provides semantic interoperability and supports the dynamic knowledge representation and instantiation of information security knowledge for SRM. Our evaluation approaches demonstrate that OntReF is practical to use.
Citations: 0
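The BbRM is described as structured following an SRM domain model. As a rough illustration of what instantiating such a model can look like, the sketch below encodes a few commonly used SRM concepts (asset, threat, vulnerability, risk, and a control that treats a risk) as Python dataclasses and lists the risks that are still untreated. The class names, fields, consistency check, and example values are assumptions made for illustration; this is not OntReF or UFO, which are ontology artifacts rather than Python code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Something of value that must be protected."""
    name: str


@dataclass(frozen=True)
class Threat:
    """A threat agent using an attack method against a target asset."""
    agent: str
    attack_method: str
    target: Asset


@dataclass(frozen=True)
class Vulnerability:
    """A weakness of an asset that a threat can exploit."""
    description: str
    affects: Asset


@dataclass(frozen=True)
class Risk:
    """A threat exploiting a vulnerability, together with its impact."""
    threat: Threat
    vulnerability: Vulnerability
    impact: str

    def is_consistent(self) -> bool:
        # Illustrative rule: threat and vulnerability must concern the same asset.
        return self.threat.target == self.vulnerability.affects


@dataclass
class RiskRegister:
    """Identified risks plus the controls chosen to treat them."""
    risks: list = field(default_factory=list)
    controls: dict = field(default_factory=dict)  # Risk -> list of control names

    def add_risk(self, risk: Risk) -> None:
        if not risk.is_consistent():
            raise ValueError("threat and vulnerability target different assets")
        self.risks.append(risk)

    def mitigate(self, risk: Risk, control: str) -> None:
        self.controls.setdefault(risk, []).append(control)

    def untreated(self) -> list:
        """Risks that no control mitigates yet."""
        return [r for r in self.risks if not self.controls.get(r)]
```

For example, registering a risk in which a phishing attacker exploits an unencrypted private key makes `untreated()` return that risk until a control such as a hardware wallet is recorded with `mitigate()`.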
Integrated detection and localization of concept drifts in process mining with batch and stream trace clustering support
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-12-02 | DOI: 10.1016/j.datak.2023.102253
Rafael Gaspar de Sousa, Antonio Carlos Meira Neto, Marcelo Fantinato, Sarajane Marques Peres, Hajo Alexander Reijers
Process mining can help organizations by extracting knowledge from event logs. However, process mining techniques often assume that business processes are stationary, whereas actual business processes are constantly subject to change because of the complexity of organizations and their external environment. Addressing process changes over time, known as concept drifts, therefore allows for a better understanding of process behavior and can give organizations a competitive edge, especially in an online data stream scenario. Current approaches to handling process concept drift focus primarily on detecting and locating concept drifts, often through an integrated, albeit offline, approach. However, some of these integrated approaches rely on complex data structures related to tree-based process models, usually discovered through algorithms whose results are influenced by specific heuristic rules. Moreover, most of the proposed approaches have not been tested on the public event logs labeled with true concept drifts that are commonly used as benchmarks, which makes comparative analysis difficult. In this article, we propose an online approach to detect and localize concept drifts in an integrated way using batch and stream trace clustering support. In our approach, cluster models provide the input information for both the concept drift detection and localization methods. Each cluster abstracts a behavior profile underlying the process and reveals descriptive information about the discovered concept drifts. Experiments with benchmark synthetic event logs containing different control-flow changes, as well as with real-world event logs, showed that our approach, when relying on the same clustering model, is competitive with the baseline concept drift detection methods. In addition, the experiments showed that our approach correctly locates the detected concept drifts and allows them to be analyzed through different process behavior profiles.
Citations: 0
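The idea that cluster models can drive both drift detection and drift localization can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' method: traces are turned into activity-count vectors, the centroids come from an earlier batch clustering step and stay fixed, the stream is cut into fixed-size windows of traces, and a drift is flagged when the total variation distance between consecutive windows' cluster distributions exceeds a hypothetical `threshold`. The clusters whose share changed serve as a crude localization, in the spirit of the behavior profiles mentioned in the abstract.

```python
from collections import Counter
from math import dist


def profile(trace, activities):
    """Bag-of-activities vector: how often each activity occurs in the trace."""
    counts = Counter(trace)
    return [counts[a] for a in activities]


def assign(vector, centroids):
    """Index of the nearest centroid by Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: dist(vector, centroids[k]))


def cluster_shares(window, activities, centroids):
    """Fraction of the window's traces that fall into each cluster."""
    labels = Counter(assign(profile(t, activities), centroids) for t in window)
    return [labels[k] / len(window) for k in range(len(centroids))]


def detect_drifts(windows, activities, centroids, threshold=0.3):
    """Flag windows whose cluster distribution moved away from the previous window.

    Detection: total variation distance between consecutive distributions > threshold.
    Localization: clusters (behavior profiles) whose share changed, largest change first.
    """
    drifts = []
    previous = cluster_shares(windows[0], activities, centroids)
    for i in range(1, len(windows)):
        current = cluster_shares(windows[i], activities, centroids)
        deltas = [abs(p - c) for p, c in zip(previous, current)]
        if 0.5 * sum(deltas) > threshold:
            changed = sorted((k for k, d in enumerate(deltas) if d > 0),
                             key=lambda k: -deltas[k])
            drifts.append((i, changed))
        previous = current
    return drifts
```

With windows of `"abc"` traces followed by windows of `"abd"` traces, the sketch reports a drift at the first `"abd"` window and names both behavior profiles as changed. A real stream setting would also update the clusters online, which this sketch omits.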
Editorial for VSI:NLDB-saarbruecken-2021
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-30 | DOI: 10.1016/j.datak.2023.102259
Helmut Horacek, Epaminondas Kapetanios, Elisabeth Metais, Farid Meziane
Citations: 0
DeepScraper: A complete and efficient tweet scraping method using authenticated multiprocessing
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-30 | DOI: 10.1016/j.datak.2023.102260
Jaebeom You, Kisung Lee, Hyuk-Yoon Kwon
In this paper, we propose a scraping method for collecting tweets, which we call DeepScraper. DeepScraper provides complete, fast scraping of all tweets written by a given group of users or containing given search keywords. To improve its crawling speed, we devise a multiprocessing architecture that authenticates each of the multiple processes by simulating user access behavior to Twitter. This allows us to maximize crawling parallelism even on a single machine. Through extensive experiments, we show that DeepScraper can crawl all 5,798,052 tweets of 99 users, whereas the Twitter standard API can crawl only 243,650 of them because of its limit on the number of tweets that can be scraped. In other words, DeepScraper collected 23.7 times more tweets for the 99 users than the standard API. We also show the efficiency of DeepScraper. First, we show the effect of authenticated multiprocessing: compared with DeepScraper running a single process, it increases the crawling speed by 2.03 to 10.57 times as the number of running processes increases from 2 to 32. Then, we compare the crawling speed of DeepScraper with existing studies. The results show that DeepScraper is comparable in speed even to the Twitter standard APIs and Twitter4J, while scraping far more tweets than they can. Furthermore, DeepScraper is roughly 3.69 times faster than Twitter Scrapy, while both can scrape all tweets for the target users or keywords.
Citations: 0
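The core of the authenticated multiprocessing idea is to split the crawl targets across workers, each holding its own authenticated session, and to merge their results. The sketch below shows only that partition-and-merge pattern. `fake_fetch`, the session tokens, and the round-robin split are hypothetical; DeepScraper's simulation of user access behavior is not reproduced, and no real Twitter endpoint is called. It uses `ThreadPoolExecutor` so it runs anywhere; passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` gives separate processes, as in the paper, provided the call is made under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers):
    """Round-robin split so every worker gets a similar share of the targets."""
    return [items[i::n_workers] for i in range(n_workers)]


def fake_fetch(session, user):
    """Hypothetical stand-in for an authenticated timeline request.

    A real crawler would send requests using `session`; this returns
    synthetic tweet IDs so the sketch runs offline.
    """
    return [f"{user}/status/{i}" for i in range(3)]


def worker(session, users, fetch):
    """Crawl one chunk of users with this worker's own credential."""
    return session, {user: fetch(session, user) for user in users}


def scrape(users, sessions, fetch=fake_fetch, executor_cls=ThreadPoolExecutor):
    """Crawl all users in parallel, one authenticated session per worker."""
    chunks = partition(users, len(sessions))
    tweets, crawled_by = {}, {}
    with executor_cls(max_workers=len(sessions)) as pool:
        futures = [pool.submit(worker, s, chunk, fetch)
                   for s, chunk in zip(sessions, chunks)]
        for future in futures:
            session, collected = future.result()
            for user, items in collected.items():
                tweets[user] = items
                crawled_by[user] = session
    return tweets, crawled_by
```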
S_IDS: An efficient skyline query algorithm over incomplete data streams
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-30 | DOI: 10.1016/j.datak.2023.102258
Mei Bai, Yuxue Han, Peng Yin, Xite Wang, Guanyu Li, Bo Ning, Qian Ma
The efficient processing of massive stream data has attracted wide attention in the database field. Skyline queries over sensor data streams can monitor multiple targets in real time to help avoid abnormal events such as fires and explosions, which makes them very useful in practical sensor data monitoring. However, real-world stream data often contain incomplete attributes because of faulty sensing devices or imperfect data collection techniques. Skyline queries over incomplete data streams can suffer from a lack of transitivity and from cyclic dominance. To solve the problem of skyline queries over incomplete data streams, this paper first uses differential dependency (DD) rules to fill in the missing attribute values of data in the incomplete data stream. Then, a hierarchical grid index (HGrid) is introduced into skyline querying to improve pruning efficiency. During skyline computation, only as few intermediate results as possible are kept for the data that may affect the result, which avoids a large number of repeated calculations. On this basis, S_IDS (Skyline query algorithm over Incomplete Data Stream) is proposed to return skyline results with high confidence from the incomplete data stream. Finally, comparisons with the most advanced skyline query algorithms over incomplete data streams demonstrate the correctness and efficiency of the proposed S_IDS algorithm.
Citations: 0
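A brute-force sketch can make the pipeline concrete: impute missing attributes, then keep the points of the current sliding window that no other point dominates (smaller values are treated as better). The imputation rule below is a single simplified, hypothetical rule in the spirit of differential dependencies: tuples within `delta` of each other on the observed attributes are assumed to be similar on the missing ones. The HGrid index, the incremental result caching, and the confidence handling of S_IDS are not reproduced; the sketch recomputes the skyline from scratch on every arrival.

```python
from collections import deque
from statistics import mean


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def impute(t, complete, delta):
    """Fill missing (None) attributes with a simplified DD-style similarity rule.

    Assumed rule: complete tuples within `delta` of t on every observed attribute
    are treated as similar on the missing ones, and their mean is used. Falls back
    to all complete tuples, and returns None if there are none yet.
    """
    observed = [i for i, v in enumerate(t) if v is not None]
    neighbours = [s for s in complete if all(abs(s[i] - t[i]) <= delta for i in observed)]
    pool = neighbours or complete
    if not pool:
        return None
    return tuple(mean(s[j] for s in pool) if v is None else v for j, v in enumerate(t))


def skyline(points):
    """Points not dominated by any other point (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def stream_skyline(stream, window_size, delta):
    """Yield the skyline of the sliding window after each arriving tuple."""
    window = deque(maxlen=window_size)
    for t in stream:
        window.append(t)
        complete = [s for s in window if None not in s]
        filled = [s if None not in s else impute(s, complete, delta) for s in window]
        yield skyline([s for s in filled if s is not None])
```

For the stream `(1, 5), (4, 2), (3, 3), (2, None), (5, 5)` with `delta=1`, the missing value of `(2, None)` is filled as 4 from its neighbours `(1, 5)` and `(3, 3)`, and `(5, 5)` is excluded from the skyline because `(3, 3)` dominates it.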
Explainable influenza forecasting scheme using DCC-based feature selection
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-26 | DOI: 10.1016/j.datak.2023.102256
Sungwoo Park, Jaeuk Moon, Seungwon Jung, Seungmin Rho, Eenjun Hwang
Because influenza mutates easily and spreads very quickly from person to person, it is more likely to develop into a pandemic. Even though vaccines are the most effective way to prevent influenza, producing them takes a long time. As a result, the supply of and demand for influenza vaccines have been imbalanced every year. For a smooth vaccine supply, vaccine demand must be forecast accurately at least three to six months in advance. So far, many machine learning-based predictive models have shown excellent performance. However, their use has been limited by performance deterioration caused by inappropriate training data and by their inability to explain their results. To solve these problems, in this paper, we propose an explainable influenza forecasting model. In particular, the model selects highly related data based on the distance correlation coefficient (DCC) for effective training and explains the prediction results using SHapley Additive exPlanations (SHAP). We evaluated its performance through extensive experiments and report some of the results.
Citations: 0
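Distance correlation, the DCC criterion used here for feature selection, captures both linear and nonlinear dependence, and its population value is 0 only when the variables are independent. A minimal pure-Python sketch of the sample statistic and a top-k selection step follows; the feature names, the choice of top-k rather than a threshold, and the example data are assumptions, and the SHAP explanation step is not shown.

```python
from math import sqrt


def _double_centered(x):
    """Pairwise |x_i - x_j| matrix with row, column and grand means removed."""
    n = len(x)
    d = [[abs(xi - xj) for xj in x] for xi in x]
    row_means = [sum(row) / n for row in d]
    grand = sum(row_means) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric series (0 to 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    a, b = _double_centered(x), _double_centered(y)
    n2 = len(x) ** 2
    dcov2 = sum(ai * bi for ra, rb in zip(a, b) for ai, bi in zip(ra, rb)) / n2
    dvar2_x = sum(v * v for row in a for v in row) / n2
    dvar2_y = sum(v * v for row in b for v in row) / n2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0  # a constant series carries no dependence information
    return sqrt(max(dcov2, 0.0) / sqrt(dvar2_x * dvar2_y))


def select_features(features, target, k):
    """Keep the k candidate series with the highest distance correlation to the target."""
    ranked = sorted(features, reverse=True,
                    key=lambda name: distance_correlation(features[name], target))
    return ranked[:k]
```

For `x = [-2, -1, 0, 1, 2]` and `y = x²`, the Pearson correlation is 0, but the distance correlation is clearly positive (about 0.52), which is why a dependence measure of this kind can retain features that a purely linear filter would discard.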
A-MKMC: An effective adaptive-based multilevel K-means clustering with optimal centroid selection using hybrid heuristic approach for handling the incomplete data
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-22 | DOI: 10.1016/j.datak.2023.102243
Hima Vijayan, Subramaniam M, Sathiyasekar K
In general, clustering is defined as partitioning objects into groups so that similar objects fall into the same group and dissimilar objects into different groups. It has been widely used in applications such as pattern recognition, image processing, and data analysis. A dataset that contains missing values is termed incomplete. Incomplete data cannot be handled directly when validating the data, and these flaws degrade data quality. Hence, clustering mechanisms are used to handle the missing values. Yet traditional clustering algorithms fail to address these issues because they are not designed to handle high-dimensional data, and they also suffer from human-intervention errors and inaccurate outcomes. To alleviate the challenge of incomplete data, a novel clustering algorithm is proposed. First, incomplete or mixed data are gathered from five different standard data sources. The collected data then undergo a pre-processing phase based on data normalization. Finally, the data are clustered by the new algorithm, termed Adaptive centroid based Multilevel K-Means Clustering (A-MKMC), in which the cluster centroids are optimized by combining two conventional algorithms, Border Collie Optimization (BCO) and the Whale Optimization Algorithm (WOA), into Hybrid Border Collie Whale Optimization (HBCWO). The novel clustering model is validated using various measures and compared against traditional mechanisms. In the overall result analysis, the designed HBCWO-A-MKMC method attains an accuracy of 93% and a precision of 95%. Hence, the adaptive clustering process performs better at handling missing data than the other conventional methods.
Citations: 0
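The abstract's pipeline (normalization, then K-means clustering of incomplete data) can be illustrated with a simple baseline. The sketch below is not A-MKMC: it uses min-max normalization over observed values and a standard K-means variant that measures distance only on observed features (the partial-distance strategy), starting from given centroids. The HBCWO hybrid optimization of the centroids is not reproduced; the initial centroids and the example data are assumptions.

```python
from math import sqrt


def min_max_normalize(rows):
    """Scale each column to [0, 1] using its observed values; None stays None."""
    n_cols = len(rows[0])
    lo = [min((r[j] for r in rows if r[j] is not None), default=0.0) for j in range(n_cols)]
    hi = [max((r[j] for r in rows if r[j] is not None), default=0.0) for j in range(n_cols)]

    def scale(v, j):
        if v is None:
            return None
        return (v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0

    return [[scale(v, j) for j, v in enumerate(r)] for r in rows]


def partial_distance(x, c):
    """Euclidean distance over the features observed in x, rescaled to full dimension."""
    pairs = [(a, b) for a, b in zip(x, c) if a is not None]
    if not pairs:
        return 0.0
    squared = sum((a - b) ** 2 for a, b in pairs)
    return sqrt(squared * len(x) / len(pairs))


def kmeans_incomplete(rows, init_centroids, iterations=20):
    """K-means for rows with None entries, starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)), key=lambda k: partial_distance(r, centroids[k]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k, centroid in enumerate(centroids):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            for j in range(len(centroid)):
                observed = [r[j] for r in members if r[j] is not None]
                if observed:  # keep the old coordinate if no member observes feature j
                    centroid[j] = sum(observed) / len(observed)
    return labels, centroids


# Hypothetical data: two groups, each with one missing value.
data = [[1.0, 2.0], [1.2, None], [0.9, 1.8], [8.0, 9.0], [None, 9.5], [8.3, 8.7]]
labels, centroids = kmeans_incomplete(min_max_normalize(data), [[0.0, 0.0], [1.0, 1.0]])
```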
Global and item-by-item reasoning fusion-based multi-hop KGQA
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-20 | DOI: 10.1016/j.datak.2023.102244
Tongzhao Xu, Turdi Tohti, Askar Hamdulla
Existing embedding-based multi-hop Question Answering over Knowledge Graph (KGQA) methods attempt to handle Knowledge Graph (KG) sparsity by using Knowledge Graph Embedding (KGE) to improve KGQA. However, they largely ignore the intermediate path reasoning process of answer prediction, do not consider the information interaction between the question and the KG, and rarely consider that the triple-scoring reasoning mechanism is inadequate for extracting deep features. To address these issues, this paper proposes Global and Item-by-item Reasoning Fusion-based Multi-hop KGQA (GIRFM-KGQA). In global reasoning, a convolutional attention reasoning mechanism is proposed and fused with the triple-scoring reasoning mechanism to jointly implement global reasoning, thus enhancing the long-chain reasoning ability of the global reasoning model. In item-by-item reasoning, a reasoning path is formed by serially predicting relations, and the answer is then predicted, which effectively addresses the lack of intermediate path reasoning ability in embedding-based multi-hop KGQA methods. In addition, we introduce an information interaction method between the question and the KG to improve the accuracy of answer prediction. Finally, we merge the global reasoning score with the item-by-item reasoning score to jointly predict the answer. Compared with the baseline model (EmbedKGQA), our model improves accuracy by 0.5% and 2.7% on two-hop questions and by 6.2% and 4.6% on three-hop questions on the MetaQA_Full and MetaQA_Half datasets, respectively, and by 1.7% on the WebQuestionSP dataset. The experimental results show that the proposed model can effectively improve the accuracy of multi-hop KGQA and enhance the interpretability of the model. We have made our model's source code available on GitHub: https://github.com/feixiongfeixiong/GIRFM.
Citations: 0
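The two reasoning modes and their score fusion can be pictured on a toy graph. In this minimal sketch, item-by-item reasoning hops along a predicted relation sequence, scoring each reached entity by the product of the hop probabilities, and the final ranking is a weighted sum of those scores and the global scores. The graph, the relation probabilities, the global scores, and the linear fusion weight `alpha` are hypothetical inputs; in GIRFM-KGQA they come from learned components (the relation predictor and the convolutional attention and triple-scoring mechanisms), which are not reproduced here.

```python
def follow_path(kg, start, hops):
    """Item-by-item reasoning: follow predicted relations one hop at a time.

    `hops` is a list of (relation, probability) pairs. Each reached entity is
    scored by the product of hop probabilities along its best path.
    """
    frontier = {start: 1.0}
    for relation, prob in hops:
        reached = {}
        for entity, score in frontier.items():
            for tail in kg.get((entity, relation), ()):
                reached[tail] = max(reached.get(tail, 0.0), score * prob)
        frontier = reached
    return frontier


def fuse(global_scores, item_scores, alpha=0.5):
    """Rank candidates by alpha * global score + (1 - alpha) * item-by-item score."""
    candidates = set(global_scores) | set(item_scores)
    fused = {c: alpha * global_scores.get(c, 0.0) + (1 - alpha) * item_scores.get(c, 0.0)
             for c in candidates}
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical toy graph: (head, relation) -> set of tails.
kg = {
    ("Tom Hanks", "starred_in"): {"Forrest Gump", "Cast Away"},
    ("Forrest Gump", "directed_by"): {"Robert Zemeckis"},
    ("Cast Away", "directed_by"): {"Robert Zemeckis"},
}
item_scores = follow_path(kg, "Tom Hanks", [("starred_in", 0.9), ("directed_by", 0.8)])
ranking = fuse({"Robert Zemeckis": 0.6, "Steven Spielberg": 0.7}, item_scores)
```

Here the path `starred_in` then `directed_by` reaches Robert Zemeckis with score 0.9 × 0.8 = 0.72, and fusion with `alpha = 0.5` ranks him above a candidate proposed only by the global scorer.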
The power and potentials of Flexible Query Answering Systems: A critical and comprehensive analysis
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-19 | DOI: 10.1016/j.datak.2023.102246
Troels Andreasen, Gloria Bordogna, Guy De Tré, Janusz Kacprzyk, Henrik Legind Larsen, Sławomir Zadrożny
The popularity of chatbots such as ChatGPT has drawn research attention to question answering systems, which can generate natural language answers to users' natural language queries. However, flexible querying, including but also going beyond the use of natural language, is an important feature in other kinds of systems as well. With this in mind, the paper presents a critical and comprehensive analysis of recent developments, trends, and challenges in Flexible Query Answering Systems (FQASs). Flexible query answering is a multidisciplinary research field that is not limited to question answering in natural language but comprises other query forms and interaction modalities, which aim to provide powerful means and techniques that better reflect human preferences and intentions when retrieving relevant information. It adopts methods at the crossroads of several disciplines, including Information Retrieval (IR), databases, knowledge-based systems, knowledge and data engineering, Natural Language Processing (NLP), and the semantic web. The analysis principles are inspired by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework: a top-down process starts from relevant keywords for the topic of interest to retrieve relevant articles from meta-sources, and these articles are complemented with other relevant articles from seed sources identified by a bottom-up process. To mine the retrieved publication data, a network analysis is performed, which allows the intrinsic topics of the selected publications to be presented synthetically. The issues dealt with relate to query answering methods, both model-based and data-driven (the latter based on either machine learning or deep learning), and to their needs for explainability and fairness in dealing with big data, notably by taking data veracity into account. The conclusions point out trends and challenges to help better shape the future of the FQAS field.
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0169023X23001064/pdfft?md5=a520b95a7109e1b8dddc31cb9594841b&pid=1-s2.0-S0169023X23001064-main.pdf
Citations: 0
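One classic form of flexible querying in this field is the fuzzy preference query, where crisp conditions such as `price <= 100` are replaced by graded predicates such as "cheap", so near-misses are ranked rather than discarded. The sketch below is a generic illustration, not taken from the survey: the trapezoidal membership functions, the hotel data, and the use of `min` as the fuzzy AND (one common t-norm choice) are assumptions.

```python
from math import inf


def trapezoid(a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear on the slopes.

    Use a = b = -inf (or c = d = inf) for a left (or right) shoulder.
    """
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu


def fuzzy_query(rows, conditions, threshold=0.0):
    """Rank rows by the minimum (fuzzy AND) of their membership degrees."""
    results = []
    for row in rows:
        degree = min(mu(row[attr]) for attr, mu in conditions.items())
        if degree > threshold:
            results.append((degree, row["name"]))
    return sorted(results, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical sample data and predicates.
hotels = [
    {"name": "A", "price": 90, "distance_km": 0.5},
    {"name": "B", "price": 125, "distance_km": 1.0},
    {"name": "C", "price": 200, "distance_km": 0.2},
    {"name": "D", "price": 110, "distance_km": 3.0},
]
cheap = trapezoid(-inf, -inf, 100, 150)   # fully cheap up to 100, not cheap from 150
near = trapezoid(-inf, -inf, 1.0, 2.0)    # fully near up to 1 km, not near from 2 km
ranked = fuzzy_query(hotels, {"price": cheap, "distance_km": near})
```

With this sample data, a crisp query `price <= 100 AND distance_km <= 1` would return only hotel A, while the fuzzy query also returns B with degree 0.5, because its price of 125 is only partially "cheap".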
CALEB: A Conditional Adversarial Learning Framework to enhance bot detection
IF 2.5 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2023-11-14 | DOI: 10.1016/j.datak.2023.102245
Ilias Dimitriadis, George Dialektakis, Athena Vakali
The rapid growth of Online Social Networks (OSNs) over the last few years has allowed automated accounts, known as social bots, to gain ground. As other researchers have highlighted, many of these bots have malicious purposes and tend to mimic human behavior, posing high-level security threats to OSN platforms. Moreover, recent studies have shown that social bots evolve over time by reforming and reinventing unforeseen and sophisticated characteristics, which makes them capable of evading current state-of-the-art machine learning bot detection systems. This work is motivated by the critical need for adaptive bot detection methods that proactively capture unseen evolved bots, toward healthier OSN interactions. In contrast with most earlier supervised ML approaches, which are limited by their inability to effectively detect new types of bots, this paper proposes CALEB, a robust end-to-end proactive framework based on the Conditional Generative Adversarial Network (CGAN) and its extension, the Auxiliary Classifier GAN (AC-GAN), to simulate bot evolution by creating realistic synthetic instances of different bot types. These simulated evolved bots augment existing bot datasets and therefore enhance the detection of emerging generations of bots before they even appear. Furthermore, we show that our augmentation approach outperforms earlier augmentation techniques, which fail to simulate evolving bots. Extensive experimentation on well-established public bot datasets shows that our approach offers a performance boost of up to 10% in detecting new, unseen bots. Finally, using the AC-GAN discriminator as a bot detector outperforms former ML approaches, showcasing the efficiency of our end-to-end framework.
Citations: 0
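In a conditional GAN, the generator receives a noise vector together with the class label, so it can be asked for samples of a specific class, here a specific bot type; the synthetic samples are then added to the training data. The sketch below shows only that conditioning and augmentation plumbing. The generator is an untrained random linear map standing in for a trained CGAN or AC-GAN generator, and the dimensions, class indices, and feature vectors are hypothetical; training, the discriminator, and CALEB's auxiliary classifier are omitted.

```python
import random


def one_hot(index, size):
    """One-hot encoding of a class index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def generator_input(noise_dim, label, n_classes, rng):
    """CGAN-style conditioning: noise vector z concatenated with the one-hot label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, n_classes)


def placeholder_generator(in_dim, out_dim, rng):
    """Untrained stand-in for a generator: a fixed random linear map."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def generate(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return generate


def augment(real, bot_types, per_type, generator, noise_dim, n_classes, rng):
    """Append `per_type` synthetic (features, label) pairs for each requested bot type."""
    synthetic = [(generator(generator_input(noise_dim, t, n_classes, rng)), t)
                 for t in bot_types for _ in range(per_type)]
    return real + synthetic


rng = random.Random(0)
NOISE_DIM, N_CLASSES, N_FEATURES = 4, 3, 5  # hypothetical sizes
G = placeholder_generator(NOISE_DIM + N_CLASSES, N_FEATURES, rng)
real = [([0.1] * N_FEATURES, 0), ([0.2] * N_FEATURES, 1)]  # (features, bot type)
train = augment(real, [2], 10, G, NOISE_DIM, N_CLASSES, rng)
```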