数字市场图网络异常检测。

IF 2.6 3区综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE Pub Date : 2024-12-23 eCollection Date: 2024-01-01 DOI:10.1371/journal.pone.0315849

Agata Skorupka

{"title":"数字市场图网络异常检测。","authors":"Agata Skorupka","doi":"10.1371/journal.pone.0315849","DOIUrl":null,"url":null,"abstract":"The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors' protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other, and state-of-the-art results from the literature and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust on the advancements of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes' statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard of grid of embedding instances, where each of them takes hours to be generated). This, again, speaks in favor of the proposed algorithm based on nodes' statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"19 12","pages":"e0315849"},"PeriodicalIF":2.6000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11666048/pdf/","citationCount":"0","resultStr":"{\"title\":\"Detecting anomalies in graph networks on digital markets.\",\"authors\":\"Agata Skorupka\",\"doi\":\"10.1371/journal.pone.0315849\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors' protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other, and state-of-the-art results from the literature and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust on the advancements of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes' statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard of grid of embedding instances, where each of them takes hours to be generated). This, again, speaks in favor of the proposed algorithm based on nodes' statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"19 12\",\"pages\":\"e0315849\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-12-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11666048/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0315849\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0315849","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

摘要

该研究考察了检测数字市场异常活动的不同基于图表的方法，提出了增加市场参与者保护和减少信息不对称的最有效方法。下面将异常定义为机器人和欺诈用户（既可以是机器人，也可以是真人）。对各种方法进行了比较，并从文献和新算法中提出了最新的结果。目标是在预测性能和计算效率方面找到一种适合于威胁检测的有效方法。它应该可以很好地扩展，并在最新技术的进步中保持稳健。这篇文章使用了三个可公开访问的基于图形的数据集：一个描述Twitter社交网络（TwiBot-20），两个描述比特币加密货币市场（比特币OTC和比特币Alpha）。在前者中，异常被定义为机器人，而不是人类用户，而在后者中，异常是进行欺诈性交易的用户，这可能（但不一定）意味着是机器人。该研究证明，基于图形的数据是比文本数据更好的预测器。它比较了不同的图算法来提取异常检测模型的特征集。它指出，基于节点统计的方法比最先进的图嵌入的模型性能更好。它们还显著提高了计算效率。这通常意味着减少几个小时的时间，或者在更大的图上启用建模（在嵌入的情况下通常是不可行的）。在此基础上，本文提出了自己的基于图的统计算法。此外，使用嵌入需要两个工程选择：嵌入的类型和它的尺寸。该研究考察了是否有一些类型的图嵌入和维度表现明显优于其他类型。解决方案是特定于数据集的，需要根据具体情况进行调整，这增加了使用嵌入的更多工程开销（构建嵌入实例网格的排行榜，其中每个实例都需要几个小时才能生成）。这再次支持了基于节点统计的算法。该研究提出了自己的高效算法，使这种工程开销变得多余。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Detecting anomalies in graph networks on digital markets.

The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors' protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other, and state-of-the-art results from the literature and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust on the advancements of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes' statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard of grid of embedding instances, where each of them takes hours to be generated). This, again, speaks in favor of the proposed algorithm based on nodes' statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

PLoS ONE 生物-生物学

CiteScore

6.20

自引率

5.40%

发文量

14242

审稿时长

3.7 months

期刊介绍： PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides: * Open-access—freely accessible online, authors retain copyright * Fast publication times * Peer review by expert, practicing researchers * Post-publication tools to indicate quality and impact * Community-based dialogue on articles * Worldwide media coverage