Detecting Pump-and-Dumps with Crypto-Assets: Dealing with Imbalanced Datasets and Insiders’ Anticipated Purchases

Impact factor: 1.1 | JCR: Q3 (Economics)
Dean Fantazzini, Yufeng Xiao
Journal: Econometrics. Published: 2023-08-30. Publication type: Journal Article.
DOI: 10.3390/econometrics11030022 (https://doi.org/10.3390/econometrics11030022)
Citations: 0

Abstract

Detecting pump-and-dump schemes involving crypto-assets with high-frequency data is challenging due to imbalanced datasets and the early occurrence of unusual trading volumes. To address these issues, we propose constructing synthetic balanced datasets using resampling methods and flagging a pump-and-dump from the moment of public announcement up to 60 min beforehand. We validated our proposals using data from Pumpolymp and the CryptoCurrency eXchange Trading Library to identify 351 pump signals relating to the Binance crypto exchange in 2021 and 2022. We found that the most effective approach was to use the original imbalanced dataset with pump-and-dumps flagged 60 min in advance, together with a random forest model, data segmented into 30-second chunks, and regressors computed with a moving window of 1 h. Our analysis revealed that a better balance between sensitivity and specificity could be achieved by simply selecting an appropriate probability threshold, such as one close to the observed prevalence in the original dataset. Resampling methods were useful in some cases, but they did not affect threshold-independent measures. Moreover, detecting pump-and-dumps in real time involves high-dimensional data, and building synthetic datasets with resampling methods can be time-consuming, which makes them less practical.
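The two ideas in the abstract that lend themselves to a short illustration are the early flagging window and the prevalence-based probability threshold. The following is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `flag_chunks` and `sensitivity_specificity`, the single announcement time, and the classifier scores are all hypothetical, and the scores stand in for the output of any probabilistic classifier, such as the random forest mentioned above.

```python
from datetime import datetime, timedelta

CHUNK = timedelta(seconds=30)   # chunk length reported in the abstract
LEAD = timedelta(minutes=60)    # flagging starts up to 60 min before the announcement


def flag_chunks(chunk_starts, announcements, lead=LEAD):
    """Label a chunk 1 if it starts within `lead` before an announcement (inclusive), else 0.

    Assumption: the window is closed at both ends; the source does not specify this.
    """
    return [
        int(any(a - lead <= t <= a for a in announcements))
        for t in chunk_starts
    ]


def sensitivity_specificity(labels, scores, threshold):
    """Treat score >= threshold as a predicted pump and compare with the labels."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy data: one day of 30-second chunks and one hypothetical announcement at noon.
start = datetime(2021, 1, 1)
chunk_starts = [start + i * CHUNK for i in range(2880)]
announcements = [datetime(2021, 1, 1, 12, 0)]
labels = flag_chunks(chunk_starts, announcements)

# Observed prevalence: the share of flagged chunks in the (imbalanced) dataset.
prevalence = sum(labels) / len(labels)

# Made-up scores: flagged chunks get a modestly higher predicted probability.
scores = [0.10 if y == 1 else 0.01 for y in labels]

print("threshold 0.5       :", sensitivity_specificity(labels, scores, 0.5))
print("threshold prevalence:", sensitivity_specificity(labels, scores, prevalence))
```

On this contrived data the default cut-off of 0.5 misses every flagged chunk, while a cut-off at the prevalence recovers them. The numbers only illustrate the mechanism and say nothing about the paper's results. The ranking of the scores is the same under both cut-offs, which is why threshold-independent measures would not change.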
Source journal: Econometrics
Subject category: Economics, Econometrics and Finance (Economics and Econometrics)
CiteScore: 2.40
Self-citation rate: 20.00%
Articles published: 30
Review time: 11 weeks