{"title":"ROBO-SPOT: Detecting Robocalls by Understanding User Engagement and Connectivity Graph","authors":"Muhammad Ajmal Azad, J. Arshad, Farhan Riaz","doi":"10.26599/bdma.2023.9020020","DOIUrl":"https://doi.org/10.26599/bdma.2023.9020020","url":null,"abstract":"Robo or unsolicited calls have become a persistent issue in telecommunication networks, posing significant challenges to individuals, businesses, and regulatory authorities. These calls not only trick users to disclose their private and financial information but also affect their productivity through unwanted phone ringing. A proactive approach to identify and block such unsolicited calls is essential to protect users and service providers from potential harm. Therein, this paper proposes a solution to identify robo-callers in the telephony network utilising a set of novel features to evaluate the trustworthiness of callers in a network. The trust score of the callers is then used along with machine learning models to classify them as legitimate or robo-caller. We used a large anonymized data set (call detailed records) from a large telecommunication provider containing more than 1 billion records collected over 10 days. We have conducted extensive evaluation demonstrating that the proposed approach achieves high accuracy and detection rate whilst minimizing the error rate. Specifically, the proposed features when used collectively achieve a true-positive rate of around 97% with a false-positive rate of less than 0.01%.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":13.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141233263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Detection of Malicious Behavior in Windows Portable Executables Using Multi-Head 2D Transformers","authors":"Sohail Khan, Mohammad Nauman","doi":"10.26599/bdma.2023.9020025","DOIUrl":"https://doi.org/10.26599/bdma.2023.9020025","url":null,"abstract":"Windows malware is becoming an increasingly pressing problem as the amount of malware continues to grow and more sensitive information is stored on systems. One of the major challenges in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to the creation of deep models for malware detection. However, these models often lack transparency, making it difficult to understand the reasoning behind the model’s decisions, otherwise known as the black-box problem. To address these limitations, this paper presents a novel model for malware detection, utilizing vision transformers to analyze the opcode sequences of more than 350,000 Windows portable executable malware samples from real-world datasets. The model achieved a high accuracy of 0.9864, not only surpassing previous results but also providing valuable insights into the reasoning behind the classification. Our model is able to pinpoint specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and driving further advancements in the field. We report our findings and show how causality can be established between malicious code and actual classification by a deep learning model thus opening up this black-box problem for deeper analysis.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":13.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141231638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Energy Consumption Using Stacked LSTM Snapshot Ensemble","authors":"Mona Ahamd Alghamdi, Abdullah S. Al-Malaise Al-Ghamdi, Mahmoud Ragab","doi":"10.26599/bdma.2023.9020030","DOIUrl":"https://doi.org/10.26599/bdma.2023.9020030","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":13.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141232601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Recommendation Algorithm Integrates Resource Allocation and Resource Transfer in Weighted Bipartite Network","authors":"Qiang Sun, Leilei Shi, Lu Liu, Zi-xuan Han, Liang Jiang, Yan Wu, Yeling Zhao","doi":"10.26599/bdma.2023.9020029","DOIUrl":"https://doi.org/10.26599/bdma.2023.9020029","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":13.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141229821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Proteins and Genes Associated with Hedgehog Signaling Pathway Involved in Neoplasm Formation Using Text-Mining Approach","authors":"","doi":"10.26599/BDMA.2023.9020007","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020007","url":null,"abstract":"Analysis of molecular mechanisms that lead to the development of various types of tumors is essential for biology and medicine, because it may help to find new therapeutic opportunities for cancer treatment and cure including personalized treatment approaches. One of the pathways known to be important for the development of neoplastic diseases and pathological processes is the Hedgehog signaling pathway that normally controls human embryonic development. Systematic accumulation of various types of biological data, including interactions between proteins, regulation of genes transcription, proteomics, and metabolomics experiments results, allows the application of computational analysis of these big data for identification of key molecular mechanisms of certain diseases and pathologies and promising therapeutic targets. The aim of this study is to develop a computational approach for revealing associations between human proteins and genes interacting with the Hedgehog pathway components, as well as for identifying their roles in the development of various types of tumors. We automatically collect sets of abstract texts from the NCBI PubMed bibliographic database. For recognition of the Hedgehog pathway proteins and genes and neoplastic diseases we use a dictionary-based named entity recognition approach, while for all other proteins and genes machine learning method is used. For association extraction, we develop a set of semantic rules. We complete the results of the text analysis with the gene set enrichment analysis. The identified key pathways that may influence the Hedgehog pathway and their roles in tumor development are then verified using the information in the literature.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limits of Depth: Over-Smoothing and Over-Squashing in GNNs","authors":"Aafaq Mohi ud din;Shaima Qureshi","doi":"10.26599/BDMA.2023.9020019","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020019","url":null,"abstract":"Graph Neural Networks (GNNs) have become a widely used tool for learning and analyzing data on graph structures, largely due to their ability to preserve graph structure and properties via graph representation learning. However, the effect of depth on the performance of GNNs, particularly isotropic and anisotropic models, remains an active area of research. This study presents a comprehensive exploration of the impact of depth on GNNs, with a focus on the phenomena of over-smoothing and the bottleneck effect in deep graph neural networks. Our research investigates the tradeoff between depth and performance, revealing that increasing depth can lead to over-smoothing and a decrease in performance due to the bottleneck effect. We also examine the impact of node degrees on classification accuracy, finding that nodes with low degrees can pose challenges for accurate classification. Our experiments use several benchmark datasets and a range of evaluation metrics to compare isotropic and anisotropic GNNs of varying depths, also explore the scalability of these models. Our findings provide valuable insights into the design of deep GNNs and offer potential avenues for future research to improve their performance.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372997","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PURP: A Scalable System for Predicting Short-Term Urban Traffic Flow Based on License Plate Recognition Data","authors":"Shan Zhang;Qinkai Jiang;Hao Li;Bin Cao;Jing Fan","doi":"10.26599/BDMA.2023.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020017","url":null,"abstract":"Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, herein, we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (namely KNN) as the prediction framework because the KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming model retraining online. The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372996","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gender-Based Analysis of User Reactions to Facebook Posts","authors":"Yassine El Moudene;Jaafar Idrais;Rida El Abassi;Abderrahim Sabour","doi":"10.26599/BDMA.2023.9020005","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020005","url":null,"abstract":"Online Social Networks (OSNs) are based on the sharing of different types of information and on various interactions (comments, reactions, and sharing). One of these important actions is the emotional reaction to the content. The diversity of reaction types available on Facebook (namely FB) enables users to express their feelings, and its traceability creates and enriches the users' emotional identity in the virtual world. This paper is based on the analysis of 119875012 FB reactions (Like, Love, Haha, Wow, Sad, Angry, Thankful, and Pride) made at multiple levels (publications, comments, and sub-comments) to study and classify the users' emotional behavior, visualize the distribution of different types of reactions, and analyze the gender impact on emotion generation. All of these can be achieved by addressing these research questions: who reacts the most? Which emotion is the most expressed?","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372951","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Event Tracking in Social Media Data Streams","authors":"Zixuan Han;Leilei Shi;Lu Liu;Liang Jiang;Jiawei Fang;Fanyuan Lin;Jinjuan Zhang;John Panneerselvam;Nick Antonopoulos","doi":"10.26599/BDMA.2023.9020021","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020021","url":null,"abstract":"Social networks are inevitable parts of our daily life, where an unprecedented amount of complex data corresponding to a diverse range of applications are generated. As such, it is imperative to conduct research on social events and patterns from the perspectives of conventional sociology to optimize services that originate from social networks. Event tracking in social networks finds various applications, such as network security and societal governance, which involves analyzing data generated by user groups on social networks in real time. Moreover, as deep learning techniques continue to advance and make important breakthroughs in various fields, researchers are using this technology to progressively optimize the effectiveness of Event Detection (ED) and tracking algorithms. In this regard, this paper presents an in-depth comprehensive review of the concept and methods involved in ED and tracking in social networks. We introduce mainstream event tracking methods, which involve three primary technical steps: ED, event propagation, and event evolution. Finally, we introduce benchmark datasets and evaluation metrics for ED and tracking, which allow comparative analysis on the performance of mainstream methods. Finally, we present a comprehensive analysis of the main research findings and existing limitations in this field, as well as future research prospects and challenges.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}