{"title":"Call for Papers: Special Issue on Edge AI Empowered Giant Model Training","authors":"","doi":"","DOIUrl":"https://doi.org/","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"526-526"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233251.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VDCM: A Data Collection Mechanism for Crowd Sensing in Vehicular Ad Hoc Networks","authors":"Juli Yin;Linfeng Wei;Zhiquan Liu;Xi Yang;Hongliang Sun;Yudan Cheng;Jianbin Mai","doi":"10.26599/BDMA.2022.9020041","DOIUrl":"10.26599/BDMA.2022.9020041","url":null,"abstract":"With the rapid development of mobile devices, aggregation security and efficiency topics are more important than past in crowd sensing. When collecting large-scale vehicle-provided data, the data transmitted via autonomous networks are publicly accessible to all attackers, which increases the risk of vehicle exposure. So we need to ensure data aggregation security. In addition, low aggregation efficiency will lead to insufficient sensing data, making the data unable to provide data mining services. Aiming at the problem of aggregation security and efficiency in large-scale data collection, this article proposes a data collection mechanism (VDCM) for crowd sensing in vehicular ad hoc networks (VANETs). The mechanism includes two mechanism assumptions and selects appropriate methods to reduce consumption. It selects sub mechanism 1 when there exist very few vehicles or the coalition cannot be formed, otherwise selects sub mechanism 2. Single aggregation is used to collect data in sub mechanism 1. In sub mechanism 2, cooperative vehicles are selected by using coalition formation strategy and auction cooperation agreement, and multi aggregation is used to collect data. Two sub mechanisms use Paillier homomorphic encryption technology to ensure the security of data aggregation. In addition, mechanism supplements the data update and scoring steps to increase the amount of available data. The performance analysis shows that the mechanism proposed in this paper can safely aggregate data and reduce consumption. 
The simulation results indicate that the proposed mechanism reduces time consumption and increases the amount of available data compared with existing mechanisms.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"391-403"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233240.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48342786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-Based Hybrid Models for Predicting Loan Risk in the Banking Sector","authors":"Vikas Kumar;Shaiku Shahida Saheb;Preeti;Atif Ghayas;Sunil Kumari;Jai Kishan Chandel;Saroj Kumar Pandey;Santosh Kumar","doi":"10.26599/BDMA.2022.9020037","DOIUrl":"10.26599/BDMA.2022.9020037","url":null,"abstract":"Every real-world scenario is now digitally replicated in order to reduce paperwork and human labor costs. Machine Learning (ML) models are also being used to make predictions in these applications. Accurate forecasting requires knowledge of these machine learning models and their distinguishing features. The datasets we use as input for each of these different types of ML models, yielding different results. The choice of an ML model for a dataset is critical. A loan risk model is used to show how ML models for a dataset can be linked together. The purpose of this study is to look into how we could use machine learning to quantify or forecast mortgage credit risk. This phrase refers to the process of evaluating massive amounts of data in order to derive useful information for making decisions in a variety of fields. If credit risk is considered, a method based on an examination of what caused and how mortgage credit risk affected credit defaults during the still-current economic crisis of 2021 will be tried. Various approaches to credit risk calculation will be examined, ranging from the most basic to the most complex. 
In addition, we will conduct a case study on a sample of mortgage loans and compare the results of three different analytical approaches, logistic regression, decision tree, and gradient boost to see which one produced the most commercially useful insights.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"478-490"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233246.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43857463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Federated Learning for Heterogeneous Residential Load Forecasting","authors":"Xiaodong Qu;Chengcheng Guan;Gang Xie;Zhiyi Tian;Keshav Sood;Chaoli Sun;Lei Cui","doi":"10.26599/BDMA.2022.9020043","DOIUrl":"10.26599/BDMA.2022.9020043","url":null,"abstract":"Accurate load forecasting is critical for electricity production, transmission, and maintenance. Deep learning (DL) model has replaced other classical models as the most popular prediction models. However, the deep prediction model requires users to provide a large amount of private electricity consumption data, which has potential privacy risks. Edge nodes can federally train a global model through aggregation using federated learning (FL). As a novel distributed machine learning (ML) technique, it only exchanges model parameters without sharing raw data. However, existing forecasting methods based on FL still face challenges from data heterogeneity and privacy disclosure. Accordingly, we propose a user-level load forecasting system based on personalized federated learning (PFL) to address these issues. The obtained personalized model outperforms the global model on local data. Further, we introduce a novel differential privacy (DP) algorithm in the proposed system to provide an additional privacy guarantee. Based on the principle of generative adversarial network (GAN), the algorithm achieves the balance between privacy and prediction accuracy throughout the game. 
We perform simulation experiments on the real-world dataset and the experimental results show that the proposed system can comply with the requirement for accuracy and privacy in real load forecasting scenarios.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"421-432"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233242.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48886250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K-Means Clustering with Local Distance Privacy","authors":"Mengmeng Yang;Longxia Huang;Chenghua Tang","doi":"10.26599/BDMA.2022.9020050","DOIUrl":"10.26599/BDMA.2022.9020050","url":null,"abstract":"With the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users' data, which disclose users' privacy. Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users' records in-distinguish from each other in high-dimensional space, we map the user's record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other. To be specific, we generate a noisy distance first and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfies the local differential privacy guarantee and local d_E-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. 
Our experimental evaluation results show that our methods outperform the traditional methods significantly.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"433-442"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233248.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46837075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elastic Optimization for Stragglers in Edge Federated Learning","authors":"Khadija Sultana;Khandakar Ahmed;Bruce Gu;Hua Wang","doi":"10.26599/BDMA.2022.9020046","DOIUrl":"10.26599/BDMA.2022.9020046","url":null,"abstract":"To fully exploit enormous data generated by intelligent devices in edge computing, edge federated learning (EFL) is envisioned as a promising solution. The distributed collaborative training in EFL deals with delay and privacy issues compared to traditional centralized model training. However, the existence of straggling devices, responding slow to servers, degrades model performance. We consider data heterogeneity from two aspects: high dimensional data generated at edge devices where the number of features is greater than that of observations and the heterogeneity caused by partial device participation. With large number of features, computation overhead on the devices increases, causing edge devices to become stragglers. And incorporation of partial training results causes gradients to be diverged which further exaggerates when more training is performed to reach local optima. In this paper, we introduce elastic optimization methods for stragglers due to data heterogeneity in edge federated learning. Specifically, we define the problem of stragglers in EFL. Then, we formulate an optimization problem to be solved at edge devices. We customize a benchmark algorithm, FedAvg, to obtain a new elastic optimization algorithm (FedEN) which is applied in local training of edge devices. FedEN mitigates stragglers by having a balance between lasso and ridge penalization thereby generating sparse model updates and enforcing parameters as close as to local optima. We have evaluated the proposed model on MNIST and CIFAR-10 datasets. Simulated experiments demonstrate that our approach improves run time training performance by achieving average accuracy with less communication rounds. 
The results confirm the improved performance of our approach over benchmark algorithms.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"404-420"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233241.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47729450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A PLS-SEM Based Approach: Analyzing Generation Z Purchase Intention Through Facebook's Big Data","authors":"Vikas Kumar;Preeti;Shaiku Shahida Saheb;Sunil Kumari;Kanishka Pathak;Jai Kishan Chandel;Neeraj Varshney;Ankit Kumar","doi":"10.26599/BDMA.2022.9020033","DOIUrl":"10.26599/BDMA.2022.9020033","url":null,"abstract":"The objective of this paper is to provide a better rendition of Generation Z purchase intentions of retail products through Facebook. The study gyrated around the favorable attitude formation of Generation Z translating into intentions to purchase retail products through Facebook. The role of antecedents of attitude, namely enjoyment, credibility, and peer communication was also explored. The main purpose was to analyze the F-commerce pervasiveness (retail purchases through Facebook) among Generation Z in India and how could it be materialized effectively. A conceptual façade was proposed after trotting out germane and urbane literature. The study focused exclusively on Generation Z population. The data were statistically analyzed using partial least squares structural equation modelling. The study found the proposed conceptual model had a high prediction power of Generation Z intentions to purchase retail products through Facebook verifying the materialization of F-commerce. 
Enjoyment, credibility, and peer communication were proved to be good predictors of attitude (R²=0.589) and furthermore attitude was found to be a stellar antecedent to purchase intentions (R²=0.540).","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"491-503"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233245.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46940167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Privacy-Aware and Trustworthy Data Sharing Using Blockchain for Edge Intelligence","authors":"Youyang Qu;Lichuan Ma;Wenjie Ye;Xuemeng Zhai;Shui Yu;Yunfeng Li;David Smith","doi":"10.26599/BDMA.2023.9020012","DOIUrl":"10.26599/BDMA.2023.9020012","url":null,"abstract":"The popularization of intelligent healthcare devices and big data analytics significantly boosts the development of Smart Healthcare Networks (SHNs). To enhance the precision of diagnosis, different participants in SHNs share health data that contain sensitive information. Therefore, the data exchange process raises privacy concerns, especially when the integration of health data from multiple sources (linkage attack) results in further leakage. Linkage attack is a type of dominant attack in the privacy domain, which can leverage various data sources for private data mining. Furthermore, adversaries launch poisoning attacks to falsify the health data, which leads to misdiagnosing or even physical damage. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. The trust is evaluated by a defined community density, while the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To avoid linkage attacks in personalized differential privacy, we design a noise correlation decoupling mechanism using a Markov stochastic process. In addition, we build the community model on a blockchain, which can mitigate the risk of poisoning attacks during differentially private data transmission over SHNs. 
Extensive experiments and analysis on real-world datasets have testified the proposed model, and achieved better performance compared with existing research from perspectives of privacy protection and effectiveness.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"443-464"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233247.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48363167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"τSQWRL: A TSQL2-Like Query Language for Temporal Ontologies Generated from JSON Big Data","authors":"Zouhaier Brahmia;Fabio Grandi;Rafik Bouaziz","doi":"10.26599/BDMA.2022.9020044","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020044","url":null,"abstract":"Temporal ontologies allow to represent not only concepts, their properties, and their relationships, but also time-varying information through explicit versioning of definitions or through the four-dimensional perdurantist view. They are widely used to formally represent temporal data semantics in several applications belonging to different fields (e.g., Semantic Web, expert systems, knowledge bases, big data, and artificial intelligence). They facilitate temporal knowledge representation and discovery, with the support of temporal data querying and reasoning. However, there is no standard or consensual temporal ontology query language. In a previous work, we have proposed an approach named τJOWL (temporal OWL 2 from temporal JSON, where OWL 2 stands for “OWL 2 Web Ontology Language” and JSON stands for “JavaScript Object Notation”). τJOWL allows (1) to automatically build a temporal OWL 2 ontology of data, following the Closed World Assumption (CWA), from temporal JSON-based big data, and (2) to manage its incremental maintenance accommodating their evolution, in a temporal and multi-schema-version environment. In this paper, we propose a temporal ontology query language for τJOWL, named τSQWRL (temporal SQWRL), designed as a temporal extension of the ontology query language-Semantic Query-enhanced Web Rule Language (SQWRL). The new language has been inspired by the features of the consensual temporal query language TSQL2 (Temporal SQL2), well known in the temporal (relational) database community. 
The aim of the proposal is to enable and simplify the task of retrieving any desired ontology version or of specifying any (complex) temporal query on time-varying ontologies generated from time-varying big data. Some examples, in the Internet of Healthcare Things (IoHT) domain, are provided to motivate and illustrate our proposal.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 3","pages":"288-300"},"PeriodicalIF":13.6,"publicationDate":"2023-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10097649/10097652.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67837480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Papers: Special Issue on Intelligent Network Video Advances Based on Transformers","authors":"","doi":"10.26599/BDMA.2022.9020053","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020053","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 3","pages":"390-390"},"PeriodicalIF":13.6,"publicationDate":"2023-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10097649/10097663.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67838274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}