Big Data Mining and Analytics: Latest Articles (Page 2)

IoTDQ: An Industrial IoT Data Analysis Library for Apache IoTDB

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020010

Pengyu Chen;Wendi He;Wenxuan Ma;Xiangdong Huang;Chen Wang

Citations: 0

Call for Papers: Special Issue on Challenges and Opportunities in Biomedical Big Data Analysis: From Large Language Models to Clinical Applications

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020026

Citations: 0

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020009

Zhongyin Xu;Xiujuan Lei;Mei Ma;Yi Pan

Abstract: Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical rules. Herein, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated molecule. The Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization properties. The main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix dimension. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12 365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with the baseline models with respect to their abilities to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.

Vol. 7, No. 1, pp. 142-155. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373001

Citations: 0

Incremental Data Stream Classification with Adaptive Multi-Task Multi-View Learning

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020006

Jun Wang;Maiwang Shi;Xiao Zhang;Yan Li;Yunsheng Yuan;Chenglei Yang;Dongxiao Yu

Abstract: With the enhancement of data collection capabilities, massive streaming data have been accumulated in numerous application scenarios. Specifically, the issue of classifying data streams based on mobile sensors can be formalized as a multi-task multi-view learning problem with a specific task comprising multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task single-view, which cannot learn shared representations between relevant tasks and views. An adaptive multi-task multi-view incremental learning framework for data stream classification called MTMVIS is proposed to address the above challenges, utilizing the idea of multi-task multi-view learning. Specifically, the attention mechanism is first used to align different sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Results reveal that the proposed framework outperforms state-of-the-art methods based on the experiments on two different datasets with other baselines.

Vol. 7, No. 1, pp. 87-106. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373002

Citations: 0

Discriminatively Constrained Semi-Supervised Multi-View Nonnegative Matrix Factorization with Graph Regularization

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020004

Guosheng Cui;Ye Li;Jianzhong Li;Jianping Fan

Abstract: Nonnegative Matrix Factorization (NMF) is one of the most popular feature learning technologies in the field of machine learning and pattern recognition. It has been widely used and studied in the multi-view clustering tasks because of its effectiveness. This study proposes a general semi-supervised multi-view nonnegative matrix factorization algorithm. This algorithm incorporates discriminative and geometric information on data to learn a better-fused representation, and adopts a feature normalizing strategy to align the different views. Two specific implementations of this algorithm are developed to validate the effectiveness of the proposed framework: Graph regularization based Discriminatively Constrained Multi-View Nonnegative Matrix Factorization (GDCMVNMF) and Extended Multi-View Constrained Nonnegative Matrix Factorization (ExMVCNMF). The intrinsic connection between these two specific implementations is discussed, and the optimization based on multiply update rules is presented. Experiments on six datasets show that the effectiveness of GDCMVNMF and ExMVCNMF outperforms several representative unsupervised and semi-supervised multi-view NMF approaches.

Vol. 7, No. 1, pp. 55-74. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372950

Citations: 0

QAR Data Imputation Using Generative Adversarial Network with Self-Attention Mechanism

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020001

Jingqi Zhao;Chuitian Rong;Xin Dang;Huabo Sun

Abstract: Quick Access Recorder (QAR), an important device for storing data from various flight parameters, contains a large amount of valuable data and comprehensively records the real state of the airline flight. However, the recorded data have certain missing values due to factors, such as weather and equipment anomalies. These missing values seriously affect the analysis of QAR data by aeronautical engineers, such as airline flight scenario reproduction and airline flight safety status assessment. Therefore, imputing missing values in the QAR data, which can further guarantee the flight safety of airlines, is crucial. QAR data also have multivariate, multiprocess, and temporal features. Therefore, we innovatively propose the imputation models A-AEGAN (“A” denotes attention mechanism, “AE” denotes autoencoder, and “GAN” denotes generative adversarial network) and SA-AEGAN (“SA” denotes self-attentive mechanism) for missing values of QAR data, which can be effectively applied to QAR data. Specifically, we apply an innovative generative adversarial network to impute missing values from QAR data. The improved gated recurrent unit is then introduced as the neural unit of GAN, which can successfully capture the temporal relationships in QAR data. In addition, we modify the basic structure of GAN by using an autoencoder as the generator and a recurrent neural network as the discriminator. The missing values in the QAR data are imputed by using the adversarial relationship between generator and discriminator. We introduce an attention mechanism in the autoencoder to further improve the capability of the proposed model to capture the features of QAR data. Attention mechanisms can maintain the correlation among QAR data and improve the capability of the model to impute missing data. Furthermore, we improve the proposed model by integrating a self-attention mechanism to further capture the relationship between different parameters within the QAR data. Experimental results on real datasets demonstrate that the model can reasonably impute the missing values in QAR data with excellent results.

Vol. 7, No. 1, pp. 12-28. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372953

Citations: 0

Multi-Smart Meter Data Encryption Scheme Based on Distributed Differential Privacy

Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020008

Renwu Yan;Yang Zheng;Ning Yu;Cen Liang

Abstract: Under the general trend of the rapid development of smart grids, data security and privacy are facing serious challenges; protecting the privacy data of single users under the premise of obtaining user-aggregated data has attracted widespread attention. In this study, we propose an encryption scheme on the basis of differential privacy for the problem of user privacy leakage when aggregating data from multiple smart meters. First, we use an improved homomorphic encryption method to realize the encryption aggregation of users' data. Second, we propose a double-blind noise addition protocol to generate distributed noise through interaction between users and a cloud platform to prevent semi-honest participants from stealing data by colluding with one another. Finally, the simulation results show that the proposed scheme can encrypt the transmission of multi-intelligent meter data under the premise of satisfying the differential privacy mechanism. Even if an attacker has enough background knowledge, the security of the electricity information of one another can be ensured.

Vol. 7, No. 1, pp. 131-141. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372998
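
The abstract above does not say how the distributed noise is constructed. As a rough illustration only, the sketch below uses a standard construction that may differ from the paper's double-blind protocol: a Laplace variable with scale b equals the sum of n independent shares, each the difference of two Gamma(1/n, b) draws, so every meter perturbs its own reading slightly and only the aggregate carries full Laplace noise. The homomorphic encryption layer and the exchange with the cloud platform are left out, and the names `noise_share` and `noisy_aggregate`, the readings, and the sensitivity and epsilon values are all hypothetical.

```python
import random

def noise_share(n_meters, scale, rng):
    """One meter's noise share: difference of two Gamma(1/n, scale) draws.

    Summing n such independent shares yields a Laplace(0, scale) sample.
    """
    shape = 1.0 / n_meters
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)

def noisy_aggregate(readings, sensitivity, epsilon, rng):
    """Sum the readings after each meter adds its own noise share locally."""
    scale = sensitivity / epsilon  # Laplace scale b for epsilon-differential privacy
    n = len(readings)
    return sum(r + noise_share(n, scale, rng) for r in readings)

rng = random.Random(0)
readings = [1.2, 0.8, 2.5, 1.9, 0.4]  # hypothetical kWh values, not from the paper
print(noisy_aggregate(readings, sensitivity=3.0, epsilon=1.0, rng=rng))
```

No single share is Laplace-distributed on its own, which is the point of splitting the noise: the privacy guarantee rests on the aggregate, so in a real scheme the individual perturbed readings would still need to be hidden, for example by the encryption step the abstract mentions.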

Citations: 0

Diagnosis and Detection of Alzheimer's Disease Using Learning Algorithm

IF 13.6, Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-12-01 DOI: 10.26599/bdma.2022.9020049

G. Shukla;Santosh Kumar;S. Pandey;Rohit Agarwal;Neeraj Varshney;Ankit Kumar

Citations: 0

Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data

IF 13.6, Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020026

Ankit Kumar;Neeraj Varshney;Surbhi Bhatiya;Kamred Udham Singh

Abstract: We live in an age where everything around us is being created. Data generation rates are so scary, creating pressure to implement costly and straightforward data storage and recovery processes. MapReduce model functionality is used for creating a cluster parallel, distributed algorithm, and large datasets. The MapReduce strategy from Hadoop helps develop a community of non-commercial use to offer a new algorithm for resolving such problems for commercial applications as expected from this working algorithm with insights as a result of disproportionate or discriminatory Hadoop cluster results. Expected results are obtained in the work and the exam conducted under this job; many of them are scheduled to set schedules, match matrices' data positions, clustering before determining to click, and accurate mapping and internal reliability to be closed together to avoid running and execution times. Mapper output and proponents have been implemented, and the map has been used to reduce the function. The execution input key/value pair and output key/value pair have been set. This paper focuses on evaluating this technique for the efficient retrieval of large volumes of data. The technique allows for capabilities to inform a massive database of information, from storage and indexing techniques to the distribution of queries, scalability, and performance in heterogeneous environments. The results show that the proposed work reduces the data processing time by 30%.

Vol. 6, No. 4, pp. 465-477. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233249.pdf
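
The abstract refers to mapper output and to input and output key/value pairs without showing what they look like. The sketch below is a minimal in-process imitation of the map, shuffle, and reduce phases in plain Python; it does not use Hadoop and is not the paper's replication-based method. The function names and the word-count job are hypothetical and chosen only to make the key/value flow visible.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input (key, value) pair."""
    for key, value in records:
        yield from mapper(key, value)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line_index, line):
    """Input pair: (line index, line text). Output pairs: (word, 1)."""
    for word in line.lower().split():
        yield word, 1

def sum_reducer(word, counts):
    """Input: (word, [1, 1, ...]). Output value: total count."""
    return sum(counts)

def run_job(lines):
    records = enumerate(lines)  # input key = line index, value = line text
    return reduce_phase(shuffle(map_phase(records, word_mapper)), sum_reducer)

print(run_job(["big data", "data mining", "big data analytics"]))
# → {'big': 2, 'data': 3, 'mining': 1, 'analytics': 1}
```

In a real cluster the shuffle step moves data between nodes, which is typically where data placement and replication choices affect running time.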

Citations: 0

A Clinical Data Analysis Based Diagnostic Systems for Heart Disease Prediction Using Ensemble Method

IF 13.6, Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020052

Ankit Kumar;Kamred Udham Singh;Manish Kumar

Abstract: The correct diagnosis of heart disease can save lives, while the incorrect diagnosis can be lethal. The UCI machine learning heart disease dataset compares the results and analyses of various machine learning approaches, including deep learning. We used a dataset with 13 primary characteristics to carry out the research. Support vector machine and logistic regression algorithms are used to process the datasets, and the latter displays the highest accuracy in predicting coronary disease. Python programming is used to process the datasets. Multiple research initiatives have used machine learning to speed up the healthcare sector. We also used conventional machine learning approaches in our investigation to uncover the links between the numerous features available in the dataset and then used them effectively in anticipation of heart infection risks. Using the accuracy and confusion matrix has resulted in some favorable outcomes. To get the best results, the dataset contains certain unnecessary features that are dealt with using isolation logistic regression and Support Vector Machine (SVM) classification.

Vol. 6, No. 4, pp. 513-525. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233243.pdf
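
The abstract evaluates its classifiers with accuracy and a confusion matrix. As a small worked example of those two measures, the sketch below computes them for binary labels in plain Python. The labels are made up for illustration and are not taken from the UCI heart disease dataset or from the paper's results; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means disease present."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical classifier output
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(tp, fp, fn, tn)              # → 3 1 1 3
print(accuracy(tp, fp, fn, tn))    # → 0.75
```

In a diagnostic setting the false-negative count (a missed case) usually matters more than overall accuracy, which is one reason to report the full matrix rather than a single number.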

Citations: 0