Journal of Big Data: Latest Publications

Hybrid wrapper feature selection method based on genetic algorithm and extreme learning machine for intrusion detection
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-02-01 DOI: 10.1186/s40537-024-00887-9
Abstract: Intrusion detection systems (IDS) play a critical role in mitigating cyber-attacks in the Internet of Things (IoT) environment. Because many devices are integrated within the IoT environment, a huge amount of data is generated, and the resulting datasets often contain irrelevant and redundant features that degrade the performance of existing IDS. Selecting optimal features therefore plays a critical role in enhancing intrusion detection. This study proposes a sequential feature selection approach using an optimized extreme learning machine (ELM) with a support vector machine (SVM) classifier. The main challenge of ELM is the selection of its input parameters, which affect its performance; in this study, a genetic algorithm (GA) is used to optimize the ELM weights and boost its performance. The optimized algorithm is then applied as the estimator in sequential forward selection (a wrapper technique) to select key features, and the final feature subset is classified using an SVM. The IoT_ToN network and UNSWNB15 datasets were used to test the model, and its performance was compared with existing state-of-the-art classifiers such as k-nearest neighbors, gradient boosting, random forest, and decision tree. The model produced the best-quality feature subset and achieved better intrusion detection performance, with 99% and 86% accuracy on the IoT_ToN network and UNSWNB15 datasets, respectively. The model can be used as a promising tool for enhancing the classification performance of IDS datasets.
Citations: 0
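As a rough illustration of the wrapper stage described in this abstract, the sketch below runs sequential forward selection followed by SVM classification on synthetic data. It is not the paper's pipeline: the GA-optimized ELM estimator is replaced by a scikit-learn random forest stand-in, and the dataset, feature counts, and hyperparameters are placeholder assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for an intrusion-detection dataset.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Wrapper technique: sequential forward selection driven by a surrogate
# estimator (the paper uses a GA-optimized ELM in this role instead).
sfs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=10,
    direction="forward",
    cv=3,
)
sfs.fit(X_tr, y_tr)

# Final classification on the selected feature subset uses an SVM, as in the paper.
svm = SVC(kernel="rbf").fit(sfs.transform(X_tr), y_tr)
print("held-out accuracy:", accuracy_score(y_te, svm.predict(sfs.transform(X_te))))
```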
A novel approach for detecting deep fake videos using graph neural network
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-02-01 DOI: 10.1186/s40537-024-00884-y
Abstract: Deep fake technology has emerged as a double-edged sword in the digital world. While it holds potential for legitimate uses, it can also be exploited to manipulate video content, causing severe social and security concerns. The research gap lies in the fact that traditional deep fake detection methods, such as visual quality analysis or inconsistency detection, struggle to keep up with the rapidly advancing technology used to create deep fakes, so more sophisticated detection techniques are needed. This paper introduces an enhanced approach for detecting deep fake videos using a graph neural network (GNN). The proposed method splits the detection process into two phases: a mini-batch graph convolution network stream and a four-block CNN stream, each block comprising convolution, batch normalization, and an activation function. The final step is a flattening operation, which connects the convolutional layers to the dense layer. The two streams are fused using three different fusion networks: FuNet-A (additive fusion), FuNet-M (element-wise multiplicative fusion), and FuNet-C (concatenation fusion). The proposed model was evaluated on different datasets, where it achieved a training and validation accuracy of 99.3% after 30 epochs.
Citations: 0
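To make the three fusion variants named in this abstract concrete, here is a minimal PyTorch sketch of additive, element-wise multiplicative, and concatenation fusion of two feature streams. The GNN and CNN streams themselves are not reproduced; random tensors stand in for their outputs, and the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuses two equally sized feature streams and classifies real vs. fake."""
    def __init__(self, dim: int, mode: str = "concat"):
        super().__init__()
        self.mode = mode
        in_dim = dim * 2 if mode == "concat" else dim
        self.classifier = nn.Linear(in_dim, 2)

    def forward(self, gnn_feat: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        if self.mode == "add":        # FuNet-A style: additive fusion
            fused = gnn_feat + cnn_feat
        elif self.mode == "mul":      # FuNet-M style: element-wise product
            fused = gnn_feat * cnn_feat
        else:                         # FuNet-C style: concatenation
            fused = torch.cat([gnn_feat, cnn_feat], dim=-1)
        return self.classifier(fused)

g = torch.randn(8, 128)   # stand-in output of the graph-convolution stream
c = torch.randn(8, 128)   # stand-in output of the CNN stream
for mode in ("add", "mul", "concat"):
    print(mode, FusionHead(128, mode)(g, c).shape)
```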
A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-02-01 DOI: 10.1186/s40537-024-00882-0
Ileberi Emmanuel, Yanxia Sun, Zenghui Wang
Abstract: Credit risk prediction is a crucial task for financial institutions. Technological advancements in machine learning, coupled with the availability of data and computing power, have given rise to more credit risk prediction models in financial institutions. In this paper, we propose a stacked classifier approach coupled with a filter-based feature selection (FS) technique to achieve efficient credit risk prediction on multiple datasets. The proposed stacked model includes the following base estimators: random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGB); the estimators in the stacked architecture are linked sequentially to extract the best performance. The filter-based FS method used in this research is based on information gain (IG) theory. The proposed algorithm was evaluated using accuracy, the F1-score, and the area under the curve (AUC), and the stacked algorithm was compared with artificial neural network (ANN), decision tree (DT), and k-nearest neighbour (KNN) methods. The experimental results show that the stacked model obtained AUCs of 0.934, 0.944, and 0.870 on the Australian, German, and Taiwan datasets, respectively. These results, in conjunction with the accuracy and F1-score metrics, demonstrate that the proposed stacked classifier outperforms the individual estimators and other existing methods.
Citations: 0
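A minimal sketch of the overall recipe, assuming scikit-learn: a mutual-information filter (standing in for the paper's information-gain criterion) feeds a stacked ensemble of tree-based learners. The exact base estimators, their sequential linking, the XGBoost configuration, and the credit datasets are not reproduced; everything below is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a credit-scoring dataset.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8, random_state=1)

# Stacked ensemble: tree-based base estimators with a simple meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Filter-based feature selection (mutual information) ahead of the stack.
pipe = make_pipeline(SelectKBest(mutual_info_classif, k=10), stack)
print("mean CV AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```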
Dual channel and multi-scale adaptive morphological methods for infrared small targets
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-02-01 DOI: 10.1186/s40537-024-00880-2
Ying-Bin Liu, Yu-Hui Zeng, Jian-Hua Qin
Abstract: Infrared small target detection is a challenging task. Morphological operators with a single structural element size are easily affected by complex background noise, and detection performance suffers in multi-scale background noise environments. To enhance the detection of infrared small targets, we propose a dual channel and multi-scale adaptive morphological method (DMAM) consisting of three stages: stages 1 and 2 mainly suppress background noise, while stage 3 mainly enhances the small target area. A multi-scale adaptive morphological operator is used to improve the algorithm's adaptability to complex background environments, and a dual channel module is added to further eliminate background noise. The experimental results indicate that the method outperforms the comparison methods in both quantitative and qualitative terms, and ablation experiments demonstrate the effectiveness of each stage and module. The code and data of the paper are available at https://pan.baidu.com/s/19psdwJoh-0MpPD41g6N_rw.
Citations: 0
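For readers unfamiliar with multi-scale morphology, the sketch below applies a classical white top-hat transform at several structuring-element scales and fuses the responses. This is a much simpler relative of the DMAM pipeline rather than the method itself: the dual channel module and the adaptive stages are not reproduced, and the input image is synthetic.

```python
import cv2
import numpy as np

# Synthetic infrared-like frame: noisy background with one small bright target.
img = (np.random.rand(128, 128) * 30).astype(np.uint8)
img[60:63, 60:63] = 200

# White top-hat at multiple structuring-element scales.
responses = []
for k in (3, 5, 9):
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    responses.append(cv2.morphologyEx(img, cv2.MORPH_TOPHAT, se))

# Fuse scales by taking the per-pixel maximum response, then threshold.
fused = np.max(np.stack(responses), axis=0)
_, mask = cv2.threshold(fused, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("candidate target pixels:", int((mask > 0).sum()))
```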
Generalized Estimating Equations Boosting (GEEB) machine for correlated data
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-22 DOI: 10.1186/s40537-023-00875-5
Abstract: Rapid development in data science has made machine learning and artificial intelligence the most popular research tools across various disciplines. While numerous articles have shown decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model for repeated measures or hierarchical data structures. This study therefore proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, which integrates the gradient boosting technique into generalized estimating equations (GEE), the benchmark statistical approach for correlated data. Unlike previous gradient boosting approaches that utilize all input features, we randomly select some input features when building the model to reduce predictive errors. A simulation study evaluates the predictive performance of GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and the support vector machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that the new GEEB strategy outperforms GEE and demonstrates superior predictive accuracy to the SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also revealed that GEEB reduced mean squared errors by 4.5% to 25% compared to GEE, XGBoost, and SVM. This research also provides a freely available R function that implements the GEEB machine effortlessly for longitudinal or hierarchical data.
Citations: 0
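As context for what GEEB extends, the following sketch fits the baseline GEE model on synthetic clustered data using statsmodels. The paper itself provides an R function; this Python stand-in only shows the GEE part, not the gradient-boosting integration or the random feature sub-sampling.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic hierarchical data: 50 clusters, 10 repeated measures each.
rng = np.random.default_rng(0)
n_groups, per_group = 50, 10
groups = np.repeat(np.arange(n_groups), per_group)
x = rng.normal(size=groups.size)
cluster_effect = rng.normal(scale=0.8, size=n_groups)[groups]  # induces within-cluster correlation
y = 2.0 * x + cluster_effect + rng.normal(scale=0.5, size=groups.size)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Baseline GEE with an exchangeable working correlation structure.
model = sm.GEE.from_formula(
    "y ~ x", groups="group", data=df,
    family=sm.families.Gaussian(), cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary().tables[1])
```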
Bilingual video captioning model for enhanced video retrieval
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-16 DOI: 10.1186/s40537-024-00878-w
Abstract: Many video platforms rely on uploader-provided descriptions for video retrieval, which can lead to inaccuracies. Although deep learning-based video captioning can resolve this problem, it has some limitations: (1) traditional keyframe extraction techniques do not consider video length or content, resulting in low accuracy, high storage requirements, and long processing times; and (2) Arabic language support in video captioning is not extensive. This study proposes a new video captioning approach that uses an efficient keyframe extraction method and supports both Arabic and English. The proposed keyframe extraction technique combines time-based and content-based approaches for better-quality captions, lower storage requirements, and faster processing. The English and Arabic models use a sequence-to-sequence framework with long short-term memory in both the encoder and decoder. Both models were evaluated for caption quality using four metrics: bilingual evaluation understudy (BLEU), metric for evaluation of translation with explicit ordering (METEOR), recall-oriented understudy for gisting evaluation (ROUGE-L), and consensus-based image description evaluation (CIDEr). They were also evaluated using cosine similarity to determine their suitability for video retrieval. The results demonstrated that the English model performed better with regard to caption quality and video retrieval: it scored 47.18, 30.46, 62.07, and 59.98 on BLEU, METEOR, ROUGE-L, and CIDEr, respectively, whereas the Arabic model scored 21.65, 36.30, 44.897, and 45.52. In video retrieval, the English and Arabic models successfully retrieved 67% and 40% of the videos, respectively, at 20% similarity. These models have potential applications in storytelling, sports commentaries, and video surveillance.
Citations: 0
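A minimal sketch of a combined time- and content-based keyframe selector in the spirit described above, assuming OpenCV: frames are sampled at a fixed stride (time-based) and kept only when their colour histogram differs enough from the last kept frame (content-based). The stride, histogram settings, threshold, and the "video.mp4" file name are illustrative assumptions, not the paper's parameters.

```python
import cv2

def extract_keyframes(path: str, stride: int = 30, threshold: float = 0.3):
    cap = cv2.VideoCapture(path)
    keyframes, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:  # time-based sampling
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
            hist = cv2.normalize(hist, None).flatten()
            # Content-based filtering: keep only visually novel frames.
            if prev_hist is None or cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                keyframes.append(frame)
                prev_hist = hist
        idx += 1
    cap.release()
    return keyframes

print("keyframes kept:", len(extract_keyframes("video.mp4")))
```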
Block size estimation for data partitioning in HPC applications using machine learning techniques
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-16 DOI: 10.1186/s40537-023-00862-w
Riccardo Cantini, Fabrizio Marozzo, Alessio Orsino, Domenico Talia, Paolo Trunfio, Rosa M. Badia, Jorge Ejarque, Fernando Vázquez-Novoa
Abstract: The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. Application performance can be heavily affected by how data are partitioned, which in turn depends on the selected size of the data blocks, i.e. the block size. Finding an effective partitioning, i.e. a suitable block size, is therefore a key strategy to speed up parallel data-intensive applications and increase scalability. This paper describes BLEST-ML (BLock size ESTimation through Machine Learning), a methodology for block size estimation that relies on supervised machine learning techniques. The proposed methodology was evaluated by designing an implementation tailored to dislib, a distributed computing library highly focused on machine learning algorithms and built on top of the PyCOMPSs framework. We assessed the effectiveness of the implementation through an extensive experimental evaluation considering different dislib algorithms, datasets, and infrastructures, including the MareNostrum 4 supercomputer. The results show the ability of BLEST-ML to efficiently determine a suitable way to split a given dataset, providing proof of its applicability for enabling the efficient execution of data-parallel applications in high performance environments.
Citations: 0
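To illustrate the general idea of learning a block-size estimator (not BLEST-ML itself), the sketch below trains a regressor that maps a few dataset and infrastructure descriptors to a block size on fully synthetic data; the feature set, labels, and target definition are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
rows = rng.integers(10_000, 100_000_000, size=n)   # dataset rows (synthetic)
cols = rng.integers(2, 500, size=n)                # dataset columns (synthetic)
cores = rng.choice([16, 48, 96, 192], size=n)      # available cores (synthetic)
X = np.column_stack([rows, cols, cores])

# Invented "good" block size: roughly rows split evenly across cores, with noise.
y = rows / cores * (1 + 0.1 * rng.standard_normal(n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
reg = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_tr, y_tr)
print("MAE on held-out configurations:", mean_absolute_error(y_te, reg.predict(X_te)))
```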
A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-16 DOI: 10.1186/s40537-023-00876-4
Bharti Khemani, Shruti Patil, Ketan Kotecha, Sudeep Tanwar
Abstract: Deep learning has seen significant growth recently and is now applied to a wide range of conventional use cases, including graphs. Graph data provides relational information between elements and is a standard data format for various machine learning and deep learning tasks. Models that can learn from such inputs are essential for working with graph data effectively. This paper identifies nodes and edges within specific applications, such as text, entities, and relations, to create graph structures. Different applications may require various graph neural network (GNN) models. GNNs facilitate the exchange of information between nodes in a graph, enabling them to understand dependencies within the nodes and edges. The paper delves into specific GNN models like graph convolution networks (GCNs), GraphSAGE, and graph attention networks (GATs), which are widely used in various applications today. It also discusses the message-passing mechanism employed by GNN models and examines the strengths and limitations of these models in different domains. Furthermore, the paper explores the diverse applications of GNNs, the datasets commonly used with them, and the Python libraries that support GNN models. It offers an extensive overview of the landscape of GNN research and its practical implementations.
Citations: 0
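The message-passing mechanism this review centres on can be shown in a few lines: a single GCN-style propagation step, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), written in plain NumPy on a toy graph so it runs without a dedicated GNN library. The graph, features, and weights below are arbitrary.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN-style message-passing step on adjacency A, features H, weights W."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # symmetric degree normalisation
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # aggregate, transform, ReLU

# 4-node toy graph: each node exchanges messages with its neighbours.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.rand(4, 3)   # input node features
W = np.random.rand(3, 2)   # layer weights (random stand-ins for learned parameters)
print(gcn_layer(A, H, W))
```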
Cyberattack detection in wireless sensor networks using a hybrid feature reduction technique with AI and machine learning methods
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-13 DOI: 10.1186/s40537-023-00870-w
Mohamed H. Behiry, Mohammed Aly
Citations: 0
Gct-TTE: graph convolutional transformer for travel time estimation
IF 8.1, Q2, Computer Science
Journal of Big Data Pub Date : 2024-01-13 DOI: 10.1186/s40537-023-00841-1
Vladimir Mashurov, Vaagn Chopuryan, Vadim Porvatov, Arseny Ivanov, Natalia Semenova
Abstract: This paper introduces a new transformer-based model for the problem of travel time estimation. The key feature of the proposed GCT-TTE architecture is the utilization of different data modalities capturing different properties of an input path. Along with an extensive study of the model configuration, we implemented and evaluated a sufficient number of actual baselines for path-aware and path-blind settings. The conducted computational experiments confirmed the viability of our pipeline, which outperformed state-of-the-art models on both considered datasets. Additionally, GCT-TTE was deployed as a web service accessible for further experiments with user-defined routes.
Citations: 0
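As a loose, high-level illustration of combining per-segment representations with a transformer for travel time estimation (not the GCT-TTE architecture), the sketch below feeds stand-in road-segment embeddings through a PyTorch transformer encoder and pools them into a single travel-time prediction; the dimensions and the mean pooling are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class ToyPathTTE(nn.Module):
    """Toy path-level travel-time regressor over segment embeddings."""
    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, 1)  # regress a single travel time per path

    def forward(self, segment_emb: torch.Tensor) -> torch.Tensor:
        encoded = self.encoder(segment_emb)    # contextualise segments along the path
        return self.head(encoded.mean(dim=1))  # pool over the path and predict

# Batch of 2 paths, 20 segments each, 64-dim stand-in segment embeddings
# (in GCT-TTE these would come from graph-convolutional and image modalities).
path = torch.randn(2, 20, 64)
print(ToyPathTTE()(path).shape)  # torch.Size([2, 1])
```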