Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-04-30 | DOI: 10.1016/j.bdr.2024.100462
Li Deng, Shihu Liu, Weihua Xu, Xianghong Lin
{"title":"Similarity Measurement for Graph Data: An Improved Centrality and Geometric Perspective-Based Approach","authors":"Li Deng, Shihu Liu, Weihua Xu, Xianghong Lin","doi":"10.1016/j.bdr.2024.100462","DOIUrl":"https://doi.org/10.1016/j.bdr.2024.100462","url":null,"abstract":"Measuring the similarity of graph data precisely is an important research problem in many fields. Here, graph data means a collection of patterns together with the edges that connect them. Taking both pattern information and edge information into account, this paper introduces an improved centrality and geometric perspective-based approach to measuring the similarity between any two graphs. Once the two graphs are projected onto a plane, the pattern distance is calculated with the Euclidean metric. The area-based edge distance is computed from the area determined by the length of each edge and the angle between the edge and the positive X-axis. To improve the measurement, a position-based edge distance is used to modify the edge distance. The global distance between any two graphs is then determined by combining the two distances above. Finally, the letter dataset is used in experiments to evaluate the proposed similarity approach. The experimental results show that the proposed approach captures the similarity of graph data well and achieves a tradeoff between time and precision.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100462"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140824127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-05-08 | DOI: 10.1016/j.bdr.2024.100465
Mughair Aslam Bhatti, M.S. Syam, Huafeng Chen, Yurong Hu, Li Wai Keung, Zeeshan Zeeshan, Yasser A. Ali, Nadia Sarhan
{"title":"Utilizing convolutional neural networks (CNN) and U-Net architecture for precise crop and weed segmentation in agricultural imagery: A deep learning approach","authors":"Mughair Aslam Bhatti, M.S. Syam, Huafeng Chen, Yurong Hu, Li Wai Keung, Zeeshan Zeeshan, Yasser A. Ali, Nadia Sarhan","doi":"10.1016/j.bdr.2024.100465","DOIUrl":"10.1016/j.bdr.2024.100465","url":null,"abstract":"This study presents the implementation and evaluation of a convolutional neural network (CNN) based image segmentation model using the U-Net architecture for forest image segmentation. The proposed algorithm starts by preprocessing datasets of satellite images and corresponding masks from a repository. Data preprocessing involves resizing, normalizing, and splitting the images and masks into training and testing datasets. The U-Net model architecture, comprising encoder and decoder parts with skip connections, is defined and compiled with binary cross-entropy loss and the Adam optimizer. Training includes early stopping and checkpoint saving mechanisms to prevent overfitting and retain the best model weights. Evaluation metrics such as Intersection over Union (IoU), Dice coefficient, pixel accuracy, precision, recall, specificity, and F1-score are computed to assess the model's performance. Visualization of results includes comparing predicted segmentation masks with ground truth masks for qualitative analysis. The study emphasizes the importance of training data size in achieving accurate segmentation models and highlights the potential of the U-Net architecture for forest image segmentation tasks.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100465"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141026200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-04-26 | DOI: 10.1016/j.bdr.2024.100455
Ricardo de A. Araújo, Paulo S.G. de Mattos Neto, Nadia Nedjah, Sergio C.B. Soares
{"title":"On the Sea Surface Temperature Forecasting Problem with Deep Dilation-Erosion-Linear Models","authors":"Ricardo de A. Araújo, Paulo S.G. de Mattos Neto, Nadia Nedjah, Sergio C.B. Soares","doi":"10.1016/j.bdr.2024.100455","DOIUrl":"https://doi.org/10.1016/j.bdr.2024.100455","url":null,"abstract":"The sea surface temperature (SST) is considered an important measure for detecting changes in climate and marine ecosystems. Forecasting it is therefore essential for supporting governmental strategies to avoid side effects on the global population. In this paper, we analyze the SST time series and suggest that a combination of a linear component and a nonlinear component with long-term dependency can better represent it. Based on this assumption, we propose a deep neural network architecture with dilation-erosion-linear (DEL) processing units to deal with this particular kind of time series. An empirical analysis is performed using three SST time series, where we explore three statistical measures. The experimental results demonstrate that the proposed model outperformed recent and classical forecasting techniques from the literature according to well-known performance metrics.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100455"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140813373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-02-27 | DOI: 10.1016/j.bdr.2024.100446
Christian Callegari, Stefano Giordano, Michele Pagano
{"title":"A Real Time Deep Learning Based Approach for Detecting Network Attacks","authors":"Christian Callegari, Stefano Giordano, Michele Pagano","doi":"10.1016/j.bdr.2024.100446","DOIUrl":"10.1016/j.bdr.2024.100446","url":null,"abstract":"Anomaly-based Intrusion Detection is a key research topic in network security due to its ability to face unknown attacks and new security threats. For this reason, many works on the topic have been proposed in the last decade. Nonetheless, an ultimate solution, able to provide a high detection rate with an acceptable false alarm rate, has still to be identified. In recent years, considerable research effort has focused on the application of Deep Learning techniques to the field, but no work has so far been able to propose a system that achieves good detection performance while processing raw network traffic in real time. For this reason, in this paper we propose an Intrusion Detection System that, leveraging probabilistic data structures and Deep Learning techniques, is able to process in real time the traffic collected in a backbone network, offering excellent detection performance and a low false alarm rate. Indeed, the extensive experimental tests, run to validate our system and compare different Deep Learning techniques, confirm that, with a proper parameter setting, we can achieve a detection rate of about 92%, with an accuracy of 0.899. Finally, with minimal changes, the proposed system can provide some information about the kind of anomaly, although in the multi-class scenario the detection rate is slightly lower (around 86%).","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100446"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2214579624000224/pdfft?md5=bbd19915547bc28f9b5784f2f0ddcb21&pid=1-s2.0-S2214579624000224-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140004622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-04-04 | DOI: 10.1016/j.bdr.2024.100453
Helen Karatza
{"title":"Scheduling critical periodic jobs with selective partial computations along with gang jobs","authors":"Helen Karatza","doi":"10.1016/j.bdr.2024.100453","DOIUrl":"https://doi.org/10.1016/j.bdr.2024.100453","url":null,"abstract":"One of the main issues with distributed systems, like clouds, is scheduling complex workloads, which are made up of various job types with distinct features. Gang jobs are one kind of parallel application that these systems support. This paper examines the scheduling of workloads that comprise gangs and critical periodic jobs that can allow for partial computations when necessary to overcome gang job execution. The simulation results shed light on how gang performance is affected by partial computations of critical jobs. The results also reveal that, under the proposed scheduling scheme, partial computations that take into account gangs’ degree of parallelism might lower the average response time of gang jobs while keeping the average result precision of the critical jobs at an acceptable level. Additionally, it is observed that as the deviation from the average partial computation increases, the performance improvement due to partial computations increases, with the aforementioned tradeoff remaining significant.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100453"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140547395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Tsunami Waves Forecasting Using Regression Trees","authors":"Eugenio Cesario, Salvatore Giampá, Enrico Baglione, Louise Cordrie, Jacopo Selva, Domenico Talia","doi":"10.1016/j.bdr.2024.100452","DOIUrl":"https://doi.org/10.1016/j.bdr.2024.100452","url":null,"abstract":"After a seismic event, tsunami early warning systems (TEWSs) try to accurately forecast the maximum height of incident waves at specific target points in front of the coast, so that early warnings can be issued for locations where the impact of tsunami waves may be destructive and aid can be delivered to those locations during immediate post-event management. The uncertainty of the forecast can be quantified with ensembles of alternative scenarios. Similarly, in probabilistic tsunami hazard analysis (PTHA) a large number of simulations is required to cover the natural variability of the source process in each location. To improve the accuracy and computational efficiency of tsunami forecasting methods, scientists have recently started to exploit machine learning techniques to process pre-computed simulation data. However, the approaches proposed in the literature, mainly based on neural networks, suffer from long training times and limited model explainability. To overcome these issues, this paper describes a machine learning approach based on regression trees to model and forecast tsunami evolution. The algorithm takes as input a set of simulations forming an ensemble that describes the potential regional impact of tsunami source scenarios in a given source area, and it provides predictive models to forecast the tsunami waves for other potential tsunami sources in the same area. The experimental evaluation, performed on the 2003 M6.8 Zemmouri-Boumerdes earthquake and tsunami simulation data, shows that regression trees achieve high forecasting accuracy. Moreover, they provide domain experts with fully explainable and interpretable models, which are a valuable support for environmental scientists because they describe the underlying rules and patterns behind the models and allow for an explicit inspection of their functioning. This can enable a full and trustworthy exploration of source uncertainty in tsunami early-warning and urgent computing scenarios, with large ensembles of computationally light tsunami simulations.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100452"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2214579624000285/pdfft?md5=942e994d950c715c0c020e511bc26341&pid=1-s2.0-S2214579624000285-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140559033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-03-20 | DOI: 10.1016/j.bdr.2024.100448
Min Peng, Yunxiang Liu, Asad Khan, Bilal Ahmed, Subrata K. Sarker, Yazeed Yasin Ghadi, Uzair Aslam Bhatti, Muna Al-Razgan, Yasser A. Ali
{"title":"Crop monitoring using remote sensing land use and land change data: Comparative analysis of deep learning methods using pre-trained CNN models","authors":"Min Peng, Yunxiang Liu, Asad Khan, Bilal Ahmed, Subrata K. Sarker, Yazeed Yasin Ghadi, Uzair Aslam Bhatti, Muna Al-Razgan, Yasser A. Ali","doi":"10.1016/j.bdr.2024.100448","DOIUrl":"10.1016/j.bdr.2024.100448","url":null,"abstract":"In the context of the rapidly evolving climate dynamics of the early twenty-first century, the interplay between climate change and biospheric integrity is becoming increasingly critical. The pervasive impact of climate change on ecosystems is manifested not only through alterations in average environmental conditions and their variability but also through ancillary shifts such as escalated oceanic acidification and heightened atmospheric CO₂ levels. These climatic transformations are further compounded by concurrent ecological stressors, including habitat degradation, defaunation, and fragmentation. Against this backdrop, this study delves into the efficacy of advanced deep learning methodologies for the classification of land cover from satellite imagery, with a particular emphasis on agricultural crop monitoring. The study leverages state-of-the-art pre-trained Convolutional Neural Network (CNN) architectures, namely VGG16, MobileNetV2, DenseNet121, and ResNet50, selected for their architectural sophistication and proven competence in image recognition domains. The research framework encompasses a comprehensive data preparation phase incorporating augmentation techniques, a thorough exploratory data analysis to pinpoint and address class imbalances through the computation of class weights, and the strategic fine-tuning of CNN architectures with tailored classification layers to suit the specificities of land cover classification challenges. The models' performance was rigorously evaluated against benchmarks of accuracy and loss, both during the training phase and on validation datasets, with preventative strategies against overfitting, such as early stopping and adaptive learning rate modifications, being integral to the methodology. The findings illuminate the considerable potential of leveraging pre-trained deep learning models for remote sensing in agriculture, demonstrating that advanced CNN architectures, particularly DenseNet121 and ResNet50, are notably effective in enhancing crop type classification accuracy from satellite imagery. This study contributes valuable insights to the field of precision agriculture, advocating for the integration of sophisticated image recognition technologies to bolster crop monitoring efficacy, thereby enabling more nuanced agricultural decision-making and resource allocation.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100448"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140282143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-02-05 | DOI: 10.1016/j.bdr.2024.100426
Yuan Liang
{"title":"Attentive Implicit Relation Embedding for Event Recommendation in Event-Based Social Network","authors":"Yuan Liang","doi":"10.1016/j.bdr.2024.100426","DOIUrl":"10.1016/j.bdr.2024.100426","url":null,"abstract":"The event-based social network (EBSN) is a new type of social network that combines online and offline networks, and its primary goal is to recommend appropriate events to users. Most studies do not model event recommendation on the EBSN platform as graph representation learning, nor do they consider the implicit relationships between events, resulting in recommendations that are not accepted by users. Thus, we study graph representation learning that integrates implicit relationships between social networks and events. First, we propose an algorithm that integrates implicit relationships between social networks and events based on a multiple attention model. The graph structure that integrates implicit relationships between social networks and events is divided into user modeling and event modeling: modeling the interactive information of user events, user social relationships, and implicit relationships between users in user modeling; modeling user information and implicit relationships between events in event modeling; and deeply mining high-level transfer relationships between users and events. Then, the user modeling and event modeling models are fused using a multi-attention joint learning mechanism to capture the different impacts of social and implicit relationships on user preferences, improving the recommendation quality of the recommendation system. Finally, the effectiveness of the proposed algorithm is verified on real datasets.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100426"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-04-23 | DOI: 10.1016/j.bdr.2024.100457
Wei Zhang, Yu Dai
{"title":"A multiscale electricity theft detection model based on feature engineering","authors":"Wei Zhang, Yu Dai","doi":"10.1016/j.bdr.2024.100457","DOIUrl":"10.1016/j.bdr.2024.100457","url":null,"abstract":"With the widespread adoption of smart meters and the growing availability of data mining and machine learning algorithms, there is a pressing demand for methods that are both accurate and explainable in identifying electricity theft patterns among end-users. To address this need, this study proposes a multi-scale anomaly detection model based on feature engineering. Specifically, tsfresh is used in feature engineering to extract electricity consumption features from the raw data, and XGBoost is employed to select features that are highly correlated with anomalous behavior and have clear physical interpretations. Multi-scale convolutional neural networks are then used to analyze and process the data at different temporal and frequency scales. Attention mechanisms are applied to assign weights to different feature channels, and all of the extracted information is fused for anomaly detection. The combination of feature engineering and multi-scale convolutional neural networks not only enhances the interpretability of the model but also improves its performance: the experimental results show that the proposed method outperforms traditional anomaly detection approaches across multiple evaluation metrics.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100457"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140762245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2024-05-28 | Epub Date: 2024-02-03 | DOI: 10.1016/j.bdr.2024.100439
Tongfei Li, Mingzheng Lai, Shixian Nie, Haifeng Liu, Zhiyao Liang, Wei Lv
{"title":"Tropical cyclone trajectory based on satellite remote sensing prediction and time attention mechanism ConvLSTM model","authors":"Tongfei Li, Mingzheng Lai, Shixian Nie, Haifeng Liu, Zhiyao Liang, Wei Lv","doi":"10.1016/j.bdr.2024.100439","DOIUrl":"10.1016/j.bdr.2024.100439","url":null,"abstract":"The accurate and timely prediction of tropical cyclones is of paramount importance in mitigating the impact of these catastrophic meteorological events. Presently, methods for predicting tropical cyclones based on satellite remote sensing images encounter notable challenges, including the inadequate extraction of three-dimensional spatial features and limitations in long-term forecasting. In response to these challenges, this study introduces the Temporal Attention Mechanism ConvLSTM (TAM-CL) model, designed to conduct thorough spatiotemporal feature extraction on three-dimensional atmospheric reanalysis data of tropical cyclones. By leveraging ConvLSTM with three-dimensional convolution kernels, our model enhances the extraction of three-dimensional spatiotemporal features. Furthermore, an attention mechanism is integrated to bolster long-term prediction accuracy by emphasizing crucial temporal nodes. In the evaluation of tropical cyclone track and intensity forecasts across 24, 48, and 72 h, TAM-CL demonstrates a notable reduction in prediction errors, thereby underscoring its efficacy in forecasting both cyclone tracks and intensities. This contributes to an effective exploration of the application of deep networks in conjunction with atmospheric reanalysis data.","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"36","pages":"Article 100439"},"PeriodicalIF":3.3,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139662985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}