Frontiers in Big Data | Pub Date: 2025-05-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1520574
Liliana Ortega-Diaz, Julian Jaramillo-Ibarra, German Osma-Pinto
{"title":"Estimation of the air conditioning energy consumption of a classroom using machine learning in a tropical climate.","authors":"Liliana Ortega-Diaz, Julian Jaramillo-Ibarra, German Osma-Pinto","doi":"10.3389/fdata.2025.1520574","DOIUrl":"https://doi.org/10.3389/fdata.2025.1520574","url":null,"abstract":"<p><p>Air conditioning energy consumption in buildings represents a considerable percentage of total energy consumption, which underlines the importance of implementing measures contributing to its reduction. Predicting energy consumption is critical to making informed decisions and identifying factors influencing power consumption. Machine learning is the most widely used approach for prediction due to its speed, accuracy, and non-linear modeling. In this study, three machine learning models were used to predict the air conditioning energy demand in a classroom of an educational building in a hot tropical climate. The models selected are SVR (Support Vector Regressor), DT (Decision Tree), and RFR (Random Forest Regressor) due to their wide use in the literature; therefore, the goal is to establish which one offers the best performance for this case study based on a comparative analysis using performance metrics. Cross-validation was used to perform robust training. Twenty-two input variables were considered: climatological, operational, and temporal. Occupancy is the variable with the highest correlation with air conditioning consumption; these two variables have a positive relationship of 0.65. Monitoring was carried out for 72 days, including weekends. Six study scenarios were considered, in which the monitoring period varied, influencing the number of samples. In addition, two sensitivity analyses were performed by modifying the time interval of the data (1, 5, 10, 20, 30, and 60 min) and the data split (50:50, 60:40, 70:30, 80:20 and 90:10). 
The evaluation of the models was performed using the RMSE, MAE, and <i>R</i> <sup>2</sup> metrics, which differ in their characteristics and approaches to error measurement. During the training phase, the RFR model achieved a coefficient of determination (<i>R</i> <sup>2</sup>) of 0.97, while the SVR obtained an <i>R</i> <sup>2</sup> of 0.78 in the test phase. Finally, it is concluded that using shorter time intervals (every 1 min) in the data improves the performance of the predictive models. Splitting the data into 80:20 and 90:10 ratios resulted in the lowest RMSE values for the three models evaluated. Training the models with a larger amount of data allows for capturing more representative patterns, which improves their generalization ability and performance on new data.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1520574"},"PeriodicalIF":2.4,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116678/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-05-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1526480
Mohamed Abd Elaziz, Ibrahim A Fares, Abdelghani Dahou, Mansour Shrahili
{"title":"Federated learning framework for IoT intrusion detection using tab transformer and nature-inspired hyperparameter optimization.","authors":"Mohamed Abd Elaziz, Ibrahim A Fares, Abdelghani Dahou, Mansour Shrahili","doi":"10.3389/fdata.2025.1526480","DOIUrl":"https://doi.org/10.3389/fdata.2025.1526480","url":null,"abstract":"<p><p>Intrusion detection has been of prime concern in the Internet of Things (IoT) environment due to the rapid increase in cyber threats. The majority of traditional intrusion detection systems (IDSs) rely on centralized models, raising significant privacy concerns. Federated learning (FL) offers a decentralized alternative; however, many existing FL-based IDS frameworks suffer from poor performance due to suboptimal model architectures and ineffective hyperparameter selection. To address these challenges, this paper introduces a novel trust-centric FL framework based on the tab transformer (TTF) model for IDS. We enhance the TTF model through an optimization process, utilizing a hyperparameter tuning algorithm inspired by the nature-based electric eel foraging optimization (EEFO) algorithm. The goal of the developed framework is to improve intrusion detection without centralizing data, thereby preserving privacy, while also enhancing the processing and detection capability for the huge amounts of data generated by IoT devices. Our framework is tested on three IoT datasets (N-BaIoT, UNSW-NB15, and CICIoT2023) to assess the model's performance. Experimental results show that the proposed framework significantly exceeds traditional methods in terms of accuracy, precision, and recall. 
The results presented in this study confirm the effectiveness and superior performance of the proposed FL-based IDS framework.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1526480"},"PeriodicalIF":2.4,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116512/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-05-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1611364
Yves Philippe Rybarczyk
{"title":"Editorial: Air quality and biosphere-atmosphere interactions.","authors":"Yves Philippe Rybarczyk","doi":"10.3389/fdata.2025.1611364","DOIUrl":"https://doi.org/10.3389/fdata.2025.1611364","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1611364"},"PeriodicalIF":2.4,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089036/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144112461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-04-23 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1542483
Hossein Hassani, Mohammad Reza Entezarian, Sara Zaeimzadeh, Leila Marvian, Nadejda Komendantova
{"title":"An oversampling-undersampling strategy for large-scale data linkage.","authors":"Hossein Hassani, Mohammad Reza Entezarian, Sara Zaeimzadeh, Leila Marvian, Nadejda Komendantova","doi":"10.3389/fdata.2025.1542483","DOIUrl":"https://doi.org/10.3389/fdata.2025.1542483","url":null,"abstract":"<p><p>Effective record linkage in big data, particularly in imbalanced datasets, is a critical yet highly challenging task due to the inherent complexity involved. This article utilizes an oversampling-undersampling strategy to address linkage imbalances, enabling more accurate and efficient record linkage within large-scale datasets. The strategy increases the number of minority-class instances and reduces the dominance of the majority classes to obtain a more balanced dataset that can be used for training and testing. Sensitivity testing was carried out by varying the training-test ratio and the degree of imbalance.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1542483"},"PeriodicalIF":2.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055850/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144060444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-04-16 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1556157
Suresh Neethirajan
{"title":"Safeguarding digital livestock farming - a comprehensive cybersecurity roadmap for dairy and poultry industries.","authors":"Suresh Neethirajan","doi":"10.3389/fdata.2025.1556157","DOIUrl":"https://doi.org/10.3389/fdata.2025.1556157","url":null,"abstract":"<p><p>The rapid digital transformation of dairy and poultry farming through big data analytics and Internet of Things (IoT) innovations has significantly advanced precision management of feeding, animal health, and environmental conditions. However, this digitization has simultaneously escalated cybersecurity vulnerabilities, presenting serious threats to economic stability, animal welfare, and food safety. This paper provides an in-depth analysis of the evolving cyber threat landscape confronting digital livestock farming, examining ransomware incidents, hacktivist interference, and state-sponsored cyber intrusions. It critically assesses how compromised digital systems disrupt critical farm operations, including milking routines, feed formulations, and climate control, profoundly impacting animal health, productivity, and consumer trust. Responding to these challenges, we present a comprehensive cybersecurity roadmap that integrates established IT security practices with agriculture-specific requirements. The roadmap emphasizes advanced solutions, such as AI-driven anomaly detection, blockchain-based traceability, and integrated cybersecurity-biosecurity frameworks, tailored explicitly to safeguard livestock farming. Additionally, we highlight human-centric elements such as targeted workforce education, rural cybersecurity capacity building, and robust cross-sector collaboration as indispensable components of a resilient cybersecurity ecosystem. By synthesizing technical advancements, regulatory perspectives, and socio-economic insights, the paper proposes a proactive strategy to enhance data integrity, secure animal welfare, and reinforce food supply chains. 
Ultimately, we underscore that effective cybersecurity is not merely a technical consideration but foundational to ensuring the sustainable, ethical, and trustworthy advancement of livestock agriculture in a data-driven world.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1556157"},"PeriodicalIF":2.4,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040926/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-04-09 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1557600
Jorge Raul Navarro-Cabrera, Miguel Angel Valles-Coral, María Elena Farro-Roque, Nelly Reátegui-Lozano, Lolita Arévalo-Fasanando
{"title":"Machine vision model using nail images for non-invasive detection of iron deficiency anemia in university students.","authors":"Jorge Raul Navarro-Cabrera, Miguel Angel Valles-Coral, María Elena Farro-Roque, Nelly Reátegui-Lozano, Lolita Arévalo-Fasanando","doi":"10.3389/fdata.2025.1557600","DOIUrl":"https://doi.org/10.3389/fdata.2025.1557600","url":null,"abstract":"<p><strong>Introduction: </strong>Iron deficiency anemia (IDA) is a global health issue that significantly affects quality of life. Non-invasive methods, such as image analysis using artificial vision, offer accessible alternatives for diagnosis. This study proposes a DenseNet169-based model to detect anemia from nail images and compares its performance with that of the Rad-67 hemoglobin meter.</p><p><strong>Methods: </strong>A cross-sectional study was conducted with 909 nail images collected from university students aged 18-25 years at the Universidad Nacional de San Martín, Peru. A Samsung Galaxy A73 5G was used to capture images under controlled conditions, and clinical data were complemented with hemoglobin readings from the Rad-67 device. The images were pre-processed using segmentation and data augmentation techniques to standardize the dataset. Three models (DenseNet169, InceptionV3, and Xception) were trained and evaluated using metrics such as accuracy, recall, and AUC.</p><p><strong>Results: </strong>DenseNet169 demonstrated the best performance, achieving an accuracy of 0.6983, recall of 0.6477, F1-Score of 0.6525, and AUC of 0.7409. Despite the presence of false negatives, the results showed a positive correlation with Rad-67 readings.</p><p><strong>Conclusion: </strong>The DenseNet169-based model proved to be a promising tool for non-invasive detection of iron deficiency anemia, with potential for application in clinical and educational settings. 
Future improvements in preprocessing and dataset diversification could enhance performance and applicability.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1557600"},"PeriodicalIF":2.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144040422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa.","authors":"Meron Asmamaw Alemayehu, Shimels Derso Kebede, Agmasie Damtew Walle, Daniel Niguse Mamo, Ermias Bekele Enyew, Jibril Bashir Adem","doi":"10.3389/fdata.2025.1522578","DOIUrl":"https://doi.org/10.3389/fdata.2025.1522578","url":null,"abstract":"<p><strong>Introduction: </strong>Vaccination is critical for reducing childhood mortality, yet completion rates for the third dose of the pentavalent vaccine (Penta 3) in East Africa remain inadequate. This study aims to predict Penta 3 vaccination dropout using a stacking ensemble machine learning model with Demographic and Health Survey (DHS) data. The objective is to identify predictors of dropout and enhance intervention strategies.</p><p><strong>Methods: </strong>The study utilized seven base machine learning algorithms to create a stacked ensemble model with three meta-learners: Random Forest (RF), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGBoost). The H2O package facilitated the development of base learners and the stacking of super learners. Feature selection (FS) and comparisons were performed using the LASSO and Boruta algorithms. The selected features were one-hot encoded, and ordinal encoding was applied where appropriate. Hyperparameter optimization (HPO) and comparisons were conducted using grid search and random search. Model performance was assessed using five key metrics, including accuracy and the area under the curve (AUC). SHAP (Shapley Additive Explanations) values were used to interpret the model outputs and identify influential predictors. The experimental design was employed to present the results.</p><p><strong>Results: </strong>Four experiments were conducted to evaluate feature selection and HPO methods. 
All stacked ensemble models outperformed individual learners, with the XGBoost meta-learner optimized with grid search and LASSO FS achieving the highest performance: 93.9% accuracy and 99.4% AUC. While RF and GLM meta-learners were also evaluated, they were outperformed by the XGBoost meta-learner. SHAP analysis revealed key features influencing Penta 3 dropout, including the place of delivery, decision-making autonomy, the mother's level of earning, and healthcare access. Home delivery increased the risk of dropout, while postnatal care by midwives and health insurance coverage lowered dropout likelihood.</p><p><strong>Conclusion and recommendation: </strong>This study provides insights into the factors influencing Penta 3 vaccination dropout in East Africa. To reduce dropout rates, interventions should focus on enhancing maternal livelihood opportunities, improving healthcare access in rural areas, and promoting institutional deliveries.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1522578"},"PeriodicalIF":2.4,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009798/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144060436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-04-04 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1532362
Johnson Masinde, Franklin Mugambi, Daniel Wambiri Muthee
{"title":"Big data and personal information privacy in developing countries: insights from Kenya.","authors":"Johnson Masinde, Franklin Mugambi, Daniel Wambiri Muthee","doi":"10.3389/fdata.2025.1532362","DOIUrl":"https://doi.org/10.3389/fdata.2025.1532362","url":null,"abstract":"<p><p>The present study examined the correlation between big data and personal information privacy in Kenya, a developing nation that has experienced a significant rise in the utilization of data in the recent past. The study sought to assess the effectiveness of present data protection laws and policies, highlight challenges that individuals and organizations experience while securing their data, and propose mechanisms to enhance data protection frameworks and raise public awareness of data privacy issues. The study employed a mixed-methods approach, which included a survey of 500 participants, 20 interviews with key stakeholders, and an examination of 50 pertinent documents. Study findings show that the regulatory and legal frameworks, though present, are not enforced, demonstrating a gap between legislation and implementation. Furthermore, there is a lack of understanding about the risks posed by sharing personal information, and more public education and awareness activities are required. The findings also demonstrate that while people are prepared to trade their personal information for concrete benefits, they are concerned about how their data is utilized and by whom. The study proposes the establishment of a National Data Literacy Training and Capacity Building Framework (NADACA) that should mandate the training of government officials in best practices for data governance and enforcement mechanisms, educate the public on personal data privacy and relevant laws, and ensure the integration of data literacy into the curriculum, alongside the provision of regular resources and workshops on data literacy. 
The study has significant implications for policymakers, industry representatives, and civil society organizations in Kenya and globally.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1532362"},"PeriodicalIF":2.4,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144053409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight and hybrid transformer-based solution for quick and reliable deepfake detection.","authors":"Geeta Rani, Atharv Kothekar, Shawn George Philip, Vijaypal Singh Dhaka, Ester Zumpano, Eugenio Vocaturo","doi":"10.3389/fdata.2025.1521653","DOIUrl":"https://doi.org/10.3389/fdata.2025.1521653","url":null,"abstract":"<p><strong>Introduction: </strong>Rapid advancements in artificial intelligence and generative artificial intelligence have enabled the creation of fake images and videos that appear highly realistic. According to a report published in 2022, approximately 71% of people rely on fake videos and become victims of blackmail. Moreover, these fake videos and images are used to tarnish the reputation of popular public figures. This has increased the demand for deepfake detection techniques. The accuracy of the techniques proposed in the literature so far varies with changes in fake content generation techniques. Additionally, these techniques are computationally intensive. The techniques discussed in the literature are based on convolutional neural networks, Linformer models, or transformer models for deepfake detection, each with its advantages and disadvantages.</p><p><strong>Methods: </strong>In this manuscript, a hybrid architecture combining transformer and Linformer models is proposed for deepfake detection. This architecture converts an image into patches and performs position encoding to retain spatial relationships between patches. Its encoder captures the contextual information from the input patches, and Gaussian Error Linear Unit resolves the vanishing gradient problem.</p><p><strong>Results: </strong>The Linformer component reduces the size of the attention matrix. Thus, it reduces the execution time to half without compromising accuracy. Moreover, it utilizes the unique features of transformer and Linformer models to enhance the robustness and generalization of deepfake detection techniques. 
The low computational requirement and high accuracy of 98.9% increase the real-time applicability of the model, preventing blackmail and other losses to the public.</p><p><strong>Discussion: </strong>The proposed hybrid model utilizes the strength of the transformer model in capturing complex patterns in data. It uses the self-attention potential of the Linformer model and reduces the computation time without compromising the accuracy. Moreover, the models were implemented on patch sizes of 6 and 11. It is evident from the obtained results that increasing the patch size improves the performance of the model. This allows the model to capture fine-grained features and learn more effectively from the same set of videos. The larger patch size also enables the model to better preserve spatial details, which contributes to improved feature extraction.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1521653"},"PeriodicalIF":2.4,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12023275/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144045976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-03-25 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1573072
Alfred Krzywicki, Michael Bain, Wayne Wobcke
{"title":"Editorial: Natural language processing for recommender systems.","authors":"Alfred Krzywicki, Michael Bain, Wayne Wobcke","doi":"10.3389/fdata.2025.1573072","DOIUrl":"https://doi.org/10.3389/fdata.2025.1573072","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1573072"},"PeriodicalIF":2.4,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975900/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}