Frontiers in Big Data. Pub Date: 2025-03-25. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1573072
Alfred Krzywicki, Michael Bain, Wayne Wobcke
{"title":"Editorial: Natural language processing for recommender systems.","authors":"Alfred Krzywicki, Michael Bain, Wayne Wobcke","doi":"10.3389/fdata.2025.1573072","DOIUrl":"https://doi.org/10.3389/fdata.2025.1573072","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1573072"},"PeriodicalIF":2.4,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975900/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CrowdRadar: a mobile crowdsensing framework for urban traffic green travel safety risk assessment.","authors":"Yigao Wang, Qingxian Tang, Wenxuan Wei, Chenhui Yang, Dingqi Yang, Cheng Wang, Liang Xu, Longbiao Chen","doi":"10.3389/fdata.2025.1440816","DOIUrl":"10.3389/fdata.2025.1440816","url":null,"abstract":"<p><p>As environmental awareness has increased due to the surge in greenhouse gases, green travel modes such as cycling and walking have gradually become popular choices. However, the current traffic environment has many hidden problems that endanger the personal safety of traffic participants and hinder the development of green travel. Traditional methods, such as identifying risky locations after traffic accidents, suffer from the disadvantages of delayed response and lack of foresight. Against this background, we proposed a mobile edge crowdsensing framework to dynamically assess urban traffic green travel safety risks. Specifically, a large number of mobile devices were used to sense the road environment, from which a semantic detection framework detected the traffic high-risk behaviors of traffic participants. Then multi-source and heterogeneous urban crowdsensing data were used to model the travel safety risk to achieve a comprehensive and real-time assessment of urban green travel safety. We evaluated our method by leveraging real-world datasets collected from Xiamen Island. 
Results showed that our framework could accurately detect traffic high-risk behaviors with an average F1-score of 86.5% and assess the travel safety risk with an <i>R</i> <sup>2</sup> of 0.85, outperforming various baseline methods.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1440816"},"PeriodicalIF":2.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968729/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-03-14. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1546223
Yves Rybarczyk, Rasa Zalakeviciute, Marija Ereminaite, Ivana Costa-Stolz
{"title":"Causal effect of PM<sub>2.5</sub> on the urban heat island.","authors":"Yves Rybarczyk, Rasa Zalakeviciute, Marija Ereminaite, Ivana Costa-Stolz","doi":"10.3389/fdata.2025.1546223","DOIUrl":"10.3389/fdata.2025.1546223","url":null,"abstract":"<p><p>The planet is experiencing global warming, with an increasing number of heat waves worldwide. Cities are particularly affected by the high temperatures because of the urban heat island (UHI) effect. This phenomenon is mostly explained by the land cover changes, reduced green spaces, and the concentration of infrastructure in urban settings. However, the reasons for the UHI are complex and involve multiple factors still understudied. Air pollution is one of them. This work investigates the link between particulate matter ≤2.5 μm (PM<sub>2.5</sub>) and air temperature by convergent cross-mapping (CCM), a statistical method to infer causation in dynamic non-linear systems. A positive correlation between the concentration of fine particulate matter and urban temperature is observed. The causal relationship between PM<sub>2.5</sub> and temperature is confirmed in the most urbanized areas of the study site (Quito, Ecuador). The results show that (i) the UHI is present even in the most elevated capital city of the world, and (ii) air quality is an important contributor to the higher temperatures in urban than outlying areas. 
This study supports the hypothesis of a non-linear threshold effect of pollution concentration on urban temperature.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1546223"},"PeriodicalIF":2.4,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11949916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143755854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-03-13. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1455442
Waleed Albattah, Rehan Ullah Khan
{"title":"Impact of imbalanced features on large datasets.","authors":"Waleed Albattah, Rehan Ullah Khan","doi":"10.3389/fdata.2025.1455442","DOIUrl":"10.3389/fdata.2025.1455442","url":null,"abstract":"<p><p>The exponential growth of image and video data motivates the need for practical real-time content-based searching algorithms. Features play a vital role in identifying objects within images. However, feature-based classification faces a challenge due to uneven class instance distribution. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance. However, real-world scenarios often exhibit class imbalances. Thus, this article explores the classification framework based on image features, analyzing balanced and imbalanced distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The comprehensive evaluation shows that all models perform better with balancing compared to using an imbalanced dataset, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially in improving Random Forest (RF) and SVM models. 
The deep learning experiments show a similar improvement.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1455442"},"PeriodicalIF":2.4,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948280/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143732910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-03-12. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1485493
Sandio Maciel Dos Santos, Marcelino Silva da Silva, Fábio Manoel França Lobato, Carlos Renato Lisboa Francês
{"title":"Use of Bayesian networks in Brazil high school educational database: analysis of the impact of COVID-19 on ENEM in Pará between 2019 and 2022.","authors":"Sandio Maciel Dos Santos, Marcelino Silva da Silva, Fábio Manoel França Lobato, Carlos Renato Lisboa Francês","doi":"10.3389/fdata.2025.1485493","DOIUrl":"10.3389/fdata.2025.1485493","url":null,"abstract":"<p><p>This study examines the impact of the COVID-19 pandemic on academic performance and student participation in the National High School Exam (ENEM) in the state of Pará, Brazil, focusing on the interaction between socioeconomic factors, access to technology, and regional disparities. The research employed a mixed-methods approach, analyzing quantitative data from ENEM results (2020-2022) and qualitative interviews with educators and students. The findings indicate that the pandemic exacerbated pre-existing educational inequalities, particularly affecting low-income students and those enrolled in public schools. The highest dropout rates were recorded among students with a family income of up to one minimum wage, highlighting the barriers posed by limited access to technology and infrastructure for remote learning. A statistical analysis revealed a 20% increase in scores among students with access to computers and the Internet, particularly in private schools. The study also found significant regional differences across Pará's mesoregions, with Marajó and Southeast Pará facing more persistent challenges in reducing dropout rates compared to the Metropolitan Region of Belém. These results underscore the urgent need for region-specific public policies that address disparities in educational resources, including targeted investments in digital infrastructure and teacher training for remote education. 
The study concludes that comprehensive support programs, including psychological assistance for students, are essential for building a more resilient and equitable educational system capable of withstanding future crises.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1485493"},"PeriodicalIF":2.4,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11937093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143722233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-03-06. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1529848
Asma'a Mohammad Al-Mnayyis, Hasan Gharaibeh, Mohammad Amin, Duha Anakreh, Hanan Fawaz Akhdar, Eman Hussein Alshdaifat, Khalid M O Nahar, Ahmad Nasayreh, Mohammad Gharaibeh, Neda'a Alsalman, Alaa Alomar, Maha Gharaibeh, Hamad Yahia Abu Mhanna
{"title":"(KAUH-BCMD) dataset: advancing mammographic breast cancer classification with multi-fusion preprocessing and residual depth-wise network.","authors":"Asma'a Mohammad Al-Mnayyis, Hasan Gharaibeh, Mohammad Amin, Duha Anakreh, Hanan Fawaz Akhdar, Eman Hussein Alshdaifat, Khalid M O Nahar, Ahmad Nasayreh, Mohammad Gharaibeh, Neda'a Alsalman, Alaa Alomar, Maha Gharaibeh, Hamad Yahia Abu Mhanna","doi":"10.3389/fdata.2025.1529848","DOIUrl":"10.3389/fdata.2025.1529848","url":null,"abstract":"<p><p>The categorization of benign and malignant patterns in digital mammography is a critical step in the diagnosis of breast cancer, facilitating early detection and potentially saving many lives. Diverse breast tissue architectures often obscure and conceal breast issues. Classifying worrying regions (benign and malignant patterns) in digital mammograms is a significant challenge for radiologists. Even for specialists, the first visual indicators are nuanced and irregular, complicating identification. Therefore, radiologists want an advanced classifier to assist in identifying breast cancer and categorizing regions of concern. This study presents an enhanced technique for the classification of breast cancer using mammography images. The collection comprises real-world data from King Abdullah University Hospital (KAUH) at Jordan University of Science and Technology, consisting of 7,205 photographs from 5,000 patients aged 18-75. After being classified as benign or malignant, the pictures underwent preprocessing by rescaling, normalization, and augmentation. Multi-fusion approaches, such as high-boost filtering and contrast-limited adaptive histogram equalization (CLAHE), were used to improve picture quality. We created a unique Residual Depth-wise Network (RDN) to enhance the precision of breast cancer detection. The suggested RDN model was compared with many prominent models, including MobileNetV2, VGG16, VGG19, ResNet50, InceptionV3, Xception, and DenseNet121. 
The RDN model exhibited superior performance, achieving an accuracy of 97.82%, precision of 96.55%, recall of 99.19%, specificity of 96.45%, F1 score of 97.85%, and validation accuracy of 96.20%. The findings indicate that the proposed RDN model is an excellent instrument for early diagnosis using mammography images and significantly improves breast cancer detection when integrated with multi-fusion and efficient preprocessing approaches.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1529848"},"PeriodicalIF":2.4,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922913/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-03-04. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1582619
{"title":"Erratum: Edge-level multi-constraint graph pattern matching with lung cancer knowledge graph.","authors":"","doi":"10.3389/fdata.2025.1582619","DOIUrl":"https://doi.org/10.3389/fdata.2025.1582619","url":null,"abstract":"<p><p>[This corrects the article DOI: 10.3389/fdata.2025.1546850.].</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1582619"},"PeriodicalIF":2.4,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11915023/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143659763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-02-26. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1519369
Francesco Bertolotti, Niccolò Kadera, Luca Pasquino, Luca Mari
{"title":"An epidemiological extension of the El Farol Bar problem.","authors":"Francesco Bertolotti, Niccolò Kadera, Luca Pasquino, Luca Mari","doi":"10.3389/fdata.2025.1519369","DOIUrl":"10.3389/fdata.2025.1519369","url":null,"abstract":"<p><p>This paper presents an epidemiological extension of the El Farol Bar problem, where both a social and an epidemiological dimension are present. In the model, individual agents make binary decisions (to visit a bar or stay home) amidst a non-fatal epidemic. The extension of the classic social dilemma is implemented as an agent-based model, and it is later explored by sampling the parameter space and observing the resulting behavior. The results of this analysis suggest that the infection could be contained by increasing the information available in the underlying social system and adjusting its structure.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1519369"},"PeriodicalIF":2.4,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897257/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143617838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2025-02-20. eCollection Date: 2025-01-01. DOI: 10.3389/fdata.2025.1515341
Huanjing Liu, Xiao Zhang, Qian Liu
{"title":"A review of AI-based radiogenomics in neurodegenerative disease.","authors":"Huanjing Liu, Xiao Zhang, Qian Liu","doi":"10.3389/fdata.2025.1515341","DOIUrl":"10.3389/fdata.2025.1515341","url":null,"abstract":"<p><p>Neurodegenerative diseases are chronic, progressive conditions that cause irreversible damage to the nervous system, particularly in aging populations. Early diagnosis is a critical challenge, as these diseases often develop slowly and without clear symptoms until significant damage has occurred. Recent advances in radiomics and genomics have provided valuable insights into the mechanisms of these diseases by identifying specific imaging features and genomic patterns. Radiogenomics enhances diagnostic capabilities by linking genomics with imaging phenotypes, offering a more comprehensive understanding of disease progression. The growing field of artificial intelligence (AI), including machine learning and deep learning, opens new opportunities for improving the accuracy and timeliness of these diagnoses. This review examines the application of AI-based radiogenomics in neurodegenerative diseases, summarizing key model designs, performance metrics, publicly available data resources, significant findings, and future research directions. 
It provides a starting point and guidance for those seeking to explore this emerging area of study.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1515341"},"PeriodicalIF":2.4,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11882605/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143574665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-generation approach to skin disorder prediction employing hybrid deep transfer learning.","authors":"Yonis Gulzar, Shivani Agarwal, Saira Soomro, Meenakshi Kandpal, Sherzod Turaev, Choo W Onn, Shilpa Saini, Abdenour Bounsiar","doi":"10.3389/fdata.2025.1503883","DOIUrl":"10.3389/fdata.2025.1503883","url":null,"abstract":"<p><strong>Introduction: </strong>Skin diseases significantly impact individuals' health and mental wellbeing. However, their classification remains challenging due to complex lesion characteristics, overlapping symptoms, and limited annotated datasets. Traditional convolutional neural networks (CNNs) often struggle with generalization, leading to suboptimal classification performance. To address these challenges, this study proposes a Hybrid Deep Transfer Learning Method (HDTLM) that integrates DenseNet121 and EfficientNetB0 for improved skin disease prediction.</p><p><strong>Methods: </strong>The proposed hybrid model leverages DenseNet121's dense connectivity for capturing intricate patterns and EfficientNetB0's computational efficiency and scalability. A dataset comprising 19 skin conditions with 19,171 images was used for training and validation. The model was evaluated using multiple performance metrics, including accuracy, precision, recall, and F1-score. Additionally, a comparative analysis was conducted against state-of-the-art models such as DenseNet121, EfficientNetB0, VGG19, MobileNetV2, and AlexNet.</p><p><strong>Results: </strong>The proposed HDTLM achieved a training accuracy of 98.18% and a validation accuracy of 97.57%. It consistently outperformed baseline models, achieving a precision of 0.95, recall of 0.96, F1-score of 0.95, and an overall accuracy of 98.18%. 
The results demonstrate the hybrid model's superior ability to generalize across diverse skin disease categories.</p><p><strong>Discussion: </strong>The findings underscore the effectiveness of the HDTLM in enhancing skin disease classification, particularly in scenarios with significant domain shifts and limited labeled data. By integrating complementary strengths of DenseNet121 and EfficientNetB0, the proposed model provides a robust and scalable solution for automated dermatological diagnostics.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1503883"},"PeriodicalIF":2.4,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879938/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143568816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}