{"title":"Predicting risk of preterm birth in singleton pregnancies using machine learning algorithms.","authors":"Qiu-Yan Yu, Ying Lin, Yu-Run Zhou, Xin-Jun Yang, Joris Hemelaar","doi":"10.3389/fdata.2024.1291196","DOIUrl":"10.3389/fdata.2024.1291196","url":null,"abstract":"<p><p>We aimed to develop, train, and validate machine learning models for predicting preterm birth (<37 weeks' gestation) in singleton pregnancies at different gestational intervals. Models were developed based on complete data from 22,603 singleton pregnancies from a prospective population-based cohort study that was conducted in 51 midwifery clinics and hospitals in Wenzhou City of China between 2014 and 2016. We applied CatBoost, Random Forest, Stacked Model, Deep Neural Networks (DNN), and Support Vector Machine (SVM) algorithms, as well as logistic regression, to conduct feature selection and predictive modeling. Feature selection was implemented based on permutation-based feature importance lists derived from the machine learning models including all features, using a balanced training data set. To develop prediction models, the top 10%, 25%, and 50% most important predictive features were selected. Prediction models were developed with the training data set with 5-fold cross-validation for internal validation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values. The CatBoost-based prediction model after 26 weeks' gestation performed best with an AUC value of 0.70 (0.67, 0.73), accuracy of 0.81, sensitivity of 0.47, and specificity of 0.83. Number of antenatal care visits before 24 weeks' gestation, aspartate aminotransferase level at registration, symphysis fundal height, maternal weight, abdominal circumference, and blood pressure emerged as strong predictors after 26 completed weeks. 
The application of machine learning on pregnancy surveillance data is a promising approach to predict preterm birth and we identified several modifiable antenatal predictors.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1291196"},"PeriodicalIF":3.1,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10941650/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140144558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-29. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1266031
Dmitry Kolobkov, Satyarth Mishra Sharma, Aleksandr Medvedev, Mikhail Lebedev, Egor Kosaretskiy, Ruslan Vakhitov
{"title":"Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project.","authors":"Dmitry Kolobkov, Satyarth Mishra Sharma, Aleksandr Medvedev, Mikhail Lebedev, Egor Kosaretskiy, Ruslan Vakhitov","doi":"10.3389/fdata.2024.1266031","DOIUrl":"10.3389/fdata.2024.1266031","url":null,"abstract":"<p><p>Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. 
Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1266031"},"PeriodicalIF":3.1,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140133172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-26. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1304439
Mathias Uta, Alexander Felfernig, Viet-Man Le, Thi Ngoc Trang Tran, Damian Garber, Sebastian Lubos, Tamim Burgstaller
{"title":"Knowledge-based recommender systems: overview and research directions.","authors":"Mathias Uta, Alexander Felfernig, Viet-Man Le, Thi Ngoc Trang Tran, Damian Garber, Sebastian Lubos, Tamim Burgstaller","doi":"10.3389/fdata.2024.1304439","DOIUrl":"10.3389/fdata.2024.1304439","url":null,"abstract":"<p><p>Recommender systems are decision support systems that help users to identify items of relevance from a potentially large set of alternatives. In contrast to the mainstream recommendation approaches of collaborative filtering and content-based filtering, knowledge-based recommenders exploit semantic user preference knowledge, item knowledge, and recommendation knowledge, to identify user-relevant items which is of specific relevance when dealing with complex and high-involvement items. Such recommenders are primarily applied in scenarios where users specify (and revise) their preferences, and related recommendations are determined on the basis of constraints or attribute-level similarity metrics. In this article, we provide an overview of the existing state-of-the-art in knowledge-based recommender systems. Different related recommendation techniques are explained on the basis of a working example from the domain of survey software services. On the basis of our analysis, we outline different directions for future research.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1304439"},"PeriodicalIF":3.1,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10925703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140102782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-21. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1368581
Sujata Dash, Subhendu Kumar Pani, Wellington Pinheiro Dos Santos
{"title":"Editorial: Internet of Medical Things and computational intelligence in healthcare 4.0.","authors":"Sujata Dash, Subhendu Kumar Pani, Wellington Pinheiro Dos Santos","doi":"10.3389/fdata.2024.1368581","DOIUrl":"10.3389/fdata.2024.1368581","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1368581"},"PeriodicalIF":3.1,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10916686/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140050980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-21. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1369159
Elochukwu Ukwandu, Chaminda Hewage, Hanan Hindy
{"title":"Editorial: Cyber security in the wake of fourth industrial revolution: opportunities and challenges.","authors":"Elochukwu Ukwandu, Chaminda Hewage, Hanan Hindy","doi":"10.3389/fdata.2024.1369159","DOIUrl":"https://doi.org/10.3389/fdata.2024.1369159","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1369159"},"PeriodicalIF":3.1,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10915258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140050979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-21. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1358486
Muhammad Saad, Rabia Noor Enam, Rehan Qureshi
{"title":"Optimizing multi-objective task scheduling in fog computing with GA-PSO algorithm for big data application.","authors":"Muhammad Saad, Rabia Noor Enam, Rehan Qureshi","doi":"10.3389/fdata.2024.1358486","DOIUrl":"10.3389/fdata.2024.1358486","url":null,"abstract":"<p><p>As the volume and velocity of Big Data continue to grow, traditional cloud computing approaches struggle to meet the demands of real-time processing and low latency. Fog computing, with its distributed network of edge devices, emerges as a compelling solution. However, efficient task scheduling in fog computing remains a challenge due to its inherently multi-objective nature, balancing factors like execution time, response time, and resource utilization. This paper proposes a hybrid Genetic Algorithm (GA)-Particle Swarm Optimization (PSO) algorithm to optimize multi-objective task scheduling in fog computing environments. The hybrid approach combines the strengths of GA and PSO, achieving effective exploration and exploitation of the search space, leading to improved performance compared to traditional single-algorithm approaches. Across various task inputs, the proposed hybrid algorithm improved the execution time by 85.68% compared with the GA algorithm, by 84% compared with Hybrid PWOA, and by 51.03% compared with the PSO algorithm; it improved the response time by 67.28% compared with the GA algorithm, by 54.24% compared with Hybrid PWOA, and by 75.40% compared with the PSO algorithm; and it improved the completion time by 68.69% compared with the GA algorithm, by 98.91% compared with Hybrid PWOA, and by 75.90% compared with the PSO algorithm. 
Across various fog-node configurations, the proposed hybrid algorithm also improved the execution time by 84.87% compared with the GA algorithm, by 88.64% compared with Hybrid PWOA, and by 85.07% compared with the PSO algorithm; it improved the response time by 65.92% compared with the GA algorithm, by 80.51% compared with Hybrid PWOA, and by 85.26% compared with the PSO algorithm; and it improved the completion time by 67.60% compared with the GA algorithm, by 81.34% compared with Hybrid PWOA, and by 85.23% compared with the PSO algorithm.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1358486"},"PeriodicalIF":3.1,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10915077/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140050981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-02-20. eCollection Date: 2024-01-01. DOI: 10.3389/fdata.2024.1353988
Jianwu Wang, Junqi Yin, Mai H Nguyen, Jingbo Wang, Weijia Xu
{"title":"Editorial: Big scientific data analytics on HPC and cloud.","authors":"Jianwu Wang, Junqi Yin, Mai H Nguyen, Jingbo Wang, Weijia Xu","doi":"10.3389/fdata.2024.1353988","DOIUrl":"https://doi.org/10.3389/fdata.2024.1353988","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1353988"},"PeriodicalIF":3.1,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10912602/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140050978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trends of the COVID-19 dynamics in 2022 and 2023 vs. the population age, testing and vaccination levels","authors":"I. Nesteruk","doi":"10.3389/fdata.2023.1355080","DOIUrl":"https://doi.org/10.3389/fdata.2023.1355080","url":null,"abstract":"The population, governments, and researchers show much less interest in the COVID-19 pandemic. However, many questions still need to be answered: why has the much less vaccinated African continent accumulated 15 times fewer deaths per capita than Europe, and why in 2023 is the global value of the case fatality risk almost twice as high as in 2022 and the UK figure four times higher than the global one? The averaged daily numbers of cases (DCC) and deaths (DDC) per million and the case fatality risks (DDC/DCC) were calculated for 34 countries and regions using Johns Hopkins University (JHU) datasets. Possible linear and non-linear correlations with the averaged daily numbers of tests per thousand (DTC), the median age of the population (A), and the percentages of vaccinations (VC) and boosters (BC) were investigated. Strong correlations between age and DCC and DDC values were revealed. A one-year increment in the median age yielded an increase of 39.8 in DCC values and of 0.0799 in DDC values in 2022 (in 2023 these figures are 5.8 and 0.0263, respectively). As the testing level DTC decreases, the case fatality risk can increase drastically. DCC and DDC values increase with the percentages of fully vaccinated people and boosters, which definitely increase for greater A. After removing the influence of age, no correlations between vaccinations and DCC and DDC values were revealed. The presented analysis demonstrates that age is a pivotal factor in the visible (registered) part of the COVID-19 pandemic dynamics. The much younger Africa has registered fewer cases and deaths per capita due to many unregistered asymptomatic patients. 
Of great concern is the fact that COVID-19 mortality in 2023 in the UK is still at least 4 times higher than the global value caused by seasonal flu.","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"92 20","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel approach to fake news classification using LSTM-based deep learning models","authors":"Halyna Padalko, Vasyl Chomko, D. Chumachenko","doi":"10.3389/fdata.2023.1320800","DOIUrl":"https://doi.org/10.3389/fdata.2023.1320800","url":null,"abstract":"The rapid dissemination of information has been accompanied by the proliferation of fake news, posing significant challenges in discerning authentic news from fabricated narratives. This study addresses the urgent need for effective fake news detection mechanisms. The spread of fake news on digital platforms has necessitated the development of sophisticated tools for accurate detection and classification. Deep learning models, particularly Bi-LSTM and attention-based Bi-LSTM architectures, have shown promise in tackling this issue. This research utilized Bi-LSTM and attention-based Bi-LSTM models, integrating an attention mechanism to assess the significance of different parts of the input data. The models were trained on an 80% subset of the data and tested on the remaining 20%, employing comprehensive evaluation metrics including Recall, Precision, F1-Score, Accuracy, and Loss. Comparative analysis with existing models revealed the superior efficacy of the proposed architectures. The attention-based Bi-LSTM model demonstrated remarkable proficiency, outperforming other models in terms of accuracy (97.66%) and other key metrics. The study highlighted the potential of integrating advanced deep learning techniques in fake news detection. The proposed models set new standards in the field, offering effective tools for combating misinformation. Limitations such as data dependency, potential for overfitting, and language and context specificity were acknowledged. The research underscores the importance of leveraging cutting-edge deep learning methodologies, particularly attention mechanisms, in fake news identification. 
The innovative models presented pave the way for more robust solutions to counter misinformation, thereby preserving the veracity of digital information. Future research should focus on enhancing data diversity, model efficiency, and applicability across various languages and contexts.","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 3","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139446656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data. Pub Date: 2024-01-08. eCollection Date: 2023-01-01. DOI: 10.3389/fdata.2023.1296508
Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen
{"title":"CTAB-GAN+: enhancing tabular data synthesis.","authors":"Zilong Zhao, Aditya Kunar, Robert Birke, Hiek Van der Scheer, Lydia Y Chen","doi":"10.3389/fdata.2023.1296508","DOIUrl":"https://doi.org/10.3389/fdata.2023.1296508","url":null,"abstract":"<p><p>The usage of synthetic data is gaining momentum in part due to the unavailability of original data due to privacy and legal considerations and in part due to its utility as an augmentation to the authentic data. Generative adversarial networks (GANs), a paragon of generative models, initially for images and subsequently for tabular data, have contributed many of the state-of-the-art synthesizers. As GANs improve, the synthesized data increasingly resemble the real data, risking privacy leakage. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. In this study, we propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state-of-the-art by (i) adding downstream losses to conditional GAN for higher utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on statistical similarity and machine learning utility against state-of-the-art tabular GANs. 
The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 21.9% higher machine learning utility (i.e., F1-Score) across multiple datasets and learning tasks under a given privacy budget.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"6 ","pages":"1296508"},"PeriodicalIF":3.1,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10801038/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139520685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}