{"title":"A scalable tool for analyzing genomic variants of humans using knowledge graphs and graph machine learning.","authors":"Shivika Prasanna, Ajay Kumar, Deepthi Rao, Eduardo J Simoes, Praveen Rao","doi":"10.3389/fdata.2024.1466391","DOIUrl":"10.3389/fdata.2024.1466391","url":null,"abstract":"<p><p>Advances in high-throughput genome sequencing have enabled large-scale genome sequencing in clinical practice and research studies. By analyzing genomic variants of humans, scientists can gain better understanding of the risk factors of complex diseases such as cancer and COVID-19. To model and analyze the rich genomic data, knowledge graphs (KGs) and graph machine learning (GML) can be regarded as enabling technologies. In this article, we present a scalable tool called VariantKG for analyzing genomic variants of humans modeled using KGs and GML. Specifically, we used publicly available genome sequencing data from patients with COVID-19. VariantKG extracts variant-level genetic information output by a variant calling pipeline, annotates the variant data with additional metadata, and converts the annotated variant information into a KG represented using the Resource Description Framework (RDF). The resulting KG is further enhanced with patient metadata and stored in a scalable graph database that enables efficient RDF indexing and query processing. VariantKG employs the Deep Graph Library (DGL) to perform GML tasks such as node classification. A user can extract a subset of the KG and perform inference tasks using DGL. The user can monitor the training and testing performance and hardware utilization. We tested VariantKG for KG construction by using 1,508 genome sequences, leading to 4 billion RDF statements. We evaluated GML tasks using VariantKG by selecting a subset of 500 sequences from the KG and performing node classification using well-known GML techniques such as GraphSAGE, Graph Convolutional Network (GCN) and Graph Transformer. 
VariantKG has intuitive user interfaces and features enabling a low barrier to entry for KG construction, model inference, and model interpretation on genomic variants of humans.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1466391"},"PeriodicalIF":2.4,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11790625/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143190383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-17 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1402926
Feras Al-Obeidat, Wael Hafez, Asrar Rashid, Mahir Khalil Jallo, Munier Gador, Ivan Cherrez-Ojeda, Daniel Simancas-Racines
{"title":"Artificial intelligence for the detection of acute myeloid leukemia from microscopic blood images; a systematic review and meta-analysis.","authors":"Feras Al-Obeidat, Wael Hafez, Asrar Rashid, Mahir Khalil Jallo, Munier Gador, Ivan Cherrez-Ojeda, Daniel Simancas-Racines","doi":"10.3389/fdata.2024.1402926","DOIUrl":"10.3389/fdata.2024.1402926","url":null,"abstract":"<p><strong>Background: </strong>Leukemia is the 11<sup>th</sup> most prevalent type of cancer worldwide, with acute myeloid leukemia (AML) being the most frequent blood malignancy in adults. Microscopic blood tests are the most common methods for identifying leukemia subtypes. An automated optical image-processing system using artificial intelligence (AI) has recently been applied to facilitate clinical decision-making.</p><p><strong>Aim: </strong>To evaluate the performance of all AI-based approaches for the detection and diagnosis of acute myeloid leukemia (AML).</p><p><strong>Methods: </strong>Medical databases including PubMed, Web of Science, and Scopus were searched until December 2023. We used the \"metafor\" and \"metagen\" libraries in R to analyze the different models used in the studies. Accuracy and sensitivity were the primary outcome measures.</p><p><strong>Results: </strong>Ten studies, conducted between 2016 and 2023, were included in our review and meta-analysis. Most deep-learning models have been utilized, including convolutional neural networks (CNNs). The common- and random-effects models had accuracies of 1.0000 [0.9999; 1.0001] and 0.9557 [0.9312; 0.9802], respectively. The common- and random-effects models had high sensitivity values of 1.0000 and 0.8581, respectively, indicating that the machine learning models in this study can accurately detect true-positive leukemia cases. 
Studies have shown substantial variations in accuracy and sensitivity, as shown by the Q values and I<sup>2</sup> statistics.</p><p><strong>Conclusion: </strong>Our systematic review and meta-analysis found an overall high accuracy and sensitivity of AI models in correctly identifying true-positive AML cases. Future research should focus on unifying reporting methods and performance assessment metrics of AI-based diagnostics.</p><p><strong>Systematic review registration: </strong>https://www.crd.york.ac.uk/prospero/#recordDetails, CRD42024501980.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1402926"},"PeriodicalIF":2.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782132/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143081942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-17 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1506443
Liu Feng, Yang Liu, Benyun Shi, Jiming Liu
{"title":"Toward a physics-guided machine learning approach for predicting chaotic systems dynamics.","authors":"Liu Feng, Yang Liu, Benyun Shi, Jiming Liu","doi":"10.3389/fdata.2024.1506443","DOIUrl":"10.3389/fdata.2024.1506443","url":null,"abstract":"<p><p>Predicting the dynamics of chaotic systems is crucial across various practical domains, including the control of infectious diseases and responses to extreme weather events. Such predictions provide quantitative insights into the future behaviors of these complex systems, thereby guiding the decision-making and planning within the respective fields. Recently, data-driven approaches, renowned for their capacity to learn from empirical data, have been widely used to predict chaotic system dynamics. However, these methods rely solely on historical observations while ignoring the underlying mechanisms that govern the systems' behaviors. Consequently, they may perform well in short-term predictions by effectively fitting the data, but their ability to make accurate long-term predictions is limited. A critical challenge in modeling chaotic systems lies in their sensitivity to initial conditions; even a slight variation can lead to significant divergence in actual and predicted trajectories over a finite number of time steps. In this paper, we propose a novel Physics-Guided Learning (PGL) method, aiming at extending the scope of accurate forecasting as much as possible. The proposed method aims to synergize observational data with the governing physical laws of chaotic systems to predict the systems' future dynamics. 
Specifically, our method consists of three key elements: a data-driven component (DDC) that captures dynamic patterns and mapping functions from historical data; a physics-guided component (PGC) that leverages the governing principles of the system to inform and constrain the learning process; and a nonlinear learning component (NLC) that effectively synthesizes the outputs of both the data-driven and physics-guided components. Empirical validation on six dynamical systems, each exhibiting unique chaotic behaviors, demonstrates that PGL achieves lower prediction errors than existing benchmark predictive models. The results highlight the efficacy of our design of data-physics integration in improving the precision of chaotic system dynamics forecasts.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1506443"},"PeriodicalIF":2.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143081944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-15 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1469809
Stephan Räss, Markus C Leuenberger
{"title":"Analysis and prediction of atmospheric ozone concentrations using machine learning.","authors":"Stephan Räss, Markus C Leuenberger","doi":"10.3389/fdata.2024.1469809","DOIUrl":"https://doi.org/10.3389/fdata.2024.1469809","url":null,"abstract":"<p><p>Atmospheric ozone chemistry involves various substances and reactions, which makes it a complex system. We analyzed data recorded by Switzerland's National Air Pollution Monitoring Network (NABEL) to showcase the capabilities of machine learning (ML) for the prediction of ozone concentrations (daily averages) and to document a general approach that can be followed by anyone facing similar problems. We evaluated various artificial neural networks and compared them to linear as well as non-linear models deduced with ML. The main analyses and the training of the models were performed on atmospheric air data recorded from 2016 to 2023 at the NABEL station Lugano-Università in Lugano, TI, Switzerland. As a first step, we used techniques like best subset selection to determine the measurement parameters that might be relevant for the prediction of ozone concentrations; in general, the parameters identified by these methods agree with atmospheric ozone chemistry. Based on these results, we constructed various models and used them to predict ozone concentrations in Lugano for the period between January 1, 2024, and March 31, 2024; then, we compared the output of our models to the actual measurements and repeated this procedure for two NABEL stations situated in northern Switzerland (Dübendorf-Empa and Zürich-Kaserne). For these stations, predictions were made for the aforementioned period and the period between January 1, 2023, and December 31, 2023. 
In most cases, the lowest mean absolute errors (MAE) were provided by a non-linear model with 12 components (different powers and linear combinations of NO<sub>2</sub>, NO<sub>X</sub>, SO<sub>2</sub>, non-methane volatile organic compounds, temperature and radiation); the MAE of predicted ozone concentrations in Lugano was as low as 9 μg m<sup>-3</sup>. For the stations in Zürich and Dübendorf, the lowest MAEs were around 11 μg m<sup>-3</sup> and 13 μg m<sup>-3</sup>, respectively. For the tested periods, the accuracy of the best models was approximately 1 μg m<sup>-3</sup>. Since the aforementioned values are all lower than the standard deviations of the observations, we conclude that using ML for complex data analyses can be very helpful and that artificial neural networks do not necessarily outperform simpler models.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1469809"},"PeriodicalIF":2.4,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774898/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143069704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-14 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1518939
Rencheng Fang, Tao Zhou, Baohua Yu, Zhigang Li, Long Ma, Tao Luo, Yongcai Zhang, Xinqi Liu
{"title":"Prediction model of middle school student performance based on MBSO and MDBO-BP-Adaboost method.","authors":"Rencheng Fang, Tao Zhou, Baohua Yu, Zhigang Li, Long Ma, Tao Luo, Yongcai Zhang, Xinqi Liu","doi":"10.3389/fdata.2024.1518939","DOIUrl":"https://doi.org/10.3389/fdata.2024.1518939","url":null,"abstract":"<p><p>Predictions of student performance are important to the education system as a whole: they help students understand how their learning is changing and help teachers and school policymakers adjust plans for students' future growth. However, selecting meaningful features from the huge amount of educational data is challenging, so the dimensionality of student achievement features needs to be reduced. Motivated by this, this paper proposes an improved Binary Snake Optimizer (MBSO) as a wrapper feature selection model, taking the Mat and Por student achievement data in the UCI database as an example. Compared with other feature selection methods, the MBSO is able to select features with strong correlation to student achievement, and the average number of selected features reaches a minimum of 7.90 and 7.10, which greatly reduces the complexity of student achievement prediction. In addition, we propose the MDBO-BP-Adaboost model to predict students' performance. First, the model incorporates good point set initialization, a triangle wandering strategy, and an adaptive t-distribution strategy to obtain the Modified Dung Beetle Optimization Algorithm (MDBO). Second, it uses MDBO to optimize the weights and thresholds of the BP neural network. Finally, the optimized BP neural network is used as a weak learner for Adaboost. 
After comparison with the XGBoost, BP, BP-Adaboost, and DBO-BP-Adaboost models, the experimental results show that the R<sup>2</sup> of MDBO-BP-Adaboost on the student achievement datasets is 0.930 and 0.903, respectively, which indicates that the proposed MDBO-BP-Adaboost model predicts students' achievement better than the other models.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1518939"},"PeriodicalIF":2.4,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772490/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143061453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-13 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1520605
Zhuang Xiong, Jun Ma, Bohang Chen, Haiming Lan, Yong Niu
{"title":"Multi-source data recognition and fusion algorithm based on a two-layer genetic algorithm-back propagation model.","authors":"Zhuang Xiong, Jun Ma, Bohang Chen, Haiming Lan, Yong Niu","doi":"10.3389/fdata.2024.1520605","DOIUrl":"10.3389/fdata.2024.1520605","url":null,"abstract":"<p><p>Traditional rainfall data collection mainly relies on rain buckets and meteorological data. It rarely considers the impact of sensor faults on measurement accuracy. To solve this problem, a two-layer genetic algorithm-backpropagation (GA-BP) model is proposed. The algorithm focuses on multi-source data identification and fusion. Rainfall data from a sensor array are first used. The GA optimizes the weights and thresholds of the BP neural network. It determines the optimal population and minimizes fitness values. This process builds a GA-BP model for recognizing sensor faults. A second GA-BP network is then created based on fault data. This model achieves data fusion output. The two-layer GA-BP algorithm is compared with a single BP neural network and actual expected values to test its performance. The results show that the two-layer GA-BP algorithm reduces data fusion runtime by 2.37 s compared to the single-layer BP model. For faults such as lost signals, high-value bias, and low-value bias, recognition accuracies improve by 26.09%, 18.18%, and 7.15%, respectively. The mean squared error is 3.49 mm lower than that of the single-layer BP model. The fusion output waveform is also smoother with less fluctuation. 
These results confirm that the two-layer GA-BP model improves system robustness and generalization.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1520605"},"PeriodicalIF":2.4,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769991/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143054262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1476506
Revathy Venkataramanan, Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi, Kaushik Roy, Amit Sheth
{"title":"Constructing a metadata knowledge graph as an atlas for demystifying AI pipeline optimization.","authors":"Revathy Venkataramanan, Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi, Kaushik Roy, Amit Sheth","doi":"10.3389/fdata.2024.1476506","DOIUrl":"10.3389/fdata.2024.1476506","url":null,"abstract":"<p><p>The emergence of advanced artificial intelligence (AI) models has driven the development of frameworks and approaches that focus on automating model training and hyperparameter tuning of end-to-end AI pipelines. However, other crucial stages of these pipelines such as dataset selection, feature engineering, and model optimization for deployment have received less attention. Improving efficiency of end-to-end AI pipelines requires metadata of past executions of AI pipelines and all their stages. Regenerating metadata history by re-executing existing AI pipelines is computationally challenging and impractical. To address this issue, we propose to source AI pipeline metadata from open-source platforms such as Papers-with-Code, OpenML, and Hugging Face. However, integrating and unifying the varying terminologies and data formats from these diverse sources is a challenge. In this study, we present a solution by introducing Common Metadata Ontology (CMO) which is used to construct an extensive AI Pipeline Metadata Knowledge Graph (AIMKG) consisting of 1.6 million pipelines. Through semantic enhancements, the pipeline metadata in AIMKG is also enriched for downstream tasks such as search and recommendation of AI pipelines. We perform quantitative and qualitative evaluations on AIMKG to search and recommend relevant pipelines to user query. For quantitative evaluation, we propose a custom aggregation model that outperforms other baselines by achieving a retrieval accuracy (R@1) of 76.3%. 
Our qualitative analysis shows that AIMKG-based recommender retrieved relevant pipelines in 78% of test cases compared to the state-of-the-art MLSchema-based recommender which retrieved relevant responses in 51% of the cases. AIMKG serves as an atlas for navigating the evolving AI landscape, providing practitioners with a comprehensive factsheet for their applications. It guides AI pipeline optimization, offers insights and recommendations for improving AI pipelines, and serves as a foundation for data mining and analysis of evolving AI workflows.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1476506"},"PeriodicalIF":2.4,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748301/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1365417
Rapuru Rushendran, Vellapandian Chitra
{"title":"Exploring infodemiology: unraveling the intricate relationships among stress, headaches, migraines, and suicide through Google Trends analysis.","authors":"Rapuru Rushendran, Vellapandian Chitra","doi":"10.3389/fdata.2024.1365417","DOIUrl":"10.3389/fdata.2024.1365417","url":null,"abstract":"<p><strong>Introduction: </strong>Google Trends has emerged as a vital resource for understanding public information-seeking behavior. This study investigates the interconnected search trends of stress, headaches, migraines, and suicide, highlighting their relevance to public health and mental well-being. By employing infodemiology, the study explores temporal and geographical patterns in search behavior and examines the impact of global events like the COVID-19 pandemic.</p><p><strong>Methods: </strong>Data mining was conducted using Google Trends for the search terms \"stress,\" \"headache,\" \"migraine,\" and \"suicide.\" Relative Search Volume (RSV) data from October 2013 to October 2023 was collected and adjusted for time and location. Statistical analyses, including Pearson correlation tests, linear regression, and seasonal Mann-Kendall tests, were applied to identify correlations, trends, and seasonal variations. Geographical differences were also analyzed to understand regional disparities.</p><p><strong>Results: </strong>Significant correlations were observed among the search terms, with \"migraine\" and \"suicide\" showing the strongest association. Seasonal variations revealed a peak in search volumes during winter months. Geographical analysis highlighted consistently high RSV in the Philippines for all terms. During the COVID-19 pandemic, searches for stress, headaches, and migraines showed notable increases, reflecting heightened public interest in mental health-related topics during this period.</p><p><strong>Discussion: </strong>The study underscores the interconnected nature of stress, headaches, migraines, and suicide in public search behavior. 
Seasonal patterns and regional variations emphasize the need for targeted interventions. The observed surge in search volume during the COVID-19 pandemic highlights the profound impact of global crises on mental health and the importance of timely public health responses.</p><p><strong>Conclusion: </strong>Google Trends provides valuable insights into the public's interest in health-related topics, demonstrating the intricate relationship between stress, headaches, migraines, and suicide. The findings highlight the need for increased mental health awareness and interventions, particularly during times of heightened stress. Further research is essential to develop strategies that mitigate the impact of these stressors on public health.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1365417"},"PeriodicalIF":2.4,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747232/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frontiers in Big Data | Pub Date: 2025-01-03 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1443646
Po-Chi Hsu, Jia-Ming Chen, Chia-Chu Chang, Yu-Jun Chang, Ping-Fang Chiu, John Y Chiang, Lun-Chien Lo
{"title":"Exploring the pivotal variables of tongue diagnosis between patients with chronic kidney disease and health participants.","authors":"Po-Chi Hsu, Jia-Ming Chen, Chia-Chu Chang, Yu-Jun Chang, Ping-Fang Chiu, John Y Chiang, Lun-Chien Lo","doi":"10.3389/fdata.2024.1443646","DOIUrl":"https://doi.org/10.3389/fdata.2024.1443646","url":null,"abstract":"<p><strong>Introduction: </strong>Chronic kidney disease (CKD) is a significant global health problem associated with high morbidity and mortality rates. Traditional Chinese Medicine (TCM) utilizes tongue diagnosis to differentiate symptoms and predict prognosis. This study examines the relationship between tongue characteristics and CKD severity using an automatic tongue diagnosis system (ATDS), which captures tongue images non-invasively to provide objective diagnostic information.</p><p><strong>Methods: </strong>This cross-sectional, case-control study was conducted from July 1, 2019, to December 31, 2021. Participants were divided into three groups based on estimated glomerular filtration rate (eGFR): control (eGFR > 60 ml/min/1.73 m<sup>2</sup>), CKD stage 3 (30 ≤ eGFR < 60 ml/min/1.73 m<sup>2</sup>), and CKD stage 4-5 (eGFR < 30 ml/min/1.73 m<sup>2</sup>). Tongue images were analyzed using ATDS to extract nine primary features: tongue shape, color, fur, saliva, fissures, ecchymosis, tooth marks, and red dots. Statistical analyses included non-parametric methods and ordinal logistic regression.</p><p><strong>Results: </strong>This study revealed significant differences in fur thickness, tongue color, amount of ecchymosis, and saliva among the three groups. Ordinal logistic regression indicated that pale tongue color (OR: 2.107, <i>P</i> < 0.001), bluish tongue color (OR: 2.743, <i>P</i> = 0.001), yellow fur (OR: 3.195, <i>P</i> < 0.001), wet saliva (OR: 2.536, <i>P</i> < 0.001), and ecchymoses (OR: 1.031, <i>P</i> = 0.012) were significantly associated with increased CKD severity. 
Additionally, each red dot and tooth mark decreased the odds of severe CKD.</p><p><strong>Conclusion: </strong>Tongue features such as paleness, wet saliva, yellow fur, and ecchymosis are prevalent in CKD patients and can serve as early clinical indicators of the disease. This study demonstrates that TCM tongue diagnosis, facilitated by ATDS, is a valuable, non-invasive method for identifying CKD and distinguishing its stages.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1443646"},"PeriodicalIF":2.4,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11739136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}