{"title":"Development of a Large-Scale Dataset of Chest Computed Tomography Reports in Japanese and a High-Performance Finding Classification Model: Dataset Development and Validation Study.","authors":"Yosuke Yamagishi, Yuta Nakamura, Tomohiro Kikuchi, Yuki Sonoda, Hiroshi Hirakawa, Shintaro Kano, Satoshi Nakamura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe","doi":"10.2196/71137","DOIUrl":"10.2196/71137","url":null,"abstract":"<p><strong>Background: </strong>Recent advances in large language models have highlighted the need for high-quality multilingual medical datasets. Although Japan is a global leader in computed tomography (CT) scanner deployment and use, the absence of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Despite the emergence of multilingual models and language-specific adaptations, the development of Japanese-specific medical language models has been constrained by a lack of comprehensive datasets, particularly in radiology.</p><p><strong>Objective: </strong>This study aims to address this critical gap in Japanese medical natural language processing resources, for which a comprehensive Japanese CT report dataset was developed through machine translation, to establish a specialized language model for structured classification. In addition, a rigorously validated evaluation dataset was created through expert radiologist refinement to ensure a reliable assessment of model performance.</p><p><strong>Methods: </strong>We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini. The training dataset consisted of 22,778 machine-translated reports, and the validation dataset included 150 reports carefully revised by radiologists. 
We developed CT-BERT-JPN, a specialized Bidirectional Encoder Representations from Transformers (BERT) model for Japanese radiology text, based on the \"tohoku-nlp/bert-base-japanese-v3\" architecture, to extract 18 structured findings from reports. Translation quality was assessed with Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and further evaluated by radiologists in a dedicated human-in-the-loop experiment. In that experiment, each of a randomly selected subset of reports was independently reviewed by 2 radiologists-1 senior (postgraduate year [PGY] 6-11) and 1 junior (PGY 4-5)-using a 5-point Likert scale to rate: (1) grammatical correctness, (2) medical terminology accuracy, and (3) overall readability. Inter-rater reliability was measured via quadratic weighted kappa (QWK). Model performance was benchmarked against GPT-4o using accuracy, precision, recall, F1-score, ROC (receiver operating characteristic)-AUC (area under the curve), and average precision.</p><p><strong>Results: </strong>General text structure was preserved (BLEU: 0.731 findings, 0.690 impression; ROUGE: 0.770-0.876 findings, 0.748-0.857 impression), though expert review identified 3 categories of necessary refinements-contextual adjustment of technical terms, completion of incomplete translations, and localization of Japanese medical terminology. The radiologist-revised translations scored significantly higher than raw machine translations across all dimensions, and all improvements were statistically significant (P<.001). 
CT-BERT-JPN outperformed GPT-4o on 11 of 18 findin","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e71137"},"PeriodicalIF":3.8,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12392688/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning Model for Predicting Sarcopenia Among Middle-Aged Adults: Development and External Validation.","authors":"Hye Jin Chong","doi":"10.2196/75760","DOIUrl":"10.2196/75760","url":null,"abstract":"<p><strong>Background: </strong>Sarcopenia is a common muscle disorder in older adults, and its early identification and management in middle-aged populations are essential for ensuring a healthier later life. Detecting sarcopenia at an earlier stage may reduce the future burden on health care systems and enhance the quality of life in older adults. Machine learning (ML) models can evaluate large datasets, identify essential variables, and find complicated correlations between input variables. However, using ML models to detect sarcopenia remains an unsatisfied need.</p><p><strong>Objective: </strong>This study aimed to develop and externally validate an ML model to predict sarcopenia risk among middle-aged adults using a nationally representative dataset.</p><p><strong>Methods: </strong>We analyzed data from 1926 participants aged 40 to 64 years and enrolled in the 2022 Korea National Health and Nutrition Examination Survey (KNHANES). Sarcopenia was diagnosed and defined based on the 2019 Asian Working Group for Sarcopenia criteria, which incorporate both low muscle mass and reduced muscle strength. Muscle mass was assessed using bioelectrical impedance analysis with cutoffs of <7.0 kg/m² for men and <5.7 kg/m² for women. Muscle strength was measured via handgrip strength using a digital dynamometer with thresholds of <28 kg for men and <18 kg for women. Participants meeting both criteria were classified as those with sarcopenia. Four ML algorithms, random forest, support vector machine, extreme gradient boosting, and logistic regression, were used to identify risk factors of sarcopenia and predict its likelihood. The top-performing model was subsequently validated in an external cohort of 2247 middle-aged adults from the 2023 KNHANES. 
Model performance was assessed using the F<sub>2</sub>-score, area under the curve of a receiver operating characteristic curve, and sensitivity. All analyses were conducted using Python 3.13.2 (Python Software Foundation).</p><p><strong>Results: </strong>Among the 4 models, the logistic regression model demonstrated the strongest performance, yielding an area under the curve of 0.85, a sensitivity of 0.92, and an F<sub>2</sub>-score of 0.66. External validation using the 2023 KNHANES dataset confirmed the model's robust performance, indicating its potential for widespread applications.</p><p><strong>Conclusions: </strong>This study developed and externally validated an ML model that accurately identified sarcopenia in middle-aged adults. Leveraging data from a comprehensive national survey, our findings underscore the significance of early detection and customized interventions in midlife to mitigate sarcopenia risk and optimize long-term health outcomes.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e75760"},"PeriodicalIF":3.8,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423610/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Extraction Tool for Venous Thromboembolism Symptom Identification in Primary Care Notes to Facilitate Electronic Clinical Quality Measure Reporting: Algorithm Development and Validation Study.","authors":"John Novoa-Laurentiev, Mica Bowen, Avery Pullman, Wenyu Song, Ania Syrowatka, Jin Chen, Michael Sainlaire, Frank Chang, Krissy Gray, Purushottam Panta, Luwei Liu, Khalid Nawab, Shadi Hijjawi, Richard Schreiber, Li Zhou, Patricia C Dykes","doi":"10.2196/63720","DOIUrl":"https://doi.org/10.2196/63720","url":null,"abstract":"<p><strong>Background: </strong>Diagnosis of venous thromboembolism (VTE) is often delayed, and facilitating earlier diagnosis may improve associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could facilitate timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today, there are relatively few electronic clinical quality measures (eCQMs) in our national payment program and none that use natural language processing (NLP) techniques for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient. Given the potential of NLP-based applications to facilitate more accurate VTE detection, primary care is one clinical setting in urgent need of this type of tool.</p><p><strong>Objective: </strong>This study aimed to develop a tool that extracts VTE symptoms from clinical notes for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings.</p><p><strong>Methods: </strong>We iteratively developed an NLP-based data extraction tool, venous thromboembolism symptom extractor (VTExt), on an internal dataset using a rule-based approach to extract VTE symptoms from primary care clinical note text. 
The VTE symptoms lexicon was derived and optimized with physician guidance and externally validated using datasets from 2 independent health care organizations. We performed 26 rounds of performance evaluation of notes sampled from the case cohort (17,585 patient progress note sentences from 279 patient notes), and 5 rounds of evaluation of the control cohort (2838 patient progress note sentences from 50 patient notes). VTExt's performance was evaluated using evaluation metrics, including area under the curve, positive predictive value, negative predictive value, sensitivity, and specificity.</p><p><strong>Results: </strong>VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from records of patients diagnosed with or without VTE. In external validation, VTExt achieved promising performance in 2 additional geographically distant organizations using different electronic health record systems. When compared against a deep learning model and 4 machine learning models, VTExt exhibited similar or even improved performance across all metrics.</p><p><strong>Conclusions: </strong>This study demonstrates a data-driven NLP-based approach to clinical note information extraction that can be generalized to different electronic health record systems across different institutions. 
Due to the robust performance of this tool, VTExt is the first NLP application to be used in a nationally endorsed eCQM.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e63720"},"PeriodicalIF":3.8,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387394/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Mini-Mental State Examination Scores for Cognitive Impairment and Machine Learning Analysis of Oral Health and Demographic Data Among Individuals Older Than 60 Years: Cross-Sectional Study.","authors":"Alper Idrisoglu, Johan Flyborg, Sarah Nauman Ghazi, Elina Mikaelsson Midlöv, Helén Dellkvist, Anna Axén, Ana Luiza Dallora","doi":"10.2196/75069","DOIUrl":"https://doi.org/10.2196/75069","url":null,"abstract":"<p><strong>Background: </strong>As the older population grows, so does the prevalence of cognitive impairment, emphasizing the importance of early diagnosis. The Mini-Mental State Examination (MMSE) is vital in identifying cognitive impairment. It is known that degraded oral health correlates with MMSE scores ≤26.</p><p><strong>Objective: </strong>This study aims to explore the potential of using machine learning (ML) technologies using oral health and demographic examination data to predict the probability of having MMSE scores of 30 or ≤26 in Swedish individuals older than 60 years.</p><p><strong>Methods: </strong>The study had a cross-sectional design. Baseline data from 2 longitudinal oral health and ongoing general health studies involving individuals older than 60 years were entered into ML models, including random forest, support vector machine, and CatBoost (CB) to classify MMSE scores as either 30 or ≤26, distinguishing between MMSE of 30 and MMSE ≤26 groups. Nested cross-validation (nCV) was used to mitigate overfitting. The best performance-giving model was further investigated for feature importance using Shapley additive explanation summary plots to easily visualize the contribution of each feature to the prediction output. The sample consisted of 693 individuals (350 females and 343 males).</p><p><strong>Results: </strong>All CB, random forest, and support vector machine models achieved high classification accuracies. 
However, CB exhibited superior performance with an average accuracy of 80.6% on the model using 3 × 3 nCV and surpassed the performance of other models. The Shapley additive explanation summary plot illustrates the impact of factors on the model's predictions, such as age, Plaque Index, probing pocket depth, a feeling of dry mouth, level of education, and use of dental hygiene tools for approximal cleaning.</p><p><strong>Conclusions: </strong>The oral health parameters and demographic data used as inputs for ML classifiers contain sufficient information to differentiate between MMSE scores ≤26 and 30. This study suggests oral health parameters and ML techniques could offer a potential tool for screening MMSE scores for individuals aged 60 years and older.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e75069"},"PeriodicalIF":3.8,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tongue Image-Based Diagnosis of Acute Respiratory Tract Infection Using Machine Learning: Algorithm Development and Validation.","authors":"Qianzi Che, Yuanming Leng, Wei Yang, Xihao Cao, Zhongxia Wang, Lizheng Liu, Feibiao Xie, Ruilin Wang","doi":"10.2196/74102","DOIUrl":"https://doi.org/10.2196/74102","url":null,"abstract":"<p><strong>Background: </strong>Human adenoviruses (HAdVs) and COVID-19 are prominent respiratory pathogens with overlapping clinical presentations, including fever, cough, and sore throat, posing significant diagnostic challenges without viral testing. Tongue image diagnosis, a noninvasive method used in traditional Chinese medicine, has shown correlations with specific respiratory infections, but its application remains underexplored in differentiating HAdVs from COVID-19. Advances in artificial intelligence offer opportunities to enhance tongue image analysis for more objective and accurate diagnostics.</p><p><strong>Objective: </strong>This study aims to develop and validate artificial intelligence-based predictive models using tongue image features to differentiate COVID-19 from adenoviral respiratory infections, thereby improving diagnostic accuracy and integrating traditional diagnostic methods with modern medical technologies.</p><p><strong>Methods: </strong>A total of 280 tongue images were collected from 58 patients with COVID-19, 84 patients with HAdVs, and 30 healthy controls. Deep learning methods were applied to extract tongue features, including color, coating, fissures, papillae, tooth marks, and granules. Four machine learning classifiers, logistic regression, random forest, gradient boosting model, and extreme gradient boosting, were developed to differentiate COVID-19 and HAdV infections. 
The key features identified by the machine learning algorithms were further visualized in a 2D space.</p><p><strong>Results: </strong>Nine tongue features showed significant differences among groups (all P<.05), including coating color (red, green, and blue), presence of tooth marks, coating crack ratio, moisture level, texture directionality, roughness, and contrast. The extreme gradient boosting model achieved the highest diagnostic performance with an area under the receiver operating characteristic curve of 0.84 (95% CI 0.78-0.90) and an area under the precision-recall curve above 0.70. Shapley additive explanations analysis indicated tongue color, moisture, and texture as key contributors.</p><p><strong>Conclusions: </strong>Our findings demonstrate the potential of tongue diagnosis in identifying pathogens responsible for acute respiratory tract infections at the time of admission. This approach holds significant clinical implications, offering the potential to reduce clinician workloads while improving diagnostic accuracy and the overall quality of medical care.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e74102"},"PeriodicalIF":3.8,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377515/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-Driven Integration of Deep Learning With Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction.","authors":"Kecheng Huang, Chujun Wu, Rongpeng Pi, Jieyu Fang","doi":"10.2196/73995","DOIUrl":"10.2196/73995","url":null,"abstract":"<p><p>This viewpoint article explores the transformative role of artificial intelligence (AI) in predicting perioperative hypoxemia through the integration of deep learning with multimodal clinical data, including lung imaging, pulmonary function tests, and arterial blood gas (ABG) analysis. Perioperative hypoxemia, defined as arterial oxygen partial pressure <60 mmHg or oxygen saturation <90%, poses significant risks of delayed recovery and organ dysfunction. Traditional diagnostic methods such as radiological imaging and ABG analysis often lack integrated predictive accuracy. AI frameworks, particularly convolutional neural networks and hybrid models like TD-CNNLSTM-LungNet, demonstrate exceptional performance in detecting pulmonary inflammation and stratifying hypoxemia risk, achieving up to 96.57% accuracy in pneumonia subtype differentiation and an area under the curve of 0.96 for postoperative hypoxemia prediction. Multimodal AI systems, such as DeepLung-Predict, unify computed tomography scans, pulmonary function tests, and ABG parameters to enhance predictive precision, surpassing conventional methods by 22%. However, challenges persist, including dataset heterogeneity, model interpretability, and clinical workflow integration. Future directions emphasize multicenter validation, explainable AI frameworks, and pragmatic trials to ensure equitable and reliable deployment. 
This AI-driven approach not only optimizes resource allocation but also mitigates financial burdens on health care systems by enabling early interventions and reducing intensive care unit admission risks.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":" ","pages":"e73995"},"PeriodicalIF":3.8,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413569/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144786048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostic Prediction Models for Primary Care, Based on AI and Electronic Health Records: Systematic Review.","authors":"Liesbeth Hunik, Asma Chaabouni, Twan van Laarhoven, Tim C Olde Hartman, Ralph T H Leijenaar, Jochen W L Cals, Annemarie A Uijen, Henk J Schers","doi":"10.2196/62862","DOIUrl":"https://doi.org/10.2196/62862","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI)-based diagnostic prediction models could aid primary care (PC) in decision-making for faster and more accurate diagnoses. AI has the potential to transform electronic health records (EHRs) data into valuable diagnostic prediction models. Different prediction models based on EHR have been developed. However, there are currently no systematic reviews that evaluate AI-based diagnostic prediction models for PC using EHR data.</p><p><strong>Objective: </strong>This study aims to evaluate the content of diagnostic prediction models based on AI and EHRs in PC, including risk of bias and applicability.</p><p><strong>Methods: </strong>This systematic review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. MEDLINE, Embase, Web of Science, and Cochrane were searched. We included observational and intervention studies using AI and PC EHRs and developing or testing a diagnostic prediction model for health conditions. Two independent reviewers (LH and AC) used a standardized data extraction form. Risk of bias and applicability were assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool).</p><p><strong>Results: </strong>From 10,657 retrieved records, a total of 15 papers were selected. Most EHR papers focused on 1 chronic health care condition (n=11, 73%). From the 15 papers, 13 (87%) described a study that developed a diagnostic prediction model and 2 (13%) described a study that externally validated and tested the model in a PC setting. Studies used a variety of AI techniques. 
The predictors used to develop the model were all registered in the EHR. We found no papers with a low risk of bias, and high risk of bias was found in 9 (60%) papers. Biases covered an unjustified small sample size, not excluding predictors from the outcome definition, and the inappropriate evaluation of the performance measures. The risk of bias was unclear in 6 papers, as no information was provided on the handling of missing data and no results were reported from the multivariate analysis. Applicability was unclear in 10 (67%) papers, mainly due to lack of clarity in reporting the time interval between outcomes and predictors.</p><p><strong>Conclusions: </strong>Most AI-based diagnostic prediction models based on EHR data in PC focused on 1 chronic condition. Only 2 papers tested the model in a PC setting. The lack of sufficiently described methods led to a high risk of bias. Our findings highlight that the currently available diagnostic prediction models are not yet ready for clinical implementation in PC.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e62862"},"PeriodicalIF":3.8,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373303/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning-Based Early Warning Systems in Hospitalized Patients at Risk of Code Blue Events and Length of Stay: Retrospective Real-World Implementation Study.","authors":"Ji-Hyun Kim, Eun Young Cho, Yuhyun Choi, Joo-Yun Won, Se Hee Cheon, Young Ae Kim, Ki-Byung Lee, Kwang Joon Kim, Ho Gwan Kim, Taeyong Sim","doi":"10.2196/72232","DOIUrl":"https://doi.org/10.2196/72232","url":null,"abstract":"<p><strong>Background: </strong>In hospitals, Code Blue is an emergency that refers to a patient requiring immediate resuscitation. Over 85% of patients with cardiopulmonary arrest exhibit abnormal vital sign trends prior to the event. Continuous monitoring and accurate interpretation of clinical data through artificial intelligence (AI) models can contribute to preventing critical events.</p><p><strong>Objective: </strong>This study aims to evaluate changes in clinical outcomes following the use of VitalCare (Major Adverse Event Score and Mortality Score), which is an AI-based early warning system, and to validate the performance of the algorithm.</p><p><strong>Methods: </strong>A retrospective analysis was conducted by extracting electronic health record data, using a total of 30,785 inpatient cases from general wards and intensive care units. A comparative analysis was performed by setting a 3-month period before and after the system implementation. For clinical evaluation, we measured the incidence rates of Code Blue and adverse events, the proportion of prolonged hospitalization, and the frequency of early interventions. The area under the receiver operating characteristic curve (AUROC) was calculated to assess the performance of the algorithm.</p><p><strong>Results: </strong>This study demonstrated that, following the implementation of VitalCare, there was a 24.97% reduction in major events such as Code Blue (P=.004) and the proportion of prolonged hospitalization in general wards (P<.05), along with a significant increase in the rate of early interventions. 
The model performance exhibited superior outcomes compared with traditional scoring systems, with a Major Adverse Event Score AUROC of 0.865 (95% CI 0.857-0.873) and Mortality Score AUROC of 0.937 (95% CI 0.931-0.944).</p><p><strong>Conclusions: </strong>A well-developed AI-based model that provides high predictive power can contribute to the prevention of major in-hospital events by providing early predictive information to clinicians. Additionally, it plays a crucial role in effectively addressing unmet needs and challenges in terms of human resources and practical procedures.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e72232"},"PeriodicalIF":3.8,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Retrieval-Augmented Large Language Models for Dietary Recommendations With Traditional Chinese Medicine's Medicine Food Homology: Algorithm Development and Validation.","authors":"Hangyu Sha, Fan Gong, Bo Liu, Runfeng Liu, Haofen Wang, Tianxing Wu","doi":"10.2196/75279","DOIUrl":"https://doi.org/10.2196/75279","url":null,"abstract":"<p><strong>Background: </strong>Traditional Chinese Medicine (TCM) emphasizes the concept of medicine food homology (MFH), which integrates dietary therapy into health care. However, the practical application of MFH principles relies heavily on expert knowledge and manual interpretation, posing challenges for automating MFH-based dietary recommendations. Although large language models (LLMs) have shown potential in health care decision support, their performance in specialized domains such as TCM is often hindered by hallucinations and a lack of domain knowledge. The integration of uncertain knowledge graphs (UKGs) with LLMs via retrieval-augmented generation (RAG) offers a promising solution to overcome these limitations by enabling a structured and faithful representation of MFH principles while enhancing LLMs' ability to understand the inherent uncertainty and heterogeneity of TCM knowledge. Consequently, it holds potential to improve the reliability and accuracy of MFH-based dietary recommendations generated by LLMs.</p><p><strong>Objective: </strong>This study aimed to introduce Yaoshi-RAG, a framework that leverages UKGs to enhance LLMs' capabilities in generating accurate and personalized MFH-based dietary recommendations.</p><p><strong>Methods: </strong>The proposed framework began by constructing a comprehensive MFH knowledge graph (KG) through LLM-driven open information extraction, which extracted structured knowledge from multiple sources. To address the incompleteness and uncertainty within the MFH KG, UKG reasoning was used to measure the confidence of existing triples and to complete missing triples. 
When processing user queries, query entities were identified and linked to the MFH KG, enabling retrieval of relevant reasoning paths. These reasoning paths were then ranked based on triple confidence scores and entity importance. Finally, the most informative reasoning paths were encoded into prompts using prompt engineering, enabling the LLM to generate personalized dietary recommendations that aligned with both individual health needs and MFH principles. The effectiveness of Yaoshi-RAG was evaluated through both automated metrics and human evaluation.</p><p><strong>Results: </strong>The constructed MFH KG comprised 24,984 entities, 22 relations, and 29,292 triples. Extensive experiments demonstrate the superiority of Yaoshi-RAG in different evaluation metrics. Integrating the MFH KG significantly improved the performance of LLMs, yielding an average increase of 14.5% in Hits@1 and 8.7% in F1-score, respectively. Among the evaluated LLMs, DeepSeek-R1 achieved the best performance, with 84.2% in Hits@1 and 71.5% in F1-score, respectively. 
Human evaluation further validated these results, confirming that Yaoshi-RAG consistently outperformed baseline models across all assessed quality dimensions.</p><p><strong>Conclusions: </strong>This study shows Yaoshi-RAG, a new framework that enhances LLMs' capabilities in generating MFH-based diet","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e75279"},"PeriodicalIF":3.8,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medical Science Data Value Evaluation Model: Mixed Methods Study.","authors":"Dandan Wang, Yaning Liu","doi":"10.2196/63544","DOIUrl":"https://doi.org/10.2196/63544","url":null,"abstract":"<p><strong>Background: </strong>Medical science data hold significant value, and open platforms play a crucial role in unlocking this potential. While relevant platforms are being developed, the overall usage of these data values remains limited.</p><p><strong>Objective: </strong>This study aims to propose a set of practical and effective data value evaluation processes and methods for medical science data open platforms, enabling them to manage and unlock the value of these data.</p><p><strong>Methods: </strong>Integrating the information system success model, technology acceptance model, and consumer perceived value theory, a set of medical science data value assessment index systems was developed by adopting the literature review and expert survey methods. Data from 10 domestic and international open platforms were collected and empirically analyzed using the entropy-weighted Technique for Order Preference by Similarity to Ideal Solution technique.</p><p><strong>Results: </strong>Based on the scores of each indicator, the intragroup correlation coefficient was calculated to be 0.489, indicating consistency in the evaluation. The highest information entropy values and weights determined using the entropy weighting method were the number of datasets (0.70, 17.68%), data timeliness (0.77, 13.44%), search comprehensiveness (0.78, 12.92%), and system responsiveness (0.80, 11.55%), respectively. 
Based on the weighted analysis, the platform with the highest overall score was the National Population Health Sciences Data Center, with a score of 62.32.</p><p><strong>Conclusions: </strong>The evaluation index system and model developed can be used not only to optimize the platform's data value evaluation processes, but also to enhance the platform's overall data value and encourage users to reuse data.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e63544"},"PeriodicalIF":3.8,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369987/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}