{"title":"The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets.","authors":"Theresa Willem, Alessandro Wollek, Theodor Cheslerean-Boghiu, Martha Kenney, Alena Buyx","doi":"10.2196/59452","DOIUrl":"10.2196/59452","url":null,"abstract":"<p><strong>Background: </strong>In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models' outputs. As a standard, categorical data, such as patients' gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population.</p><p><strong>Objective: </strong>This study aimed to explore categorical data's effects on machine learning model outputs, rooted the effects in the data collection and dataset publication processes, and proposed a mixed methods approach to examining datasets' data categories before using them for machine learning training.</p><p><strong>Methods: </strong>Against the theoretical background of the social construction of categories, we suggest a mixed methods approach to assess categorical data's utility for machine learning model training. As an example, we applied our approach to a Brazilian dermatological dataset (Dermatological and Surgical Assistance Program at the Federal University of Espírito Santo [PAD-UFES] 20). We first present an exploratory, quantitative study that assesses the effects when including or excluding each of the unique categorical data features of the PAD-UFES 20 dataset for training a transformer-based model using a data fusion algorithm. We then pair our quantitative analysis with a qualitative examination of the data categories based on interviews with the dataset authors.</p><p><strong>Results: </strong>Our quantitative study suggests scattered effects of including categorical data for machine learning model training across predictive classes. Our qualitative analysis gives insights into how the categorical data were collected and why they were published, explaining some of the quantitative effects that we observed. Our findings highlight the social constructedness of categorical data in publicly available datasets, meaning that the data in a category heavily depend on both how these categories are defined by the dataset creators and the sociomedico context in which the data are collected. This reveals relevant limitations of using publicly available datasets in contexts different from those of the collection of their data.</p><p><strong>Conclusions: </strong>We caution against using data features of publicly available datasets without reflection on the social construction and context dependency of their categorical data features, particularly in data-sparse areas. 
We conclude that social scientific, context-dependent analysis of available data features using both quantitative and qualitative methods is helpful in judging the utility of categorical data for the population for wh","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e59452"},"PeriodicalIF":3.1,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11815297/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143061594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
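The quantitative arm of this study is essentially a feature-ablation experiment: retrain the model with each categorical feature included or excluded and compare per-class performance. The sketch below illustrates that pattern on synthetic tabular data with a scikit-learn classifier standing in for the authors' transformer-based fusion model; all column names and data are invented for illustration.

```python
# Feature-ablation sketch: how dropping individual categorical columns shifts
# per-class performance. A RandomForest on synthetic data stands in for the
# transformer-based fusion model used in the paper; all names are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "lesion_feature_1": rng.normal(size=n),   # stand-in for image-derived features
    "lesion_feature_2": rng.normal(size=n),
    "smokes": rng.integers(0, 2, size=n),     # hypothetical categorical features
    "fitzpatrick_type": rng.integers(1, 7, size=n),
    "region": rng.integers(0, 5, size=n),
})
y = (df["lesion_feature_1"] + 0.3 * df["fitzpatrick_type"] + rng.normal(size=n) > 1.5).astype(int)

categorical = ["smokes", "fitzpatrick_type", "region"]
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)

def per_class_f1(cols_to_drop):
    cols = [c for c in df.columns if c not in cols_to_drop]
    model = RandomForestClassifier(random_state=0).fit(X_train[cols], y_train)
    return f1_score(y_test, model.predict(X_test[cols]), average=None)

print("all features:", per_class_f1([]))
for col in categorical:
    print(f"without {col}:", per_class_f1([col]))
```

Comparing the per-class scores across ablations is what surfaces the "scattered effects" the authors describe: a categorical feature may help one predictive class while leaving others unchanged or worse.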
{"title":"Use of the FHTHWA Index as a Novel Approach for Predicting the Incidence of Diabetes in a Japanese Population Without Diabetes: Data Analysis Study.","authors":"Jiao Wang, Jianrong Chen, Ying Liu, Jixiong Xu","doi":"10.2196/64992","DOIUrl":"10.2196/64992","url":null,"abstract":"<p><strong>Background: </strong>Many tools have been developed to predict the risk of diabetes in a population without diabetes; however, these tools have shortcomings that include the omission of race, inclusion of variables that are not readily available to patients, and low sensitivity or specificity.</p><p><strong>Objective: </strong>We aimed to develop and validate an easy, systematic index for predicting diabetes risk in the Asian population.</p><p><strong>Methods: </strong>We collected the data from the NAGALA (NAfld [nonalcoholic fatty liver disease] in the Gifu Area, Longitudinal Analysis) database. The least absolute shrinkage and selection operator model was used to select potentially relevant features. Multiple Cox proportional hazard analysis was used to develop a model based on the training set.</p><p><strong>Results: </strong>The final study population of 15464 participants had a mean age of 42 (range 18-79) years; 54.5% (8430) were men. The mean follow-up duration was 6.05 (SD 3.78) years. A total of 373 (2.41%) participants showed progression to diabetes during the follow-up period. Then, we established a novel parameter (the FHTHWA index), to evaluate the incidence of diabetes in a population without diabetes, comprising 6 parameters based on the training set. After multivariable adjustment, individuals in tertile 3 had a significantly higher rate of diabetes compared with those in tertile 1 (hazard ratio 32.141, 95% CI 11.545-89.476). Time receiver operating characteristic curve analyses showed that the FHTHWA index had high accuracy, with the area under the curve value being around 0.9 during the more than 12 years of follow-up.</p><p><strong>Conclusions: </strong>This research successfully developed a diabetes risk assessment index tailored for the Japanese population by utilizing an extensive dataset and a wide range of indices. By categorizing the diabetes risk levels among Japanese individuals, this study offers a novel predictive tool for identifying potential patients, while also delivering valuable insights into diabetes prevention strategies for the healthy Japanese populace.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e64992"},"PeriodicalIF":3.1,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11793195/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143069867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study.","authors":"Doris Yang, Doudou Zhou, Steven Cai, Ziming Gan, Michael Pencina, Paul Avillach, Tianxi Cai, Chuan Hong","doi":"10.2196/54133","DOIUrl":"10.2196/54133","url":null,"abstract":"<p><strong>Background: </strong>Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large scale cohort studies are both time and resource intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.</p><p><strong>Objective: </strong>We propose SONAR (Semantic and Distribution-Based Harmonization) as a method for harmonizing variables across cohort studies to facilitate multicohort studies.</p><p><strong>Methods: </strong>SONAR used semantic learning from variable descriptions and distribution learning from study participant data. Our method learned an embedding vector for each variable and used pairwise cosine similarity to score the similarity between variables. This approach was built off 3 National Institutes of Health cohorts, including the Cardiovascular Health Study, the Multi-Ethnic Study of Atherosclerosis, and the Women's Health Initiative. We also used gold standard labels to further refine the embeddings in a supervised manner.</p><p><strong>Results: </strong>The method was evaluated using manually curated gold standard labels from the 3 National Institutes of Health cohorts. We evaluated both the intracohort and intercohort variable harmonization performance. The supervised SONAR method outperformed existing benchmark methods for almost all intracohort and intercohort comparisons using area under the curve and top-k accuracy metrics. Notably, SONAR was able to significantly improve harmonization of concepts that were difficult for existing semantic methods to harmonize.</p><p><strong>Conclusions: </strong>SONAR achieves accurate variable harmonization within and between cohort studies by harnessing the complementary strengths of semantic learning and variable distribution learning.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e54133"},"PeriodicalIF":3.1,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11778729/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143026011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Data Control and Delayed Discounting on the Public's Willingness to Share Different Types of Health Care Data: Empirical Study.","authors":"Dongle Wei, Pan Gao, Yunkai Zhai","doi":"10.2196/66444","DOIUrl":"10.2196/66444","url":null,"abstract":"<p><strong>Background: </strong>Health data typically include patient-generated data and clinical medical data. Different types of data contribute to disease prevention, precision medicine, and the overall improvement of health care. With the introduction of regulations such as the Health Insurance Portability and Accountability Act (HIPAA), individuals play a key role in the sharing and application of personal health data.</p><p><strong>Objective: </strong>This study aims to explore the impact of different types of health data on users' willingness to share. Additionally, it analyzes the effect of data control and delay discounting rate on this process.</p><p><strong>Methods: </strong>The results of a web-based survey were analyzed to examine individuals' perceptions of sharing different types of health data and how data control and delay discounting rates influenced their decisions. We recruited participants for our study through the web-based platform \"Wenjuanxing.\" After screening, we obtained 257 valid responses. Regression analysis was used to investigate the impact of data control, delayed discounting, and mental accounting on the public's willingness to share different types of health care data.</p><p><strong>Results: </strong>Our findings indicate that the type of health data does not significantly affect the perceived benefits of data sharing. Instead, it negatively influences willingness to share by indirectly affecting data acquisition costs and perceived risks. Our results also show that data control reduces the perceived risks associated with sharing, while higher delay discounting rates lead to an overestimation of data acquisition costs and perceived risks.</p><p><strong>Conclusions: </strong>Individuals' willingness to share data is primarily influenced by costs. To promote the acquisition and development of personal health data, stakeholders should strengthen individuals' control over their data or provide direct short-term incentives.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e66444"},"PeriodicalIF":3.1,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11778728/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study.","authors":"Nicholas C Cardamone, Mark Olfson, Timothy Schmutte, Lyle Ungar, Tony Liu, Sara W Cullen, Nathaniel J Williams, Steven C Marcus","doi":"10.2196/65454","DOIUrl":"10.2196/65454","url":null,"abstract":"<p><strong>Background: </strong>Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.</p><p><strong>Objective: </strong>This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR data set of individuals with mental health disorders seen in emergency departments (EDs).</p><p><strong>Methods: </strong>Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into \"mental health\" or \"physical health\", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.</p><p><strong>Results: </strong>There was high agreement between the LLM and clinical experts when categorizing 4553 terms as \"mental health\" or \"physical health\" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).</p><p><strong>Conclusions: </strong>The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, the use of LLMs presents an alternative to manual human coding, presenting great potential to create interpretable features for prediction models.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e65454"},"PeriodicalIF":3.1,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884378/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dynamic Adaptive Ensemble Learning Framework for Noninvasive Mild Cognitive Impairment Detection: Development and Validation Study.","authors":"Aoyu Li, Jingwen Li, Yishan Hu, Yan Geng, Yan Qiang, Juanjuan Zhao","doi":"10.2196/60250","DOIUrl":"10.2196/60250","url":null,"abstract":"<p><strong>Background: </strong>The prompt and accurate identification of mild cognitive impairment (MCI) is crucial for preventing its progression into more severe neurodegenerative diseases. However, current diagnostic solutions, such as biomarkers and cognitive screening tests, prove costly, time-consuming, and invasive, hindering patient compliance and the accessibility of these tests. Therefore, exploring a more cost-effective, efficient, and noninvasive method to aid clinicians in detecting MCI is necessary.</p><p><strong>Objective: </strong>This study aims to develop an ensemble learning framework that adaptively integrates multimodal physiological data collected from wearable wristbands and digital cognitive metrics recorded on tablets, thereby improving the accuracy and practicality of MCI detection.</p><p><strong>Methods: </strong>We recruited 843 participants aged 60 years and older from the geriatrics and neurology departments of our collaborating hospitals, who were randomly divided into a development dataset (674/843 participants) and an internal test dataset (169/843 participants) at a 4:1 ratio. In addition, 226 older adults were recruited from 3 external centers to form an external test dataset. We measured their physiological signals (eg, electrodermal activity and photoplethysmography) and digital cognitive parameters (eg, reaction time and test scores) using the clinically certified Empatica 4 wristband and a tablet cognitive screening tool. The collected data underwent rigorous preprocessing, during which features in the time, frequency, and nonlinear domains were extracted from individual physiological signals. To address the challenges (eg, the curse of dimensionality and increased model complexity) posed by high-dimensional features, we developed a dynamic adaptive feature selection optimization algorithm to identify the most impactful subset of features for classification performance. Finally, the accuracy and efficiency of the classification model were improved by optimizing the combination of base learners.</p><p><strong>Results: </strong>The experimental results indicate that the proposed MCI detection framework achieved classification accuracies of 88.4%, 85.5%, and 84.5% on the development, internal test, and external test datasets, respectively. The area under the curve for the binary classification task was 0.945 (95% CI 0.903-0.986), 0.912 (95% CI 0.859-0.965), and 0.904 (95% CI 0.846-0.962) on these datasets. Furthermore, a statistical analysis of feature subsets during the iterative modeling process revealed that the decay time of skin conductance response, the percentage of continuous normal-to-normal intervals exceeding 50 milliseconds, the ratio of low-frequency to high-frequency (LF/HF) components in heart rate variability, and cognitive time features emerged as the most prevalent and effective indicators. 
Specifically, compared with healthy individuals, patients with MCI exhibited a longer skin conductance ","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e60250"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791443/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143017039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
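Two of the physiological markers singled out here, pNN50 and the LF/HF ratio, are standard heart rate variability features computed from a series of interbeat (RR) intervals. The sketch below shows one common way to compute them with NumPy and SciPy; the RR series is simulated, and the exact feature definitions in the paper may differ.

```python
# Computing two HRV features highlighted in the abstract, pNN50 and the LF/HF
# ratio, from a simulated RR-interval series in seconds. Definitions follow
# common HRV conventions and may differ in detail from the authors' pipeline.
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import interp1d
from scipy.signal import welch

rng = np.random.default_rng(4)
rr = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * np.arange(300)) + rng.normal(0, 0.02, 300)

# pNN50: percentage of successive RR differences greater than 50 ms.
diffs_ms = np.abs(np.diff(rr)) * 1000
pnn50 = 100 * np.mean(diffs_ms > 50)

# LF/HF: resample RR onto a uniform time grid, estimate the power spectrum,
# then integrate the 0.04-0.15 Hz (LF) and 0.15-0.40 Hz (HF) bands.
t = np.cumsum(rr)
fs = 4.0  # resampling frequency in Hz
grid = np.arange(t[0], t[-1], 1 / fs)
rr_uniform = interp1d(t, rr, kind="cubic")(grid)
freqs, psd = welch(rr_uniform - rr_uniform.mean(), fs=fs, nperseg=256)
lf_band = (freqs >= 0.04) & (freqs < 0.15)
hf_band = (freqs >= 0.15) & (freqs < 0.40)
lf = trapezoid(psd[lf_band], freqs[lf_band])
hf = trapezoid(psd[hf_band], freqs[hf_band])

print(f"pNN50 = {pnn50:.1f}%, LF/HF = {lf / hf:.2f}")
```

Features like these, alongside electrodermal decay times and tablet-derived reaction times, are what the adaptive feature selection step would then rank and prune before ensembling.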
{"title":"Digital Health Innovations to Catalyze the Transition to Value-Based Health Care.","authors":"Lan Zhang, Christopher Bullen, Jinsong Chen","doi":"10.2196/57385","DOIUrl":"10.2196/57385","url":null,"abstract":"<p><strong>Unlabelled: </strong>The health care industry is currently going through a transformation due to the integration of technologies and the shift toward value-based health care (VBHC). This article explores how digital health solutions play a role in advancing VBHC, highlighting both the challenges and opportunities associated with adopting these technologies. Digital health, which includes mobile health, wearable devices, telehealth, and personalized medicine, shows promise in improving diagnostic accuracy, treatment options, and overall health outcomes. The article delves into the concept of transformation in health care by emphasizing its potential to reform care delivery through data communication, patient engagement, and operational efficiency. Moreover, it examines the principles of VBHC, with a focus on patient outcomes, and emphasizes how digital platforms play a role in treatment among tertiary hospitals by using patient-reported outcome measures. The article discusses challenges that come with implementing VBHC, such as stakeholder engagement and standardization of patient-reported outcome measures. It also highlights the role played by health innovators in facilitating the transition toward VBHC models. Through real-life case examples, this article illustrates how digital platforms have had an impact on efficiencies, patient outcomes, and empowerment. In conclusion, it envisions directions for solutions in VBHC by emphasizing the need for interoperability, standardization, and collaborative efforts among stakeholders to fully realize the potential of digital transformation in health care. This research highlights the impact of digital health in creating a health care system that focuses on providing high-quality, efficient, and patient-centered care.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e57385"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769777/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Machine Learning Model for Predicting Postpartum Depression: Retrospective Study.","authors":"Ren Zhang, Yi Liu, Zhiwei Zhang, Rui Luo, Bin Lv","doi":"10.2196/58649","DOIUrl":"10.2196/58649","url":null,"abstract":"<p><strong>Background: </strong>Postpartum depression (PPD) is a prevalent mental health issue with significant impacts on mothers and families. Exploring reliable predictors is crucial for the early and accurate prediction of PPD, which remains challenging.</p><p><strong>Objective: </strong>This study aimed to comprehensively collect variables from multiple aspects, develop and validate machine learning models to achieve precise prediction of PPD, and interpret the model to reveal clinical implications.</p><p><strong>Methods: </strong>This study recruited pregnant women who delivered at the West China Second University Hospital, Sichuan University. Various variables were collected from electronic medical record data and screened using least absolute shrinkage and selection operator penalty regression. Participants were divided into training (1358/2055, 66.1%) and validation (697/2055, 33.9%) sets by random sampling. Machine learning-based predictive models were developed in the training cohort. Models were validated in the validation cohort with receiver operating curve and decision curve analysis. Multiple model interpretation methods were implemented to explain the optimal model.</p><p><strong>Results: </strong>We recruited 2055 participants in this study. The extreme gradient boosting model was the optimal predictive model with the area under the receiver operating curve of 0.849. Shapley Additive Explanation indicated that the most influential predictors of PPD were antepartum depression, lower fetal weight, elevated thyroid-stimulating hormone, declined thyroid peroxidase antibodies, elevated serum ferritin, and older age.</p><p><strong>Conclusions: </strong>This study developed and validated a machine learning-based predictive model for PPD. Several significant risk factors and how they impact the prediction of PPD were revealed. These findings provide new insights into the early screening of individuals with high risk for PPD, emphasizing the need for comprehensive screening approaches that include both physiological and psychological factors.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58649"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769778/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of an Electronic Health Record-Based Automated Pulmonary Embolism Severity Index Score Calculator: Cohort Study in the Emergency Department.","authors":"Elizabeth Joyce, James McMullen, Xiaowen Kong, Connor O'Hare, Valerie Gavrila, Anthony Cuttitta, Geoffrey D Barnes, Colin F Greineder","doi":"10.2196/58800","DOIUrl":"10.2196/58800","url":null,"abstract":"<p><strong>Background: </strong>Studies suggest that less than 4% of patients with pulmonary embolisms (PEs) are managed in the outpatient setting. Strong evidence and multiple guidelines support the use of the Pulmonary Embolism Severity Index (PESI) for the identification of acute PE patients appropriate for outpatient management. However, calculating the PESI score can be inconvenient in a busy emergency department (ED). To facilitate integration into ED workflow, we created a 2023 Epic-compatible clinical decision support tool that automatically calculates the PESI score in real-time with patients' electronic health data (ePESI [Electronic Pulmonary Embolism Severity Index]).</p><p><strong>Objective: </strong>The primary objectives of this study were to determine the overall accuracy of ePESI and its ability to correctly distinguish high- and low-risk PESI scores within the Epic 2023 software. The secondary objective was to identify variables that impact ePESI accuracy.</p><p><strong>Methods: </strong>We collected ePESI scores on 500 consecutive patients at least 18 years old who underwent a computerized tomography-pulmonary embolism scan in the ED of our tertiary, academic health center between January 3 and February 15, 2023. We compared ePESI results to a PESI score calculated by 2 independent, medically-trained abstractors blinded to the ePESI and each other's results. ePESI accuracy was calculated with binomial test. The odds ratio (OR) was calculated using logistic regression.</p><p><strong>Results: </strong>Of the 500 patients, a total of 203 (40.6%) and 297 (59.4%) patients had low- and high-risk PESI scores, respectively. The ePESI exactly matched the calculated PESI in 394 out of 500 cases, with an accuracy of 78.8% (95% CI 74.9%-82.3%), and correctly identified low- versus high-risk in 477 out of 500 (95.4%) cases. The accuracy of the ePESI was higher for low-risk scores (OR 2.96, P<.001) and lower when patients were without prior encounters in the health system (OR 0.42, P=.008).</p><p><strong>Conclusions: </strong>In this single-center study, the ePESI was highly accurate in discriminating between low- and high-risk scores. The clinical decision support should facilitate real-time identification of patients who may be candidates for outpatient PE management.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58800"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769779/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and Validation of a Machine Learning Method Using Vocal Biomarkers for Identifying Frailty in Community-Dwelling Older Adults: Cross-Sectional Study.","authors":"Taehwan Kim, Jung-Yeon Choi, Myung Jin Ko, Kwang-Il Kim","doi":"10.2196/57298","DOIUrl":"10.2196/57298","url":null,"abstract":"<p><strong>Background: </strong>The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application. In addition, methods for measuring frailty have not yet been standardized.</p><p><strong>Objective: </strong>We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT).</p><p><strong>Methods: </strong>We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types).</p><p><strong>Results: </strong>Our models were trained and evaluated on the basis of 5-fold cross-validation for 127 participants and compared. The SpeechAI model, using deep learning-based acoustic features, outperformed in terms of accuracy and area under the receiver operating characteristic curve (AUC), 80.4% (95% CI 76.89%-83.91%) and 0.89 (95% CI 0.86-0.92), respectively, while the model using only demographics showed an accuracy of 67.96% (95% CI 67.63%-68.29%) and an AUC of 0.74 (95% CI 0.73-0.75). The SpeechAI model outperformed the model using only demographics significantly in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning-based acoustic features, showed superior performance (accuracy 85.6%, 95% CI 80.03%-91.17% and AUC 0.93, 95% CI 0.89-0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35). Compared with models using traditional acoustic features from the openSMILE toolkit, the SpeechAI model demonstrated superior performance (AUC 0.89) over traditional methods (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66).</p><p><strong>Conclusions: </strong>Our findings demonstrate that vocal biomarkers derived from deep learning-based acoustic features can be effectively used to predict frailty status in community-dwelling older adults. The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. 
These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and comm","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e57298"},"PeriodicalIF":3.1,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
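The evaluation here reduces to comparing cross-validated AUCs across feature sets (demographics only, acoustic features only, and both). The sketch below shows that comparison pattern with scikit-learn on synthetic features; the fold-wise paired t test is an illustrative analogue of the reported t4 comparisons, and the authors' exact models and statistical procedure may differ.

```python
# Comparing 5-fold cross-validated AUC for two feature sets, roughly mirroring
# the DemoAI vs SpeechAI comparison. Features are synthetic placeholders; the
# paired t test over folds is an illustrative analogue of the reported t4 tests.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(5)
n = 127
demo = rng.normal(size=(n, 3))     # stand-in demographic features
speech = rng.normal(size=(n, 20))  # stand-in deep acoustic embeddings
y = (speech[:, 0] + 0.5 * demo[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
auc_demo = cross_val_score(LogisticRegression(max_iter=1000), demo, y, cv=cv, scoring="roc_auc")
auc_speech = cross_val_score(LogisticRegression(max_iter=1000), speech, y, cv=cv, scoring="roc_auc")

print("demographics-only AUC per fold:", np.round(auc_demo, 3))
print("speech-only AUC per fold:      ", np.round(auc_speech, 3))
print("paired t test across folds:", ttest_rel(auc_speech, auc_demo))
```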