JMIR Medical Informatics: Latest Articles

The Impact of Data Control and Delayed Discounting on the Public's Willingness to Share Different Types of Health Care Data: Empirical Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-22 | DOI: 10.2196/66444
Dongle Wei, Pan Gao, Yunkai Zhai
Background: Health data typically include patient-generated data and clinical medical data. Different types of data contribute to disease prevention, precision medicine, and the overall improvement of health care. With the introduction of regulations such as the Health Insurance Portability and Accountability Act (HIPAA), individuals play a key role in the sharing and application of personal health data.
Objective: This study aims to explore the impact of different types of health data on users' willingness to share. Additionally, it analyzes the effect of data control and the delay discounting rate on this process.
Methods: The results of a web-based survey were analyzed to examine individuals' perceptions of sharing different types of health data and how data control and delay discounting rates influenced their decisions. We recruited participants through the web-based platform "Wenjuanxing." After screening, we obtained 257 valid responses. Regression analysis was used to investigate the impact of data control, delay discounting, and mental accounting on the public's willingness to share different types of health care data.
Results: Our findings indicate that the type of health data does not significantly affect the perceived benefits of data sharing. Instead, it negatively influences willingness to share indirectly, through its effects on data acquisition costs and perceived risks. Our results also show that data control reduces the perceived risks associated with sharing, while higher delay discounting rates lead to an overestimation of data acquisition costs and perceived risks.
Conclusions: Individuals' willingness to share data is primarily influenced by costs. To promote the acquisition and development of personal health data, stakeholders should strengthen individuals' control over their data or provide direct short-term incentives.
Volume 13, e66444. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11778728/pdf/
Citations: 0
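The abstract reports delay discounting rates without stating the model behind them. Purely as an illustrative assumption, the sketch below uses the commonly cited hyperbolic model V = A / (1 + kD), where A is a benefit, D its delay, and k the discounting rate. It shows why a person with a higher k values a delayed benefit of data sharing less, while an immediate incentive (D = 0) keeps its full value. The function name and all numbers are hypothetical.

```python
def hyperbolic_value(amount: float, delay_days: float, k: float) -> float:
    """Present (subjective) value of a delayed benefit: V = A / (1 + k * D)."""
    if amount < 0 or delay_days < 0 or k < 0:
        raise ValueError("amount, delay and k must be non-negative")
    return amount / (1.0 + k * delay_days)


if __name__ == "__main__":
    # Hypothetical benefit of 100 units received 30 days after sharing data.
    # Delayed values print as 76.9, 40.0 and 14.3; immediate values stay at 100.0.
    for k in (0.01, 0.05, 0.20):
        print(f"k={k:.2f}: delayed value {hyperbolic_value(100, 30, k):.1f}, "
              f"immediate value {hyperbolic_value(100, 0, k):.1f}")
```

Under this assumed model, raising k shrinks only the delayed value, which is one way to read the authors' suggestion of direct short-term incentives.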
Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-21 | DOI: 10.2196/65454
Nicholas C Cardamone, Mark Olfson, Timothy Schmutte, Lyle Ungar, Tony Liu, Sara W Cullen, Nathaniel J Williams, Steven C Marcus
Background: Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.
Objective: This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR data set of individuals with mental health disorders seen in emergency departments (EDs).
Methods: Using a data set from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks: (1) classify terms as "mental health" or "physical health," (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.
Results: There was high agreement between the LLM and clinical experts when categorizing 4553 terms as "mental health" or "physical health" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).
Conclusions: The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, LLMs offer an alternative to manual human coding, with great potential to create interpretable features for prediction models.
Volume 13, e65454.
Citations: 0
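The agreement figures above are Cohen's κ values, which correct raw agreement for the agreement expected by chance. A minimal sketch of the two-rater statistic, using made-up labels rather than the study's data; it does not reproduce the confidence intervals reported, and the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the raters' marginal label counts, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy labels (not study data): clinician consensus vs. LLM output for six terms.
    clinician = ["mental", "mental", "physical", "physical", "mental", "physical"]
    llm = ["mental", "physical", "physical", "physical", "mental", "physical"]
    print(round(cohens_kappa(clinician, llm), 3))  # 0.667
```

Here the raters agree on 5 of 6 terms (p_o ≈ 0.833), but chance agreement is 0.5, so κ is only about 0.67.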
A Dynamic Adaptive Ensemble Learning Framework for Noninvasive Mild Cognitive Impairment Detection: Development and Validation Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-20 | DOI: 10.2196/60250
Aoyu Li, Jingwen Li, Yishan Hu, Yan Geng, Yan Qiang, Juanjuan Zhao
Background: The prompt and accurate identification of mild cognitive impairment (MCI) is crucial for preventing its progression into more severe neurodegenerative diseases. However, current diagnostic solutions, such as biomarkers and cognitive screening tests, are costly, time-consuming, and invasive, hindering patient compliance and the accessibility of these tests. Therefore, it is necessary to explore a more cost-effective, efficient, and noninvasive method to aid clinicians in detecting MCI.
Objective: This study aims to develop an ensemble learning framework that adaptively integrates multimodal physiological data collected from wearable wristbands and digital cognitive metrics recorded on tablets, thereby improving the accuracy and practicality of MCI detection.
Methods: We recruited 843 participants aged 60 years and older from the geriatrics and neurology departments of our collaborating hospitals, who were randomly divided into a development data set (674/843 participants) and an internal test data set (169/843 participants) at a 4:1 ratio. In addition, 226 older adults were recruited from 3 external centers to form an external test data set. We measured their physiological signals (eg, electrodermal activity and photoplethysmography) and digital cognitive parameters (eg, reaction time and test scores) using the clinically certified Empatica 4 wristband and a tablet cognitive screening tool. The collected data underwent rigorous preprocessing, during which features in the time, frequency, and nonlinear domains were extracted from individual physiological signals. To address the challenges (eg, the curse of dimensionality and increased model complexity) posed by high-dimensional features, we developed a dynamic adaptive feature selection optimization algorithm to identify the subset of features with the greatest impact on classification performance. Finally, the accuracy and efficiency of the classification model were improved by optimizing the combination of base learners.
Results: The experimental results indicate that the proposed MCI detection framework achieved classification accuracies of 88.4%, 85.5%, and 84.5% on the development, internal test, and external test data sets, respectively. The area under the curve for the binary classification task was 0.945 (95% CI 0.903-0.986), 0.912 (95% CI 0.859-0.965), and 0.904 (95% CI 0.846-0.962) on these data sets. Furthermore, a statistical analysis of feature subsets during the iterative modeling process revealed that the decay time of the skin conductance response, the percentage of successive normal-to-normal intervals differing by more than 50 milliseconds (pNN50), the ratio of low-frequency to high-frequency (LF/HF) components in heart rate variability, and cognitive time features emerged as the most prevalent and effective indicators. Specifically, compared with healthy individuals, patients with MCI exhibited a longer skin conductance
Volume 13, e60250.
Citations: 0
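One of the heart rate variability features named above, pNN50, is conventionally the percentage of successive normal-to-normal (NN) interval differences that exceed 50 ms. A minimal sketch, assuming the NN intervals (in milliseconds) have already been extracted and cleaned from the wristband's photoplethysmography signal; the function is hypothetical, not the authors' feature extractor, and the LF/HF ratio (which needs spectral analysis) is not covered.

```python
from typing import Sequence


def pnn50(nn_intervals_ms: Sequence[float]) -> float:
    """pNN50: percentage of successive NN-interval differences larger than 50 ms."""
    if len(nn_intervals_ms) < 2:
        raise ValueError("need at least two NN intervals")
    # Absolute differences between each pair of consecutive intervals.
    diffs = [abs(b - a) for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)


if __name__ == "__main__":
    # Toy NN intervals in milliseconds; successive differences are 60, 10, 70 and 10 ms.
    print(pnn50([800, 860, 870, 800, 810]))  # 50.0
```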
Digital Health Innovations to Catalyze the Transition to Value-Based Health Care.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-20 | DOI: 10.2196/57385
Lan Zhang, Christopher Bullen, Jinsong Chen
The health care industry is currently undergoing a transformation driven by the integration of technologies and the shift toward value-based health care (VBHC). This article explores the role digital health solutions play in advancing VBHC, highlighting both the challenges and opportunities associated with adopting these technologies. Digital health, which includes mobile health, wearable devices, telehealth, and personalized medicine, shows promise in improving diagnostic accuracy, treatment options, and overall health outcomes. The article considers the concept of transformation in health care, emphasizing its potential to reform care delivery through data communication, patient engagement, and operational efficiency. Moreover, it examines the principles of VBHC, with a focus on patient outcomes, and emphasizes the role digital platforms play in treatment at tertiary hospitals through the use of patient-reported outcome measures (PROMs). The article discusses challenges in implementing VBHC, such as stakeholder engagement and the standardization of PROMs. It also highlights the role of health innovators in facilitating the transition toward VBHC models. Through real-life case examples, this article illustrates how digital platforms have affected efficiency, patient outcomes, and empowerment. In conclusion, it outlines future directions for VBHC solutions, emphasizing the need for interoperability, standardization, and collaborative efforts among stakeholders to fully realize the potential of digital transformation in health care. This research highlights the impact of digital health in creating a health care system that focuses on providing high-quality, efficient, and patient-centered care.
Volume 13, e57385. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769777/pdf/
Citations: 0
Interpretable Machine Learning Model for Predicting Postpartum Depression: Retrospective Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-20 | DOI: 10.2196/58649
Ren Zhang, Yi Liu, Zhiwei Zhang, Rui Luo, Bin Lv
Background: Postpartum depression (PPD) is a prevalent mental health issue with significant impacts on mothers and families. Identifying reliable predictors is crucial for the early and accurate prediction of PPD, which remains challenging.
Objective: This study aimed to comprehensively collect variables from multiple domains, develop and validate machine learning models for precise prediction of PPD, and interpret the models to reveal clinical implications.
Methods: This study recruited pregnant women who delivered at West China Second University Hospital, Sichuan University. Variables were collected from electronic medical record data and screened using least absolute shrinkage and selection operator (LASSO) penalized regression. Participants were randomly divided into training (1358/2055, 66.1%) and validation (697/2055, 33.9%) sets. Machine learning-based predictive models were developed in the training cohort and validated in the validation cohort using receiver operating characteristic curve and decision curve analysis. Multiple model interpretation methods were applied to explain the optimal model.
Results: We recruited 2055 participants in this study. The extreme gradient boosting (XGBoost) model was the optimal predictive model, with an area under the receiver operating characteristic curve of 0.849. Shapley Additive Explanations (SHAP) indicated that the most influential predictors of PPD were antepartum depression, lower fetal weight, elevated thyroid-stimulating hormone, decreased thyroid peroxidase antibodies, elevated serum ferritin, and older age.
Conclusions: This study developed and validated a machine learning-based predictive model for PPD. Several significant risk factors and their impact on the prediction of PPD were revealed. These findings provide new insights into the early screening of individuals at high risk for PPD, emphasizing the need for comprehensive screening approaches that include both physiological and psychological factors.
Volume 13, e58649. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769778/pdf/
Citations: 0
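An AUC such as the 0.849 reported here can be read as the probability that the model ranks a randomly chosen participant who developed PPD above one who did not (ties count one-half). A minimal pairwise sketch of that definition on toy data; it is not the study's XGBoost evaluation pipeline, which the abstract does not detail, and the O(n·m) loop is meant for illustration rather than large cohorts.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy outcomes (1 = PPD) and model risk scores, not study data.
    y = [1, 1, 0, 0, 1, 0]
    risk = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(round(roc_auc(y, risk), 3))  # 0.778
```

In the toy data, 7 of the 9 positive-negative pairs are ranked correctly, giving 7/9 ≈ 0.778.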
Performance of an Electronic Health Record-Based Automated Pulmonary Embolism Severity Index Score Calculator: Cohort Study in the Emergency Department.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-20 | DOI: 10.2196/58800
Elizabeth Joyce, James McMullen, Xiaowen Kong, Connor O'Hare, Valerie Gavrila, Anthony Cuttitta, Geoffrey D Barnes, Colin F Greineder
Background: Studies suggest that less than 4% of patients with pulmonary embolism (PE) are managed in the outpatient setting. Strong evidence and multiple guidelines support the use of the Pulmonary Embolism Severity Index (PESI) to identify patients with acute PE who are appropriate for outpatient management. However, calculating the PESI score can be inconvenient in a busy emergency department (ED). To facilitate integration into the ED workflow, we created a clinical decision support tool, compatible with the 2023 version of Epic, that automatically calculates the PESI score in real time from patients' electronic health data (ePESI [Electronic Pulmonary Embolism Severity Index]).
Objective: The primary objectives of this study were to determine the overall accuracy of ePESI and its ability to correctly distinguish high- and low-risk PESI scores within the Epic 2023 software. The secondary objective was to identify variables that affect ePESI accuracy.
Methods: We collected ePESI scores for 500 consecutive patients aged 18 years or older who underwent a computed tomography pulmonary embolism scan in the ED of our tertiary academic health center between January 3 and February 15, 2023. We compared ePESI results with PESI scores calculated by 2 independent, medically trained abstractors blinded to the ePESI and to each other's results. ePESI accuracy was calculated with a binomial test. Odds ratios (ORs) were calculated using logistic regression.
Results: Of the 500 patients, 203 (40.6%) had low-risk and 297 (59.4%) had high-risk PESI scores. The ePESI exactly matched the calculated PESI in 394 of 500 cases, an accuracy of 78.8% (95% CI 74.9%-82.3%), and correctly identified low versus high risk in 477 of 500 (95.4%) cases. The accuracy of the ePESI was higher for low-risk scores (OR 2.96, P<.001) and lower when patients had no prior encounters in the health system (OR 0.42, P=.008).
Conclusions: In this single-center study, the ePESI was highly accurate in discriminating between low- and high-risk scores. The clinical decision support tool should facilitate real-time identification of patients who may be candidates for outpatient PE management.
Volume 13, e58800. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769779/pdf/
Citations: 0
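The score that ePESI automates is a simple additive rule. A minimal sketch of the original 11-item PESI (age in years plus fixed points per criterion, risk classes I to V), written from the published point values rather than from the ePESI/Epic implementation, whose data mapping the abstract does not describe. Field names are hypothetical, and treating classes I and II (score 85 or less) as low risk is the common convention; the abstract does not state the study's cut point. The sketch requires every input, so it does not model the missing-data problem the study links to patients without prior encounters.

```python
from dataclasses import dataclass


@dataclass
class PesiInputs:
    # Hypothetical field names; an EHR integration would map these from charted data.
    age_years: int
    male: bool
    cancer: bool
    heart_failure: bool
    chronic_lung_disease: bool
    pulse_bpm: float
    systolic_bp_mmhg: float
    respiratory_rate: float
    temperature_c: float
    altered_mental_status: bool
    o2_saturation_pct: float


def pesi_score(p: PesiInputs) -> int:
    """Original 11-item PESI: age in years plus fixed points for each criterion met."""
    points = [
        (p.male, 10),
        (p.cancer, 30),
        (p.heart_failure, 10),
        (p.chronic_lung_disease, 10),
        (p.pulse_bpm >= 110, 20),
        (p.systolic_bp_mmhg < 100, 30),
        (p.respiratory_rate >= 30, 20),
        (p.temperature_c < 36, 20),
        (p.altered_mental_status, 60),
        (p.o2_saturation_pct < 90, 20),
    ]
    return p.age_years + sum(value for met, value in points if met)


def pesi_class(score: int) -> str:
    """Risk class I-V from the total score."""
    for upper, label in ((65, "I"), (85, "II"), (105, "III"), (125, "IV")):
        if score <= upper:
            return label
    return "V"


def is_low_risk(score: int) -> bool:
    """Classes I-II (score <= 85) are the usual low-risk group."""
    return score <= 85


if __name__ == "__main__":
    # Toy patient: 58-year-old woman whose only abnormal finding is a pulse of 112/min.
    patient = PesiInputs(58, False, False, False, False, 112, 128, 18, 36.8, False, 96)
    s = pesi_score(patient)
    print(s, pesi_class(s), is_low_risk(s))  # 78 II True
```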
Development and Validation of a Machine Learning Method Using Vocal Biomarkers for Identifying Frailty in Community-Dwelling Older Adults: Cross-Sectional Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-16 | DOI: 10.2196/57298
Taehwan Kim, Jung-Yeon Choi, Myung Jin Ko, Kwang-Il Kim
Background: The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application. In addition, methods for measuring frailty have not yet been standardized.
Objective: We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT).
Methods: We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed the audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types).
Results: Our models were trained, evaluated, and compared using 5-fold cross-validation on the 127 participants. The SpeechAI model, using deep learning-based acoustic features, outperformed the demographics-only model in accuracy and area under the receiver operating characteristic curve (AUC), with 80.4% (95% CI 76.89%-83.91%) and 0.89 (95% CI 0.86-0.92), respectively, while the demographics-only model showed an accuracy of 67.96% (95% CI 67.63%-68.29%) and an AUC of 0.74 (95% CI 0.73-0.75). The SpeechAI model significantly outperformed the demographics-only model in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning-based acoustic features, showed superior performance (accuracy 85.6%, 95% CI 80.03%-91.17% and AUC 0.93, 95% CI 0.89-0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35). Compared with models using traditional acoustic features from the openSMILE toolkit, the SpeechAI model demonstrated superior performance (AUC 0.89) over traditional methods (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66).
Conclusions: Our findings demonstrate that vocal biomarkers derived from deep learning-based acoustic features can be effectively used to predict frailty status in community-dwelling older adults. The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and comm
Volume 13, e57298. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756832/pdf/
Citations: 0
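The t4 statistics above are consistent with a paired t-test across the 5 cross-validation folds (5 - 1 = 4 degrees of freedom), although the abstract does not name the test, so that reading is an assumption. A minimal sketch with made-up per-fold AUCs; it returns only the statistic and degrees of freedom, because the Python standard library has no t distribution for computing the P value.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(metric_a: Sequence[float], metric_b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched per-fold scores of two models."""
    if len(metric_a) != len(metric_b) or len(metric_a) < 2:
        raise ValueError("need matched per-fold scores for at least 2 folds")
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    k = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (denominator k - 1)
    if sd == 0:
        raise ValueError("all per-fold differences are equal; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(k))
    return t, k - 1


if __name__ == "__main__":
    # Made-up per-fold AUCs for two models (not the study's values).
    model_a = [0.90, 0.88, 0.91, 0.87, 0.89]
    model_b = [0.75, 0.74, 0.73, 0.76, 0.72]
    t, df = paired_t(model_a, model_b)
    print(f"t{df} = {t:.3f}")  # t4 = 12.247
```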
Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-16 | DOI: 10.2196/65047
Takuya Fukushima, Masae Manabe, Shuntaro Yada, Shoko Wakamiya, Akiko Yoshida, Yusaku Urakawa, Akiko Maeda, Shigeyuki Kan, Masayo Takahashi, Eiji Aramaki
Background: Advances in genetics have underscored a strong association between genetic factors and health outcomes, leading to an increased demand for genetic counseling services. However, a shortage of qualified genetic counselors poses a significant challenge. Large language models (LLMs) have emerged as a potential solution for augmenting support in genetic counseling tasks. Despite this potential, Japanese genetic counseling LLMs (JGCLLMs) are underexplored. To advance a JGCLLM-based dialogue system for genetic counseling, effective domain adaptation methods need to be investigated.
Objective: This study aims to evaluate the current capabilities and identify challenges in developing a JGCLLM-based dialogue system for genetic counseling. The primary focus is to assess the effectiveness of prompt engineering, retrieval-augmented generation (RAG), and instruction tuning within the context of genetic counseling. Furthermore, we will establish an expert-evaluated dataset of responses generated by LLMs adapted to Japanese genetic counseling for the future development of JGCLLMs.
Methods: Two primary datasets were used in this study: (1) a question-answer (QA) dataset for LLM adaptation and (2) a genetic counseling question dataset for evaluation. The QA dataset included 899 QA pairs covering medical and genetic counseling topics, while the evaluation dataset contained 120 curated questions across 6 genetic counseling categories. Three LLM enhancement techniques (instruction tuning, RAG, and prompt engineering) were applied to a lightweight Japanese LLM to enhance its ability for genetic counseling. The performance of the adapted LLM was evaluated on the 120-question dataset by 2 certified genetic counselors and 1 ophthalmologist (SK, YU, and AY). Evaluation focused on four metrics: (1) inappropriateness of information, (2) sufficiency of information, (3) severity of harm, and (4) alignment with medical consensus.
Results: The evaluation by certified genetic counselors and an ophthalmologist revealed varied outcomes across different methods. RAG showed potential, particularly in enhancing critical aspects of genetic counseling. In contrast, instruction tuning and prompt engineering produced less favorable outcomes. This evaluation process facilitated the creation of an expert-evaluated dataset of responses generated by LLMs adapted with different combinations of these methods. Error analysis identified key ethical concerns, including inappropriate promotion of prenatal testing, criticism of relatives, and inaccurate probability statements.
Conclusions: RAG demonstrated notable improvements across all evaluation metrics, suggesting potential for further enhancement through the expansion of RAG data. The expert-evaluated dataset developed in this study provides valuable insights for future optimization efforts. However, the ethical issues obser
Volume 13, e65047. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783024/pdf/
Citations: 0
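The abstract does not describe how the RAG component retrieves reference material. As an illustrative assumption only, here is a minimal retrieval-augmented prompt builder that ranks stored QA pairs by word overlap (Jaccard similarity) with the user's question and prepends the best matches to the prompt. Real systems more often use embedding search, Japanese input would need a morphological tokenizer such as MeCab instead of the regex used here, and every name and data item is hypothetical.

```python
import re
from typing import List, Set, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def tokenize(text: str) -> Set[str]:
    """Lower-cased word set (adequate for space-delimited text only)."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, qa_pairs: List[QAPair], k: int = 2) -> List[QAPair]:
    """Return the k QA pairs whose questions overlap most with the query (Jaccard similarity)."""
    q = tokenize(query)

    def jaccard(pair: QAPair) -> float:
        t = tokenize(pair[0])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(qa_pairs, key=jaccard, reverse=True)[:k]


def build_prompt(query: str, qa_pairs: List[QAPair], k: int = 2) -> str:
    """Prepend the retrieved QA pairs as reference context for the LLM."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, qa_pairs, k))
    return f"Reference QA pairs:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    corpus = [  # hypothetical placeholders, not the study's 899-pair dataset
        ("What is an autosomal dominant disorder?", "(reference answer 1)"),
        ("What is carrier testing?", "(reference answer 2)"),
        ("Can genetic tests predict cancer risk?", "(reference answer 3)"),
    ]
    print(build_prompt("What does carrier testing tell me?", corpus, k=1))
```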
Effectiveness of the Facility for Elderly Surveillance System (FESSy) in Two Public Health Center Jurisdictions in Japan: Prospective Observational Study.
IF 3.1 | Zone 3, Medicine
JMIR Medical Informatics | Pub Date: 2025-01-10 | DOI: 10.2196/58509
Junko Kurita, Motomi Hori, Sumiyo Yamaguchi, Aiko Ogiwara, Yurina Saito, Minako Sugiyama, Asami Sunadori, Tomoko Hayashi, Akane Hara, Yukari Kawana, Youichi Itoi, Tamie Sugawara, Yoshiyuki Sugishita, Fujiko Irie, Naomi Sakurai
{"title":"Effectiveness of the Facility for Elderly Surveillance System (FESSy) in Two Public Health Center Jurisdictions in Japan: Prospective Observational Study.","authors":"Junko Kurita, Motomi Hori, Sumiyo Yamaguchi, Aiko Ogiwara, Yurina Saito, Minako Sugiyama, Asami Sunadori, Tomoko Hayashi, Akane Hara, Yukari Kawana, Youichi Itoi, Tamie Sugawara, Yoshiyuki Sugishita, Fujiko Irie, Naomi Sakurai","doi":"10.2196/58509","DOIUrl":"10.2196/58509","url":null,"abstract":"<p><strong>Background: </strong>Residents of facilities for older people are vulnerable to COVID-19 outbreaks. Nevertheless, timely recognition of outbreaks at facilities for older people at public health centers has been impossible in Japan since May 8, 2023, when the Japanese government discontinued aggressive countermeasures against COVID-19 because of the waning severity of the dominant Omicron strain. The Facility for Elderly Surveillance System (FESSy) has been developed to improve information collection.</p><p><strong>Objective: </strong>This study examined FESSy experiences and effectiveness in two public health center jurisdictions in Japan.</p><p><strong>Methods: </strong>This study assessed the use by public health centers of the detection mode of an automated AI detection system (ie, FESSy AI), as well as manual detection by the public health centers' staff (ie, FESSy staff) and direct reporting by facilities to the public health centers. We considered the following aspects: (1) diagnoses or symptoms, (2) numbers of patients as of their detection date, and (3) ultimate numbers of patients involved in incidents. Subsequently, effectiveness was assessed and compared based on detection modes. The study lasted from June 1, 2023, through January 2024.</p><p><strong>Results: </strong>In both areas, this study examined 31 facilities at which 87 incidents were detected. 
Incidents detected by FESSy (AI or staff) involved significantly fewer patients than those identified by non-FESSy methods (ie, direct reporting to the public health center), both as of the detection date and in the ultimate number of patients.</p><p><strong>Conclusions: </strong>FESSy was superior to direct reporting from facilities in both the number of patients as of the detection date and the ultimate outbreak size.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58509"},"PeriodicalIF":3.1,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11741194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142973490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study.
IF 3.1 | Medicine, Zone 3
JMIR Medical Informatics Pub Date : 2025-01-10 DOI: 10.2196/63731
Shiben Zhu, Wanqin Hu, Zhi Yang, Jiani Yan, Fang Zhang
{"title":"Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study.","authors":"Shiben Zhu, Wanqin Hu, Zhi Yang, Jiani Yan, Fang Zhang","doi":"10.2196/63731","DOIUrl":"10.2196/63731","url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) have been proposed as valuable tools in medical education and practice. The Chinese National Nursing Licensing Examination (CNNLE) presents unique challenges for LLMs because it requires both deep domain-specific nursing knowledge and the ability to make complex clinical decisions, which differentiates it from more general medical examinations. However, the potential of LLMs on the CNNLE remains unexplored.</p><p><strong>Objective: </strong>This study aims to evaluate the accuracy of 7 LLMs (GPT-3.5, GPT-4.0, GPT-4o, Copilot, ERNIE Bot-3.5, SPARK, and Qwen-2.5) on the CNNLE, focusing on their ability to handle domain-specific nursing knowledge and clinical decision-making. We also explore whether combining their outputs using machine learning techniques can improve overall accuracy.</p><p><strong>Methods: </strong>This retrospective cross-sectional study analyzed all 1200 multiple-choice questions from the CNNLE conducted between 2019 and 2023. Seven LLMs were evaluated on these multiple-choice questions, and 9 machine learning models, namely logistic regression, support vector machine, multilayer perceptron, k-nearest neighbors, random forest, LightGBM, AdaBoost, XGBoost, and CatBoost, were used to optimize overall performance through ensemble techniques.</p><p><strong>Results: </strong>Qwen-2.5 achieved the highest overall accuracy of 88.9%, followed by GPT-4o (80.7%), ERNIE Bot-3.5 (78.1%), GPT-4.0 (70.3%), SPARK (65.0%), and GPT-3.5 (49.5%). 
Qwen-2.5 demonstrated superior accuracy in the Practical Skills section compared with the Professional Practice section across most years. It also performed well on brief clinical case summaries and on questions involving shared clinical scenarios. When the outputs of the 7 LLMs were combined using 9 machine learning models, XGBoost yielded the best performance, increasing accuracy to 90.8%. XGBoost also achieved an area under the curve of 0.961, sensitivity of 0.905, specificity of 0.978, F<sub>1</sub>-score of 0.901, positive predictive value of 0.901, and negative predictive value of 0.977.</p><p><strong>Conclusions: </strong>This study is the first to evaluate the performance of 7 LLMs on the CNNLE, and it shows that integrating the models via machine learning significantly boosted accuracy, to 90.8%. These findings demonstrate the transformative potential of LLMs for health care education and call for further research to refine their capabilities and expand their impact on examination preparation and professional training.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e63731"},"PeriodicalIF":3.1,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759905/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142962601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0