Junyan Zhang, Junchen Zhou, Liqin Zhou, Zhichao Ba
JMIR Medical Informatics, vol. 13, e70096. Published 2025-05-15. DOI: 10.2196/70096 (https://doi.org/10.2196/70096). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123238/pdf/. Impact factor: 3.1; JCR Q2 (Medical Informatics).
Extracting Multifaceted Characteristics of Patients With Chronic Disease Comorbidity: Framework Development Using Large Language Models.
Background: Research on chronic multimorbidity has increasingly become a focal point with the aging of the population. Many studies in this area require detailed patient characteristic information. However, the current methods for extracting such information are complex, time-consuming, and prone to errors. The challenge of quickly and accurately extracting patient characteristics has become a common issue in the study of chronic disease comorbidities.
Objective: Our objective was to establish a comprehensive framework for extracting demographic and disease characteristics of patients with multimorbidity. This framework leverages large language models (LLMs) to extract feature information from unstructured and semistructured electronic health records pertaining to these patients. We investigated the model's proficiency in extracting feature information across 7 dimensions: basic information, disease details, lifestyle habits, family medical history, symptom history, medication recommendations, and dietary advice. In addition, we demonstrated the strengths and limitations of this framework.
Methods: We used data sourced from a grassroots community health service center in China. We developed a multifaceted feature extraction framework tailored for patients with multimorbidity, which consists of several integral components: feasibility testing, preprocessing, determination of the features to extract, prompt modeling based on LLMs, postprocessing, and midterm evaluation. Within this framework, 7 types of feature information were extracted as straightforward features, and 3 types of features were identified as intricate features. On the basis of the straightforward features, we calculated patients' age, BMI, and 12 disease risk factors. Rigorous manual verification experiments were conducted 100 times for straightforward features and 200 times for intricate features, followed by comprehensive quantitative and qualitative assessments of the experimental outcomes.
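As a rough illustration of the derived values mentioned above, age and BMI can be computed from extracted basic information along the following lines. This is a minimal sketch, not the authors' code: the function names, the input units (kilograms, centimeters), and the use of a birth date are assumptions. The 12 disease risk factors are not defined in the abstract and are left out.

```python
from datetime import date


def age_in_years(birth: date, on: date) -> int:
    """Completed years between a birth date and a reference date (assumed inputs)."""
    years = on.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in meters."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)
```

Validating the inputs before dividing matters here because extracted values may be missing or malformed, which is the format-consistency issue the Results section raises.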
Results: The framework achieved an overall F1-score of 99.6% for the 7 straightforward feature extractions, with the highest F1-score of 100% for basic information. In addition, the framework demonstrated an overall F1-score of 94.4% for the 3 intricate feature extractions. Our analysis of the results revealed that accurate extraction of information content is a substantial advantage of this framework, whereas ensuring consistency in the format of extracted information remains one of its challenges.
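For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard formula computed from counts of true positives, false positives, and false negatives. The abstract does not state how counts were aggregated across features, so the function name and the single-count interface are assumptions made for illustration only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), from raw counts."""
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp == 0:
        # With no true positives, precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The same value can be written as 2·TP / (2·TP + FP + FN), which is a convenient check when verifying results by hand.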
Conclusions: The framework incorporates electronic health record information from 1225 patients with multimorbidity, covering a diverse range of 41 chronic diseases, and can seamlessly accommodate the inclusion of additional diseases. This underscores its scalability and adaptability as a method for extracting patient-specific characteristics, effectively addressing the challenges associated with information extraction in the context of multidisease research. Research and medical policy personnel can extract feature information by setting corresponding goals based on the research objectives and directly using the LLM for zero-shot target feature extraction. This approach greatly improves research efficiency and reduces labor requirements; moreover, due to the framework's high accuracy, it can increase study reliability.
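The goal-driven use described above, in which the prompt contains no worked examples, might look roughly like the following prompt builder. This is a hypothetical sketch: the prompt wording, the JSON output convention, and the function name are assumptions and are not taken from the paper. Only the 7 dimension names come from the abstract. The call to an LLM is deliberately omitted.

```python
# The 7 feature dimensions named in the abstract.
DIMENSIONS = [
    "basic information",
    "disease details",
    "lifestyle habits",
    "family medical history",
    "symptom history",
    "medication recommendations",
    "dietary advice",
]


def build_zero_shot_prompt(record_text: str, targets: list[str]) -> str:
    """Build an instruction-only prompt (no examples) for the chosen targets."""
    unknown = [t for t in targets if t not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown target dimensions: {unknown}")
    lines = [
        # Hypothetical instruction wording, chosen for illustration.
        "Extract the following characteristics from the health record below.",
        "Return one JSON object with exactly these keys; use null if a value is not stated.",
    ]
    lines += [f"- {t}" for t in targets]
    lines += ["", "Record:", record_text]
    return "\n".join(lines)
```

Asking for a fixed set of keys and an explicit null is one way to reduce the format inconsistency that the Results section identifies as a remaining challenge.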
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.