Inés Pérez-Sancristóbal, Nils Steinz, Ling Qin, Tjardo Maarseveen, Floor Zegers, Barbara Bislawska Axnäs, Luis Rodríguez-Rodríguez, Rachel Knevel
Rheumatology Advances in Practice, vol. 9, no. 4, rkaf103. Published 2025-09-10. DOI: 10.1093/rap/rkaf103
Let's ask the patient: disease prediction based on patients' symptom descriptions in free text.
Objective: This study evaluates the value of self-reported free-text symptom descriptions for supporting diagnostic decisions in osteoarthritis (OA), fibromyalgia (FM) and immune-mediated rheumatic diseases (imRD), using natural language processing (NLP) and machine learning (ML).
Methods: Free-text symptom descriptions from 8454 patients were converted into numerical features with term frequency-inverse document frequency (TF-IDF) vectorization, which weights each word by how often it occurs in a description relative to how common it is across all descriptions, and then classified with support vector machine (SVM) models. Reflecting the clinical context of each condition, the OA and FM models were optimized for specificity and the imRD model for sensitivity; all models were validated on an independent dataset. Model explainability was explored using SHapley Additive exPlanations (SHAP) values.
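The Methods describe a TF-IDF representation followed by an SVM classifier and SHAP-based explanations. The abstract does not give the implementation, so the following is only a minimal, standard-library Python sketch of that kind of pipeline, not the authors' code. Everything in it is an assumption chosen for illustration: the tokenizer, the smoothed IDF formula (the form scikit-learn uses by default), a linear SVM trained by full-batch subgradient descent without an intercept, the hyperparameters `lam`, `lr` and `epochs`, and the closed-form linear SHAP values, which are exact only for a linear model under a feature-independence assumption. All function names are hypothetical, and the four descriptions are invented toy text, not study data.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Hypothetical minimal tokenizer: whitespace split, punctuation stripped, lowercased.
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

def fit_idf(docs):
    # Smoothed inverse document frequency: idf = ln((1 + n) / (1 + df)) + 1.
    # Words that occur in fewer descriptions receive larger weights.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each word at most once per description
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Term count times IDF, then L2-normalised; words unseen during fitting are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else vec

def decision(x, w):
    # Linear score w.x; a positive value means "predict the target disease".
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(vectors, labels, lam=0.01, lr=0.5, epochs=300):
    # Full-batch subgradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    # labels are +1 (target disease) or -1 (other); no intercept term, for brevity.
    w = {}
    n = len(vectors)
    for _ in range(epochs):
        step = {}  # negative subgradient of the mean hinge loss
        for x, y in zip(vectors, labels):
            if y * decision(x, w) < 1:  # inside the margin or misclassified
                for k, v in x.items():
                    step[k] = step.get(k, 0.0) + y * v / n
        # w <- w - lr * (lam * w - step)
        w = {k: (1 - lr * lam) * w.get(k, 0.0) + lr * step.get(k, 0.0)
             for k in set(w) | set(step)}
    return w

def linear_shap(x, w, background):
    # SHAP values of a linear model, assuming independent features:
    # phi_k = w_k * (x_k - mean of x_k over the background set).
    n = len(background)
    mean = {}
    for v in background:
        for k, val in v.items():
            mean[k] = mean.get(k, 0.0) + val / n
    return {k: w.get(k, 0.0) * (x.get(k, 0.0) - mean.get(k, 0.0))
            for k in set(x) | set(mean)}

# Invented toy descriptions (not study data): +1 = OA-like, -1 = other.
docs = ["knee ache when walking", "stiff knee after rest",
        "widespread fatigue and poor sleep", "tired everywhere with fatigue"]
labels = [1, 1, -1, -1]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
w = train_linear_svm(X, labels)
```

In practice a library implementation (for example scikit-learn's `TfidfVectorizer` combined with `LinearSVC` or `SVC`) would replace the hand-written trainer; the sketch only makes the word weighting, the hinge-loss objective and the additive per-word attributions explicit.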
Results: The SVM models showed moderate discriminative performance, with areas under the receiver operating characteristic curve (AUC-ROC) of 0.68 for OA, 0.75 for FM and 0.69 for imRD. At operating points chosen for clinical utility, the OA and FM models reached high specificities of 0.82 and 0.92, respectively, with misdiagnosis rates of only 17% and 8%, which could reduce unnecessary referrals. For imRD, the model achieved a sensitivity of 0.92 and a negative predictive value (NPV) of 0.77, limiting missed diagnoses of these potentially serious conditions. Decision curve analysis indicated clinical utility across a range of threshold preferences. SHAP analysis showed that key linguistic patterns in patient descriptions aligned with clinical reasoning, improving the models' interpretability.
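The Results report AUC-ROC, specificity, sensitivity and NPV at chosen operating points, together with a decision curve analysis. The sketch below shows how these quantities relate for a binary classifier: a rank-based AUC, the metrics at a given threshold, one way to pick a threshold that meets a target specificity (as for OA and FM) or a target sensitivity (as for imRD), and the standard net-benefit formula behind decision curves. The abstract does not state how the thresholds were selected, so the selection rule here is an assumption, the function names are hypothetical, the scores and labels are invented toy values, and the net-benefit functions assume the scores are calibrated probabilities.

```python
def confusion(scores, labels, threshold):
    # labels: 1 = disease present, 0 = absent; a case is called positive when score >= threshold.
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def ratio(num, den):
    return num / den if den else float("nan")

def metrics(scores, labels, threshold):
    tp, fp, fn, tn = confusion(scores, labels, threshold)
    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}

def roc_auc(scores, labels):
    # AUC = probability that a random positive case outscores a random negative one (ties count 1/2).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def threshold_for_specificity(scores, labels, target):
    # Specificity never decreases as the threshold rises, so the lowest threshold that
    # reaches the target keeps sensitivity as high as possible (OA/FM-style operating point).
    for t in sorted(set(scores)):
        if metrics(scores, labels, t)["specificity"] >= target:
            return t
    return float("inf")  # no candidate works: call nobody positive

def threshold_for_sensitivity(scores, labels, target):
    # Sensitivity never increases as the threshold rises, so take the highest threshold
    # that still reaches the target (imRD-style operating point).
    for t in sorted(set(scores), reverse=True):
        if metrics(scores, labels, t)["sensitivity"] >= target:
            return t
    return float("-inf")  # call everybody positive

def net_benefit(probs, labels, pt):
    # Decision curve analysis: NB = TP/n - FP/n * pt / (1 - pt) at threshold probability pt.
    tp, fp, _, _ = confusion(probs, labels, pt)
    n = len(labels)
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_treat_all(labels, pt):
    # Reference strategy in which every patient is treated (or referred).
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

# Invented toy scores (not study data), treated as calibrated probabilities.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and comparing it with the treat-all and treat-none (net benefit 0) strategies. Raw SVM decision values are not probabilities, so in a real analysis they would first need calibration, for example Platt scaling.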
Conclusion: Our results highlight the value of patient-reported data for augmenting decision-making in rheumatology and set the stage for further development of AI-assisted diagnostics. Although NLP-driven analysis of free-text symptom descriptions is not a standalone diagnostic tool, its integration shows promise for reducing diagnostic ambiguity.