Let's ask the patient: disease prediction based on patients' symptom descriptions in free text.

IF 2.1 · JCR Q3 · Rheumatology

Rheumatology Advances in Practice · Pub Date: 2025-09-10 · eCollection Date: 2025-01-01 · DOI: 10.1093/rap/rkaf103

Inés Pérez-Sancristóbal, Nils Steinz, Ling Qin, Tjardo Maarseveen, Floor Zegers, Barbara Bislawska Axnäs, Luis Rodríguez-Rodríguez, Rachel Knevel


Citations: 0

Abstract



Objective: This study evaluates the value of self-reported free-text symptom descriptions for supporting diagnostic decisions in osteoarthritis (OA), fibromyalgia (FM) and immune-mediated rheumatic diseases (imRD) using natural language processing (NLP) and machine learning (ML).

Methods: Free-text descriptions from 8454 patients were processed using a word-weighting method (TF-IDF vectorization) that reflects how relevant each word is across the dataset, and then classified with support vector machine (SVM) models. OA and FM models were optimized for specificity and the imRD model for sensitivity based on disease context and validated against an independent dataset. Model explainability was explored using SHapley Additive exPlanations (SHAP) values.
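The word-weighting step can be illustrated with a minimal sketch. It uses the textbook TF-IDF formulation with whitespace tokenization. The abstract does not state the tokenizer, the TF-IDF variant or the software used, so these choices, the function name `tfidf` and the example sentences are all assumptions made for illustration. The SVM classification step is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    # Assumption: lowercase whitespace tokenization; the study's tokenizer is not stated.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of descriptions in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency within the description times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Invented example descriptions, not data from the study.
descriptions = [
    "pain in both knees when climbing stairs",
    "pain all over and constant fatigue",
    "swollen painful fingers with morning stiffness",
]
weights = tfidf(descriptions)
print(weights[0])
```

In this toy corpus "knees" occurs in one description and "pain" in two, so "knees" receives the larger weight in the first description. That is the sense in which the weighting reflects how distinctive a word is across the dataset.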

Results: The SVM models demonstrated moderate diagnostic support potential with Area under the Receiver Operating Characteristic Curve (AUC-ROC) values of 0.68 for OA, 0.75 for FM and 0.69 for imRD. When optimized for clinical utility, the models achieved high specificity of 0.82 for OA and 0.92 for FM, effectively reducing unnecessary referrals with misdiagnosis rates of only 17% and 8%, respectively. For imRD, the model achieved a sensitivity of 0.92 and negative predictive value (NPV) of 0.77, ensuring minimal missed diagnoses of these potentially serious conditions. Decision curve analysis confirmed clinical utility across varying threshold preferences. SHAP analysis revealed that key linguistic patterns in patient descriptions aligned with clinical reasoning, enhancing the models' interpretability.
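As a reading aid for the metrics above, the sketch below computes sensitivity, specificity and NPV from classifier scores and picks a decision threshold that meets a target specificity (as described for OA and FM) or a target sensitivity (as described for imRD). The helper names, the toy labels and scores, and the selection rule itself are assumptions for illustration. The abstract does not describe how the authors chose their thresholds.

```python
import math


def metrics(y_true, scores, threshold):
    """Sensitivity, specificity and NPV when score >= threshold means 'positive'."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
    }


def threshold_for_specificity(y_true, scores, target):
    """Lowest observed score that reaches the target specificity when used as threshold."""
    # Specificity never decreases as the threshold rises, so scan upward.
    for t in sorted(set(scores)):
        if metrics(y_true, scores, t)["specificity"] >= target:
            return t
    return math.inf  # no observed score suffices; predict nobody positive


def threshold_for_sensitivity(y_true, scores, target):
    """Highest observed score that reaches the target sensitivity when used as threshold."""
    # Sensitivity never decreases as the threshold falls, so scan downward.
    for t in sorted(set(scores), reverse=True):
        if metrics(y_true, scores, t)["sensitivity"] >= target:
            return t
    return -math.inf  # no observed score suffices; predict everybody positive


# Invented labels (1 = has the disease) and classifier scores, not study data.
labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

t_spec = threshold_for_specificity(labels, scores, 0.8)
t_sens = threshold_for_sensitivity(labels, scores, 0.9)
print(t_spec, metrics(labels, scores, t_spec))
print(t_sens, metrics(labels, scores, t_sens))
```

The two selection rules pull the threshold in opposite directions. A specificity target pushes it up and costs sensitivity, while a sensitivity target pushes it down and costs specificity. This is the trade-off behind optimizing the OA and FM models one way and the imRD model the other.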

Conclusion: Our results highlight the value of patient-reported data in augmenting rheumatology decision-making and set the stage for further development in AI-assisted diagnostics. While not a standalone diagnostic tool, the integration of NLP-driven analysis of free-text symptom descriptions shows promise in reducing diagnostic ambiguity.


Source journal