Early Disease Prediction Using a Text-Numerical Hybrid Model Using Large-Scale Clinical Real-World Data
Authors: Ayaka Oka, Tatsuya Yamaguchi, Masaki Ishihara, Takayuki Baba, Tatsuya Sato, Kazuki Iwamoto, Ryo Iwamura, Shigetaka Toma, Kaho Ogura, Masahiro Kimura, Hokuto Morohoshi, Akio Nakamura
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium, volume 2024, pages 885-893
Publication date: 2025-05-22 (electronic publication date: 2024-01-01, eCollection)
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099446/pdf/
Citation count: 0
Abstract
To assist physicians in predicting diseases, most natural language processing (NLP) models have focused on progress notes in electronic medical records, which contain full descriptions from the initial stage of patient diagnosis to the final stage of discharge. However, accurately predicting diseases at an early stage from initial notes alone is challenging because the available information is limited. To address this, a text-numerical hybrid method was developed to improve disease prediction accuracy. The method identifies "reliably predicted diseases" (RPD): diseases that the NLP and Random Forest models can predict robustly even when the numerical data have missing values or the amount of text data is small. Among the disease groups predicted by the two models, diseases matching the RPD are preferentially adopted and integrated. Measured by Precision@10, the developed method achieves an accuracy of 67.0%, higher than that of the traditional NLP model.
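The integration step and the evaluation metric described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the two ranked lists are combined, so the interleaving rule, the function names `merge_predictions` and `precision_at_k`, and the disease labels are all hypothetical. Precision@k is computed here with one common definition (relevant items in the top k, divided by k), which may differ from the definition used in the paper.

```python
from typing import List, Sequence, Set


def merge_predictions(
    nlp_ranked: Sequence[str],
    rf_ranked: Sequence[str],
    rpd: Set[str],
    k: int = 10,
) -> List[str]:
    """Merge two ranked disease lists, placing RPD matches first.

    Assumption: the two lists are interleaved rank by rank (NLP first),
    duplicates are dropped, and diseases in `rpd` are moved to the front
    while keeping their relative order.
    """
    interleaved: List[str] = []
    for i in range(max(len(nlp_ranked), len(rf_ranked))):
        for ranked in (nlp_ranked, rf_ranked):
            if i < len(ranked) and ranked[i] not in interleaved:
                interleaved.append(ranked[i])

    preferred = [d for d in interleaved if d in rpd]
    others = [d for d in interleaved if d not in rpd]
    return (preferred + others)[:k]


def precision_at_k(predicted: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled by relevant diseases (hits / k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for d in predicted[:k] if d in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    nlp = ["A", "B", "C"]
    rf = ["C", "D", "A"]
    merged = merge_predictions(nlp, rf, rpd={"C", "D"}, k=4)
    print(merged)
    print(precision_at_k(merged, relevant={"A", "C"}, k=4))
```

Moving RPD matches to the front while preserving the interleaved order is only one way to read "preferentially adopted"; a weighted score fusion would be an equally plausible alternative.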