Predicting functional outcomes after a stroke event by clinical text notes: A comparative study of traditional machine learning and deep learning methods.
Yu-Hsiang Su, Chih-Fong Tsai
Health Informatics Journal, vol. 31, no. 3, article 14604582251381194 (2025; Epub 17 September 2025). Journal Article. DOI: 10.1177/14604582251381194
Abstract
Objective: Accurately predicting functional outcomes after acute ischemic stroke is essential for healthcare institutions to optimize staffing and resource allocation. Although text mining has been applied to build such models, most prior studies emphasize traditional machine learning, with limited comparison to deep learning methods.

Methods: Clinical text notes were collected from a Taiwanese hospital to build the experimental dataset. Four textual feature representation techniques were evaluated: bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), embeddings from language models (ELMo), and bidirectional encoder representations from transformers (BERT). Correspondingly, four predictive models were tested: k-nearest neighbor (KNN), support vector machine (SVM), convolutional neural network (CNN), and long short-term memory (LSTM).

Results: The best performance was obtained using BOW features with an SVM classifier. Feature fusion strategies, combining representations such as BOW + TF-IDF and BOW + BERT, also yielded strong performance. Notably, the BOW + TF-IDF combination with SVM achieved the lowest type I error, effectively minimizing the misclassification of patients with poor outcomes.

Conclusion: Traditional machine learning methods outperformed deep learning models in this study. Among all combinations, BOW + TF-IDF features with SVM provided the most accurate predictions and the lowest risk of false positives in stroke outcome prediction.
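The abstract does not specify tokenization, weighting details, fusion method, or SVM solver. As a minimal sketch under those gaps, the BOW + TF-IDF + SVM combination could be wired up as follows, assuming whitespace tokenization, raw-count BOW, a smoothed and L2-normalized TF-IDF (idf = ln((1 + n) / (1 + df)) + 1), early fusion by simple concatenation, and a linear SVM trained with Pegasos-style stochastic sub-gradient descent as a stand-in for whatever SVM implementation the study used. The toy notes, labels, and every function name here are hypothetical illustrations, not the authors' data or code.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace split; real clinical notes need more care.
    return text.lower().split()


def build_vocab(docs):
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def bow_vectors(docs, vocab):
    # Bag-of-words: raw term counts per note; out-of-vocabulary terms are ignored.
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in vocab:
                row[vocab[tok]] = float(count)
        rows.append(row)
    return rows


def fit_idf(docs, vocab):
    # Smoothed IDF fitted on the training notes: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = [0] * len(vocab)
    for doc in docs:
        for tok in set(tokenize(doc)):
            if tok in vocab:
                df[vocab[tok]] += 1
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]


def tfidf_vectors(bow_rows, idf):
    # TF-IDF: counts reweighted by IDF, then L2-normalized per note.
    rows = []
    for row in bow_rows:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows


def fuse(*blocks):
    # Early fusion (assumption): concatenate feature blocks per note and
    # append a constant 1.0 so the (regularized) bias is folded into w.
    return [[x for part in parts for x in part] + [1.0] for parts in zip(*blocks)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    # Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    # Labels must be +1 / -1.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin checked before the update
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 term
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, X):
    return [1 if dot(w, x) >= 0 else -1 for x in X]


# Hypothetical toy notes: +1 = poor functional outcome, -1 = good outcome.
train_notes = [
    "severe hemiplegia aphasia",
    "severe dysphagia aphasia",
    "severe hemiplegia dysphagia",
    "mild weakness ambulatory",
    "mild numbness ambulatory",
    "mild weakness numbness",
]
train_labels = [1, 1, 1, -1, -1, -1]

vocab = build_vocab(train_notes)
idf = fit_idf(train_notes, vocab)


def featurize(notes):
    bow = bow_vectors(notes, vocab)
    return fuse(bow, tfidf_vectors(bow, idf))


w = train_linear_svm(featurize(train_notes), train_labels)
print(predict(w, featurize(["severe aphasia", "mild numbness"])))
```

Concatenation is only the simplest fusion choice; the abstract does not say whether the study concatenated, weighted, or otherwise combined the BOW and TF-IDF blocks, and a production pipeline would more likely use a library vectorizer and SVM solver with tuned hyperparameters.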
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.