Authors: Yu-Hsiang Su, Chih-Fong Tsai
Journal: Health Informatics Journal, 31(3), article 14604582251381194
Publication date: 2025-07-01 (Epub 2025-09-17)
Publication type: Journal Article
DOI: https://doi.org/10.1177/14604582251381194
Predicting functional outcomes after a stroke event by clinical text notes: A comparative study of traditional machine learning and deep learning methods.
Objective: Accurately predicting functional outcomes after acute ischemic stroke is essential for healthcare institutions to optimize staffing and resource allocation. Although text mining has been applied to build such models, most prior studies emphasize traditional machine learning, with limited comparison to deep learning methods.

Methods: Clinical text notes were collected from a Taiwanese hospital to build the experimental dataset. Four textual feature representation techniques were evaluated: bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), embeddings from language models (ELMo), and bidirectional encoder representations from transformers (BERT). Correspondingly, four predictive models were tested: k-nearest neighbor (KNN), support vector machine (SVM), convolutional neural network (CNN), and long short-term memory (LSTM).

Results: The best performance was obtained using BOW features with an SVM classifier. Feature fusion strategies, combining representations such as BOW + TF-IDF and BOW + BERT, also yielded strong performance. Notably, the BOW + TF-IDF combination with SVM achieved the lowest type I error, effectively minimizing the misclassification of patients with poor outcomes.

Conclusion: Traditional machine learning methods outperformed deep learning models in this study. Among all combinations, BOW + TF-IDF features with SVM provided the most accurate predictions and lowest risk of false positives in stroke outcome prediction.
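To make the idea of fusing BOW and TF-IDF features concrete, here is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the abstract does not say how fusion was implemented, so plain concatenation is an assumption, as are the whitespace tokenizer and the idf variant ln(N / df) + 1. The four notes and their "good"/"poor" labels are invented toy data. The abstract reports SVM as the strongest classifier; this sketch swaps in a cosine-similarity nearest-neighbor vote (KNN, one of the four models the study compares) because it fits in a few lines without external libraries.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted list of every distinct token in the training documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(doc, vocab):
    """Bag-of-words: raw term counts in vocabulary order."""
    counts = Counter(tokenize(doc))
    return [float(counts[t]) for t in vocab]


def fit_idf(docs, vocab):
    """Inverse document frequency, using idf = ln(N / df) + 1 (assumed variant)."""
    n = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    return {t: math.log(n / sum(t in s for s in token_sets)) + 1.0 for t in vocab}


def tfidf_vector(doc, vocab, idf):
    """TF-IDF: term count weighted by the fitted idf values."""
    counts = Counter(tokenize(doc))
    return [counts[t] * idf[t] for t in vocab]


def fused_vector(doc, vocab, idf):
    """Feature fusion by concatenating the BOW and TF-IDF vectors."""
    return bow_vector(doc, vocab) + tfidf_vector(doc, vocab, idf)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def knn_predict(query_vec, train_vecs, train_labels, k=1):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy notes and outcome labels, for illustration only.
notes = [
    "patient alert walks independently",
    "patient walks with minimal assistance",
    "patient bedridden severe weakness",
    "severe weakness requires full assistance",
]
labels = ["good", "good", "poor", "poor"]

vocab = build_vocab(notes)
idf = fit_idf(notes, vocab)
train_vecs = [fused_vector(n, vocab, idf) for n in notes]

new_note = "patient bedridden requires full assistance"
prediction = knn_predict(fused_vector(new_note, vocab, idf), train_vecs, labels, k=1)
print(prediction)
```

Concatenation keeps both views of each term, so the fused vector is exactly twice the vocabulary length. Tokens in a new note that never appeared in the training notes are simply ignored, because the vocabulary and idf values are fitted on the training notes only.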
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.