Authors: Suvd Zulbayar, Tatyana Mollayeva, Angela Colantonio, Vincy Chan, Michael Escobar
Journal: Intelligence-based medicine, volume 8, Article 100118
Publication date: 2023-01-01
DOI: 10.1016/j.ibmed.2023.100118
URL: https://www.sciencedirect.com/science/article/pii/S2666521223000327
Integrating unsupervised and supervised learning techniques to predict traumatic brain injury: A population-based study
This work aimed to identify pre-existing health conditions of patients with traumatic brain injury (TBI) and to develop predictive models for the first TBI event and its external causes by combining unsupervised and supervised learning algorithms. We acquired up to five years of pre-injury diagnoses for 488,107 patients with TBI and 488,107 matched control patients who entered the emergency department or acute care hospitals between April 1st, 2002, and March 31st, 2020. Diagnoses were obtained from the Ontario Health Insurance Plan (OHIP) database, which contains province-wide claims data submitted by physicians in Ontario, Canada, for inpatient and outpatient services. The OHIP diagnostic codes were screened to limit the subsequent analysis to codes that were predictive of TBI; the screening found 314 codes to be significantly associated with TBI. A Latent Dirichlet Allocation (LDA) model was applied to these diagnostic codes and yielded an optimal number of 19 topics, which concur with published literature but also suggest unexplored areas. Estimated word-topic probabilities from the LDA model helped us detect pre-morbid conditions among patients with TBI by uncovering the underlying patterns of diagnoses, while estimated document-topic probabilities were used to create variables as a form of dimension reduction. We created 19 topic scores for each patient in the cohort and used them, along with socio-demographic factors, as inputs to Random Forest binary classifier models. Test-set performance, measured by the area under the receiver operating characteristic curve (AUC), was 0.85 for the TBI event and, for the external causes of injury, 0.85 for falls, 0.83 for struck by/against, 0.76 for cyclist collision, and 0.83 for motor vehicle collision.
Our analysis successfully demonstrated the feasibility of using machine learning to predict TBI due to various external causes and identified the most important factors that contribute to this prediction.
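
The two-stage design described in the abstract (LDA topic scores used as features for a Random Forest classifier, evaluated by test-set AUC) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software used, so scikit-learn is an assumption here, the data are synthetic rather than OHIP claims, the number of topics is set to 5 instead of 19 to keep the example small, and the single `age` covariate is a hypothetical stand-in for the socio-demographic factors.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for claims data (NOT the OHIP cohort):
# rows are patients, columns are diagnostic codes, entries are code counts.
n_patients, n_codes, n_topics = 600, 40, 5
labels = rng.integers(0, 2, size=n_patients)      # 1 = case, 0 = control
base_rate = np.full(n_codes, 0.3)
case_rate = base_rate.copy()
case_rate[:8] = 1.5                               # codes made more frequent among cases
rates = np.where(labels[:, None] == 1, case_rate, base_rate)
code_counts = rng.poisson(rates)

# Hypothetical socio-demographic covariate, unrelated to the label in this toy data.
age = rng.integers(18, 90, size=n_patients).reshape(-1, 1)

idx_train, idx_test = train_test_split(
    np.arange(n_patients), test_size=0.3, random_state=0, stratify=labels
)

# Unsupervised step: fit LDA on the training patients only, then score everyone.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(code_counts[idx_train])
topic_scores = lda.transform(code_counts)         # document-topic probabilities per patient

# Word-topic view: normalise components_ to per-topic code probabilities.
code_given_topic = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Supervised step: Random Forest on the topic scores plus the covariate.
features = np.hstack([topic_scores, age])
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(features[idx_train], labels[idx_train])

# Evaluate on held-out patients with the area under the ROC curve.
test_probs = forest.predict_proba(features[idx_test])[:, 1]
test_auc = roc_auc_score(labels[idx_test], test_probs)
print(f"test AUC on synthetic data: {test_auc:.2f}")
```

In this sketch the 40 code counts per patient are compressed into 5 topic scores before classification, which mirrors the dimension-reduction role the abstract assigns to the document-topic probabilities. The rows of `code_given_topic` play the role of the word-topic probabilities: sorting a row shows which codes dominate that topic.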