Zain U. Hussain, R. Comerford, Fynn Comerford, N. Ng, Dominic Ng, Ateeb Khan, C. Lees, A. Hussain
2020 IEEE Student Conference on Research and Development (SCOReD), 2020-09-27. DOI: 10.1109/SCOReD50371.2020.9251019 (https://doi.org/10.1109/SCOReD50371.2020.9251019)
A Comparison of Machine Learning Approaches for Predicting the Progression of Crohn’s Disease
The incidence of Crohn’s disease (CD) is rising, which calls for more accurate and less invasive diagnostic tools. The concentration of faecal calprotectin (FC) is a reliable indicator of luminal inflammatory processes and can replace invasive and uncomfortable ileocolonoscopies. Studies have confirmed the association of FC levels with the progression of CD, and various machine learning approaches have been used to predict disease progression. In this study, we aimed to compare the performance of established machine learning approaches for predicting the progression of CD using a range of variables, including FC levels. Our dataset consisted of records for 804 patients with CD and an FC measurement, from a teaching hospital that cares for secondary and tertiary referred patients. We compared four machine learning approaches, namely logistic regression, support vector machines (SVM), random forests and artificial neural networks (ANN), in predicting the likelihood of a flare-up. All four approaches performed strongly, which demonstrates their potential, in particular that of logistic regression, for predicting disease progression. Logistic regression slightly outperformed the others, with an accuracy of 0.90 and an AUC of 0.83. Our dataset had missing data for a number of patients, which resulted in fewer variables being selected for inclusion in the model. Our relatively small sample size could account for the SVM, random forest and ANN not demonstrating superior accuracy compared with logistic regression in this study. In future, more variables should be included in the analysis, the outcome period for a flare-up should be explored, and our results should be validated on a large, independent dataset.
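The abstract reports two metrics for each model, accuracy and AUC, which measure different things: accuracy depends on a decision threshold and on class balance, whereas AUC summarises how well the scores rank flare-up cases above non-flare-up cases across all thresholds. The following is a minimal sketch of how the two could be computed for a binary flare-up classifier. The labels, scores, function names and the 0.5 threshold are all hypothetical and chosen for illustration; they are not the study's data or the authors' evaluation code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded score matches the true label."""
    correct = sum(int(s >= threshold) == t for t, s in zip(y_true, y_score))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = flare-up, 0 = no flare-up.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.6, 0.35, 0.9]

acc = accuracy(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(f"accuracy={acc:.2f}, AUC={auc:.2f}")
```

Because the two metrics answer different questions, a model can score higher on one than on the other, so comparing models on both, as the study does, gives a fuller picture than either alone.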