Improved Pathogen Recognition using Non-Euclidean Distance Metrics and Weighted kNN
M. Tharmakulasingam, Cihan Topal, Warnakulasuriya Anil Chandana Fernando, R. M. Ragione
Proceedings of the 2019 6th International Conference on Biomedical and Bioinformatics Engineering, 13 November 2019
DOI: 10.1145/3375923.3375956
Citations: 5
Abstract
The timely identification of pathogens is vital for effectively controlling diseases and avoiding antimicrobial resistance. Non-invasive point-of-care diagnostic tools have recently gained traction for identifying pathogens and are becoming helpful, especially in rural areas. Machine learning approaches have been widely applied to biological markers to predict diseases and pathogens. However, few studies in the literature have used volatile organic compounds (VOCs) as non-invasive biological markers to identify bacterial pathogens, and no comprehensive study has investigated the effect of different distance and similarity metrics on pathogen classification from VOC data. In this study, we compared various non-Euclidean distance and similarity metrics with the Euclidean metric to identify the VOCs that contribute significantly to predicting pathogens. We also used the backward feature elimination (BFE) method to select the best set of features. The dataset used in the experiments was compiled from publications dated between 1977 and 2016 and consists of associations between 703 VOCs and 11 pathogens. We performed an extensive set of experiments with five different distance metrics, each in both uniform and weighted k-nearest neighbour (kNN) variants. The experiments showed that pathogens can be correctly predicted with 78.6% accuracy using only 68 of the 703 VOCs, a kNN classifier, and the Sorensen distance metric.
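The abstract combines three ingredients: a distance metric (Euclidean versus non-Euclidean ones such as Sorensen), a kNN classifier with uniform or weighted votes, and backward feature elimination. The Python sketch below shows one way these pieces can fit together. It is not the authors' implementation, and it rests on assumptions the abstract does not state: VOC profiles are encoded as 0/1 presence vectors, the weighted variant uses inverse-distance votes, accuracy is estimated by leave-one-out, and elimination stops once every possible removal lowers accuracy. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Baseline Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def sorensen(a: Vector, b: Vector) -> float:
    """Sorensen (Bray-Curtis) distance: sum|a_i - b_i| / sum(a_i + b_i).

    Assumes non-negative entries; for 0/1 vectors it equals 1 - Dice coefficient.
    """
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else 0.0  # two all-zero profiles count as identical


def knn_predict(train_X: List[Vector], train_y: List[str], x: Vector,
                k: int, metric: Metric, weighted: bool) -> str:
    """Vote among the k nearest training samples to x.

    Uniform kNN gives each neighbour one vote; the weighted variant here uses
    inverse-distance votes (an assumed weighting scheme).
    """
    # Stable sort: equal distances keep their training order.
    ranked = sorted(zip((metric(x, t) for t in train_X), train_y), key=lambda p: p[0])
    votes = defaultdict(float)
    for dist, label in ranked[:k]:
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    # Ties between labels go to the alphabetically first label.
    return max(sorted(votes), key=votes.__getitem__)


def loo_accuracy(X: List[Vector], y: List[str], features: List[int],
                 k: int, metric: Metric, weighted: bool) -> float:
    """Leave-one-out accuracy using only the columns listed in `features`."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(Xs)):
        pred = knn_predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i],
                           k, metric, weighted)
        correct += int(pred == y[i])
    return correct / len(Xs)


def backward_elimination(X: List[Vector], y: List[str], k: int, metric: Metric,
                         weighted: bool) -> Tuple[List[int], float]:
    """Greedy backward feature elimination.

    Each round drops the feature whose removal yields the highest accuracy
    (ties favour the smaller set) and stops when every removal would hurt.
    """
    features = list(range(len(X[0])))
    best = loo_accuracy(X, y, features, k, metric, weighted)
    while len(features) > 1:
        candidates = [
            (loo_accuracy(X, y, [f for f in features if f != r], k, metric, weighted), r)
            for r in features
        ]
        score, drop = max(candidates, key=lambda c: c[0])
        if score < best:
            break  # every possible removal lowers accuracy
        best = score
        features.remove(drop)
    return features, best


# Hypothetical toy data: rows are samples, columns are 0/1 VOC indicators.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["A", "A", "B", "B"]
print(backward_elimination(X, y, k=1, metric=sorensen, weighted=False))  # prints ([0], 1.0)
```

On the toy data, column 0 tracks the label and column 1 is noise, so elimination drops column 1 and leave-one-out accuracy rises from 0.5 to 1.0. Swapping `sorensen` for `euclidean` or setting `weighted=True` reproduces, in miniature, the kind of metric and weighting comparison the abstract describes.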