Kuruge Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Morten Goodwin
2020 IEEE Symposium Series on Computational Intelligence (SSCI), published 2020-12-01. DOI: https://doi.org/10.1109/SSCI47803.2020.9308291
Adaptive Continuous Feature Binarization for Tsetlin Machines Applied to Forecasting Dengue Incidences in the Philippines
The Tsetlin Machine (TM) is a recent interpretable machine learning algorithm that requires relatively modest computational power, yet attains competitive accuracy on several benchmarks. TMs are inherently binary; however, many machine learning problems involve continuous features. While binarization of continuous data through brute-force thresholding has yielded promising accuracy, such an approach is computationally expensive and hinders extrapolation. In this paper, we address these limitations by standardizing features to support scale shifts in the transition from training data to real-world operation, as is typical in forecasting, for example. For scalability, we employ sampling to reduce the number of binarization thresholds, relying on stratification to minimize loss of accuracy. We evaluate the approach empirically on two artificial datasets before applying the resulting TM to forecast dengue outbreaks in the Philippines using the spatiotemporal properties of the data. Our results show that the loss of accuracy due to threshold sampling is insignificant. Furthermore, the dengue outbreak forecasts made by the TM are more accurate than those obtained by Support Vector Machines (SVMs), Decision Trees (DTs), and several multi-layered Artificial Neural Networks (ANNs), in terms of both forecasting precision and F1-score.
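The abstract names three steps (standardization, threshold sampling with stratification, and threshold-based binarization) without giving their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes z-score standardization with training-set statistics, candidate thresholds taken from the unique values of each feature, strata formed as contiguous chunks of the sorted unique values with one value drawn per stratum, and a bit set to 1 when a value is less than or equal to a threshold. All function names (`fit_standardizer`, `sample_thresholds`, `binarize`) are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train):
    """Per-feature mean and standard deviation from the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def sample_thresholds(column, n_thresholds, rng):
    """Pick up to n_thresholds from the unique values of one feature.

    Assumption: the sorted unique values are split into contiguous strata
    and one value is drawn at random from each stratum.
    """
    unique = np.unique(column)  # np.unique returns sorted values
    if len(unique) <= n_thresholds:
        return unique  # nothing to sample; equivalent to brute force
    strata = np.array_split(unique, n_thresholds)
    return np.array([rng.choice(s) for s in strata])


def binarize(X, thresholds):
    """One bit per (feature, threshold) pair: 1 if value <= threshold."""
    bits = [
        (X[:, [j]] <= t[None, :]).astype(np.uint8)
        for j, t in enumerate(thresholds)
    ]
    return np.hstack(bits)


# Usage on synthetic data (not the dengue dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))

mean, std = fit_standardizer(X_train)
Z_train = (X_train - mean) / std

thresholds = [
    sample_thresholds(Z_train[:, j], 10, rng) for j in range(Z_train.shape[1])
]
B_train = binarize(Z_train, thresholds)  # binary input for a TM
```

With 200 distinct values per feature, brute-force thresholding would yield 200 bits per feature; sampling 10 thresholds per feature reduces this to 10, which is the kind of saving the abstract refers to. New data at operation time would be standardized before being compared against the stored thresholds; whether the statistics should come from the training set or from the new data is left open here.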