Adaptive Continuous Feature Binarization for Tsetlin Machines Applied to Forecasting Dengue Incidences in the Philippines

2020 IEEE Symposium Series on Computational Intelligence (SSCI). Pub Date: 2020-12-01. DOI: 10.1109/SSCI47803.2020.9308291

Kuruge Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Morten Goodwin


Citations: 3

Abstract



The Tsetlin Machine (TM) is a recent interpretable machine learning algorithm that requires relatively modest computational power, yet attains competitive accuracy on several benchmarks. TMs are inherently binary; however, many machine learning problems involve continuous features. While binarizing continuous data through brute-force thresholding has yielded promising accuracy, that approach is computationally expensive and hinders extrapolation. In this paper, we address these limitations by standardizing features to support scale shifts in the transition from training data to real-world operation, as is typical in forecasting. For scalability, we employ sampling to reduce the number of binarization thresholds, relying on stratification to minimize loss of accuracy. We evaluate the approach empirically on two artificial datasets before applying the resulting TM to forecast dengue outbreaks in the Philippines using the spatiotemporal properties of the data. Our results show that the loss of accuracy due to threshold sampling is insignificant. Furthermore, the dengue outbreak forecasts made by the TM are more accurate than those obtained by Support Vector Machines (SVMs), Decision Trees (DTs), and several multi-layered Artificial Neural Networks (ANNs), in terms of both forecasting precision and F1-score.
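
The abstract does not give the details of the binarization procedure, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that features are standardized with training-set statistics, that a small number of thresholds per feature is picked at evenly spaced quantiles (a simple stand-in for the stratified threshold sampling the abstract mentions), and that each feature is encoded as one bit per threshold. The function names `fit_binarizer` and `binarize` and the parameter `n_thresholds` are hypothetical.

```python
import numpy as np


def fit_binarizer(X_train, n_thresholds=5):
    """Learn standardization statistics and a reduced set of thresholds.

    Assumption: thresholds sit at evenly spaced quantiles of each
    standardized feature, instead of one threshold per unique value.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    Z = (X_train - mean) / std
    # Interior quantile levels, e.g. 3 thresholds -> 0.25, 0.5, 0.75.
    levels = np.arange(1, n_thresholds + 1) / (n_thresholds + 1)
    # np.quantile returns (n_thresholds, n_features); transpose it.
    thresholds = np.quantile(Z, levels, axis=0).T
    return mean, std, thresholds


def binarize(X, mean, std, thresholds):
    """Encode each feature as bits: bit k is 1 if z <= threshold k."""
    Z = (X - mean) / std
    bits = Z[:, :, None] <= thresholds[None, :, :]
    # Flatten to (n_samples, n_features * n_thresholds) for a binary learner.
    return bits.reshape(X.shape[0], -1).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(loc=10.0, scale=3.0, size=(200, 2))
    mean, std, thresholds = fit_binarizer(X_train, n_thresholds=4)
    print(binarize(X_train[:3], mean, std, thresholds))
```

Because the thresholds are expressed in standardized units, the number of bits per feature is fixed by `n_thresholds` rather than by the number of distinct values in the training data, which is what keeps the input size manageable in this sketch.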
