Scalable stream Bayes classification based on Dirichlet prior
O. Bina, Yuan Yanhua
2017 International Conference on Progress in Informatics and Computing (PIC), 2017-12-01
DOI: 10.1109/PIC.2017.8359593
Citations: 0
Abstract
Learning from fast data streams is one of the most challenging tasks in data stream mining. Because data streams are unbounded sequences, they pose challenges that classifiers trained on batch data do not face. Most existing methods are not naturally parallel, which limits their scalability. This paper proposes a scalable data stream Bayes classifier that uses a new estimation method (DIB). The new method takes the conjugate Dirichlet prior as the prior distribution of the parameters and thereby improves predictive accuracy. The paper also proposes a new distributed implementation of DIB on Flink. Experiments show that the DIB classifier significantly outperforms Naïve Bayes in terms of accuracy, and that parallel DIB running on Flink increases throughput and reduces execution time.
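
The abstract does not give the DIB estimator itself, so the following is only a minimal sketch of the general idea it names: a Bayes classifier over categorical features whose probabilities are posterior means under a conjugate Dirichlet prior, updated one record at a time. The class name `DirichletStreamNB`, the symmetric concentration `alpha`, and the `merge` step are assumptions made for illustration, not the authors' method. `merge` only adds count tables together, which is one plausible reason such a model parallelizes well; it does not use Flink.

```python
from collections import defaultdict
import math


class DirichletStreamNB:
    """Illustrative incremental Bayes classifier with symmetric Dirichlet priors.

    Hypothetical sketch: not the DIB estimator from the paper.
    """

    def __init__(self, n_values, n_classes, alpha=1.0):
        self.n_values = n_values    # number of categories for each feature
        self.n_classes = n_classes
        self.alpha = alpha          # assumed symmetric Dirichlet concentration
        self.class_counts = [0] * n_classes
        # counts[(feature_index, class_label, value)] -> occurrences so far
        self.counts = defaultdict(int)

    def update(self, x, y):
        """Consume one labeled record; cost is linear in the feature count."""
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.counts[(j, y, v)] += 1

    def merge(self, other):
        """Fold in the counts of a model trained on another stream partition."""
        for c in range(self.n_classes):
            self.class_counts[c] += other.class_counts[c]
        for key, n in other.counts.items():
            self.counts[key] += n

    def log_scores(self, x):
        """Unnormalized log posterior of each class, using posterior means."""
        total = sum(self.class_counts)
        scores = []
        for c in range(self.n_classes):
            # Posterior mean of the class probability under Dirichlet(alpha).
            s = math.log((self.class_counts[c] + self.alpha)
                         / (total + self.alpha * self.n_classes))
            for j, v in enumerate(x):
                # Posterior mean of P(feature j = v | class c).
                num = self.counts.get((j, c, v), 0) + self.alpha
                den = self.class_counts[c] + self.alpha * self.n_values[j]
                s += math.log(num / den)
            scores.append(s)
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(range(self.n_classes), key=scores.__getitem__)
```

In this sketch, two partitions of a stream can each be fed to their own model and combined with `merge`, giving the same counts as a single model that saw every record. With `alpha=1.0` the estimate coincides with Laplace smoothing; other values of `alpha` weight the prior more or less heavily against the observed counts.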