An improved symbolic aggregate approximation distance measure based on its statistical features
Authors: Chaw Thet Zan, H. Yamana
Published in: Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services
Publication date: 2016-11-28
DOI: 10.1145/3011141.3011146 (https://doi.org/10.1145/3011141.3011146)
Citations: 35
Abstract
Efficient data representation and similarity measurement for massive amounts of time series data are challenges with an enormous impact on many applications. This paper presents an improvement to Symbolic Aggregate approXimation (SAX), one of the efficient representations for time series mining. Because SAX represents each segment by a symbol derived only from its average (mean) value, under the assumption of a Gaussian distribution, it cannot capture all of the deterministic information in the series and sometimes causes incorrect results in time series classification. In this work, the SAX representation and distance measure are improved by adding another moment of the prior distribution, the standard deviation; the resulting method is called SAX_SD. We provide a comprehensive analysis of the proposed SAX_SD and confirm that it achieves both the highest classification accuracy and the highest dimensionality reduction ratio on University of California, Riverside (UCR) datasets in comparison to state-of-the-art methods such as SAX, Extended SAX (ESAX) and SAX Trend Distance (SAX_TD).
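
The motivation stated in the abstract, that a mean-only symbol can hide differences between segments, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sax_with_sd`, the pairing of each symbol with a per-segment standard deviation, and the requirement that the series length be divisible by the number of segments are all assumptions made for illustration. The abstract does not give the exact SAX_SD representation or distance formula, so no distance is computed here.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance (standard SAX preprocessing)."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def gaussian_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_with_sd(series, n_segments, alphabet_size):
    """Return one (symbol, standard deviation) pair per segment.

    Hypothetical helper: the symbol is the usual SAX symbol of the segment
    mean; the standard deviation is kept alongside it as an extra feature.
    Assumes len(series) is divisible by n_segments.
    """
    z = znormalize(series)
    seg_len = len(z) // n_segments
    cuts = gaussian_breakpoints(alphabet_size)
    result = []
    for i in range(n_segments):
        segment = z[i * seg_len:(i + 1) * seg_len]
        # Map the segment mean to a letter via the Gaussian breakpoints.
        symbol = chr(ord("a") + bisect_right(cuts, fmean(segment)))
        result.append((symbol, pstdev(segment)))
    return result


if __name__ == "__main__":
    step = [0, 0, 0, 0, 1, 1, 1, 1]         # flat within each segment
    zigzag = [-1, 1, -1, 1, 0, 2, 0, 2]     # oscillates within each segment
    print(sax_with_sd(step, n_segments=2, alphabet_size=2))
    print(sax_with_sd(zigzag, n_segments=2, alphabet_size=2))
```

With two segments and a two-letter alphabet, both example series map to the same symbols, so plain SAX cannot tell them apart, while the per-segment standard deviations differ (zero for the step series, nonzero for the zigzag series).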