Time Series Database Preprocessing for Data Mining Using Python
Hussein Farooq Tayeb, M. Karabatak, C. Varol
2020 8th International Symposium on Digital Forensics and Security (ISDFS), published 2020-06-01
DOI: 10.1109/ISDFS49300.2020.9116260 (https://doi.org/10.1109/ISDFS49300.2020.9116260)
Citations: 4
Abstract
Data mining is an important method for extracting meaningful information from data. Data preprocessing lays the groundwork for data mining, yet most researchers unfortunately ignore it. Before the data mining stage can begin, the target data set must be properly prepared. This paper describes the steps followed to preprocess time series data for data mining. The data used in the study are the minimum daily temperatures recorded over 10 years (1981–1990) in the city of Melbourne, Australia. The Python programming language was used to read the data and decompose it into trend, seasonality, and residual components. These components were plotted and analyzed, and the trend and seasonality were removed to make the series stationary. The Dickey-Fuller stationarity test was then applied to the data. The test statistic shows that the null hypothesis of the Dickey-Fuller test (that the series has a unit root) can be rejected, so the data are stationary and ready for the next step, data mining modeling.
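
The workflow summarized in the abstract (decompose, remove trend and seasonality, test for stationarity) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' code: the series is a synthetic stand-in for the Melbourne data, the decomposition is a classical additive one with an assumed 365-day period, and the test is the simple (non-augmented) Dickey-Fuller regression compared against the asymptotic 5% critical value of about -2.86. All function and variable names are hypothetical. In practice the same steps are commonly done with `seasonal_decompose` and `adfuller` from statsmodels; the abstract does not say which library was used.

```python
import math
import random

PERIOD = 365  # assumed yearly cycle for daily data (leap days ignored)


def centered_moving_average(y, period):
    """Trend estimate: mean of one full (odd-length) period centered on each point."""
    half = period // 2
    trend = [None] * len(y)  # the first and last `half` points have no full window
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        trend[i] = sum(window) / len(window)
    return trend


def decompose_additive(y, period):
    """Split y into trend + seasonal + residual (classical additive model)."""
    trend = centered_moving_average(y, period)
    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(y, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal component so it sums to zero over one period.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]
    seasonal = [seasonal_means[i % period] for i in range(len(y))]
    residual = [
        None if t is None else value - t - s
        for value, t, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, residual


def dickey_fuller_t(y):
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t (no lag terms)."""
    x = y[:-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    x_mean = sum(x) / n
    d_mean = sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    gamma = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d)) / sxx
    alpha = d_mean - gamma * x_mean
    sse = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return gamma / std_err


# Synthetic stand-in for ten years of daily minimum temperatures (NOT the real data).
random.seed(0)
n_days = 10 * PERIOD
temps = [
    11.0 + 4.0 * math.cos(2 * math.pi * t / PERIOD) + random.gauss(0.0, 2.5)
    for t in range(n_days)
]

trend, seasonal, residual = decompose_additive(temps, PERIOD)
stationary_part = [r for r in residual if r is not None]

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant
t_stat = dickey_fuller_t(stationary_part)
print(f"Dickey-Fuller t-statistic: {t_stat:.2f}")
print("reject unit-root null at 5%:", t_stat < CRITICAL_5PCT)
```

A t-statistic below the critical value rejects the unit-root null hypothesis, which is the sense in which the abstract calls the series stationary. The augmented form of the test adds lagged differences to the regression to absorb serial correlation; that refinement is omitted here for brevity.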