Scalable Time Series Classification in streaming and batch environments on Apache Spark
Apostolos Glenis
2020 11th International Conference on Information, Intelligence, Systems and Applications (IISA), 2020-07-15
DOI: 10.1109/IISA50023.2020.9284415 (https://doi.org/10.1109/IISA50023.2020.9284415)
Time series classification is an important problem because sensor data are becoming more prevalent over time. In addition, most of these data arrive as a stream and must therefore be handled under the constraints that apply to streaming environments (low latency, low memory footprint). In this paper we address the problem of scalable time series classification in both batch and streaming environments. More specifically, we implemented two state-of-the-art time series classification algorithms on top of Apache Spark and adapted one of them for streaming applications. We evaluated our algorithms on two open datasets on a 10-node cluster. The algorithms we implemented scaled gracefully in both the batch and the streaming environment.