Boolean Logical Operator Driven Selective Data Filtering For Large Datasets
G. Davidson, S. Majumdar
2022 Annual Modeling and Simulation Conference (ANNSIM), 18 July 2022
DOI: 10.23919/ANNSIM55834.2022.9859309
Citations: 1
Abstract
Individual users of a system that processes large data sets are often interested in only a small subset of the available data. This paper presents a parallel-processing-based data filtering technique that selects and stores only the subset of data relevant to a given user. A user's interests are captured as a set of keywords or phrases, which may be combined using Boolean operators. An Apache Spark-based prototype is built and deployed on an Amazon EC2 cloud to demonstrate the viability of the approach and to analyze the performance of the proposed technique.
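The abstract does not say how a keyword profile is matched against records. As a minimal sketch, assuming each record is a plain-text string and each keyword or phrase must appear as a whole-word, case-insensitive token sequence, a user's profile can be represented as a small Boolean expression tree and evaluated once per record. All names here (`Term`, `And`, `Or`, `Not`, `tokenize`, `contains_phrase`, `evaluate`, `filter_records`) are hypothetical and not taken from the paper, and plain Python stands in for the authors' Spark prototype.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Union


# Hypothetical profile representation: a Boolean expression tree whose
# leaves are keywords or multi-word phrases (not the paper's data model).
@dataclass(frozen=True)
class Term:
    text: str  # keyword or phrase, e.g. "apache spark"


@dataclass(frozen=True)
class And:
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Or:
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Not:
    operand: Expr


Expr = Union[Term, And, Or, Not]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens (assumed matching rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs in `tokens` as a contiguous run of whole tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def evaluate(expr: Expr, tokens: list[str]) -> bool:
    """Recursively evaluate the Boolean profile against one tokenized record."""
    if isinstance(expr, Term):
        return contains_phrase(tokens, tokenize(expr.text))
    if isinstance(expr, And):
        return evaluate(expr.left, tokens) and evaluate(expr.right, tokens)
    if isinstance(expr, Or):
        return evaluate(expr.left, tokens) or evaluate(expr.right, tokens)
    if isinstance(expr, Not):
        return not evaluate(expr.operand, tokens)
    raise TypeError(f"unknown expression node: {expr!r}")


def filter_records(records: list[str], profile: Expr) -> list[str]:
    """Keep only the records that satisfy the user's Boolean profile."""
    return [r for r in records if evaluate(profile, tokenize(r))]


if __name__ == "__main__":
    # Profile: (apache spark OR hadoop) AND NOT tutorial
    profile = And(Or(Term("apache spark"), Term("hadoop")), Not(Term("tutorial")))
    docs = [
        "Benchmarking Apache Spark on EC2",
        "Hadoop tutorial for beginners",
        "Spark plugs for cars",
        "Scaling Hadoop clusters",
    ]
    print(filter_records(docs, profile))
    # prints ['Benchmarking Apache Spark on EC2', 'Scaling Hadoop clusters']
```

Because `evaluate` is a pure per-record predicate, the same logic could in principle be passed to PySpark's `RDD.filter` (for example `rdd.filter(lambda r: evaluate(profile, tokenize(r)))`), letting Spark apply it to each partition in parallel; whether the authors' prototype is structured this way is not stated in the abstract.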