FIF: A NLP-based Feature Identification Framework for Data Warehouses
A. Prabhune, Ashish Chouhan
2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), published 2019-10-01
DOI: 10.1145/3350546.3352530 (https://doi.org/10.1145/3350546.3352530)
Citations: 1
Abstract
In a data warehouse, selecting the relevant features is an iterative process that is laborious, time-consuming, and error-prone because of selection bias introduced by either the data expert or the data analyst. To address this challenge, this paper introduces FIF, a Feature Identification Framework that uses Natural Language Processing (NLP) to analyze hypotheses, identify the relevant feature space, and predict the appropriate data mining task and model. FIF follows the microservices architecture pattern and comprises five core groups of microservices: (a) NLP Pre-processor, (b) Attribute Identifier, (c) Feature Identifier, (d) Topic Modeller, and (e) Data Mining Task Evaluator. Finally, FIF is evaluated with five hypotheses against our data warehouse.
CCS CONCEPTS • Information systems → Data warehouses; Wrappers (data mining); Document topic models; Similarity measures; • Computing methodologies → Feature selection; Natural language processing.
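To make the flow from hypothesis to feature space and task more concrete, the following is a minimal Python sketch of three of the stages named in the abstract (pre-processing, attribute identification, and task evaluation). It is not the authors' implementation: the stop-word list, the keyword-to-task table, the similarity threshold, the example schema, and all function names are hypothetical, and the Feature Identifier and Topic Modeller groups are left out entirely. String similarity is approximated here with the standard-library `difflib.SequenceMatcher`, which may differ from the similarity measures used in the paper.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stop words and task keywords; none of these come from the paper.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "and", "to"}
TASK_KEYWORDS = {
    "regression": {"influences", "increases", "decreases"},
    "classification": {"whether", "classify", "category"},
    "clustering": {"segment", "group", "similar"},
}


def preprocess(hypothesis):
    """Stand-in for the NLP Pre-processor: lowercase, tokenise, drop stop words."""
    tokens = re.findall(r"[a-z]+", hypothesis.lower())
    return [t for t in tokens if t not in STOPWORDS]


def best_similarity(word, tokens):
    """Highest string similarity between one word and any hypothesis token."""
    return max((SequenceMatcher(None, word, t).ratio() for t in tokens), default=0.0)


def identify_attributes(tokens, schema, threshold=0.85):
    """Stand-in for the Attribute Identifier: keep a column only if every
    underscore-separated part of its name closely matches some token."""
    matches = []
    for table, columns in schema.items():
        for column in columns:
            parts = column.lower().split("_")
            if all(best_similarity(p, tokens) >= threshold for p in parts):
                matches.append(f"{table}.{column}")
    return matches


def predict_task(tokens):
    """Stand-in for the Data Mining Task Evaluator: simple keyword lookup."""
    for task, keywords in TASK_KEYWORDS.items():
        if keywords & set(tokens):
            return task
    return "unknown"


def analyse(hypothesis, schema):
    """Run the simplified stages in sequence on one hypothesis."""
    tokens = preprocess(hypothesis)
    return {
        "features": identify_attributes(tokens, schema),
        "task": predict_task(tokens),
    }


# Hypothetical warehouse schema: table name -> column names.
schema = {"sales": ["customer_age", "purchase_amount", "region"]}
result = analyse("Customer age influences the purchase amount", schema)
print(result)
```

In a microservices deployment each of these functions would presumably sit behind its own service boundary; they are plain functions here only to keep the sketch self-contained.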