{"title":"Using Frequent Closed Itemsets for Data Dimensionality Reduction","authors":"Petr Krajča, Jan Outrata, Vilém Vychodil","doi":"10.1109/ICDM.2011.154","DOIUrl":null,"url":null,"abstract":"We address important issues of dimensionality reduction of transactional data sets where the input data consists of lists of transactions, each of them being a finite set of items. The reduction consists in finding a small set of new items, so-called factor-items, which is considerably smaller than the original set of items while comprising full or nearly full information about the original items. Using this type of reduction, the original data set can be represented by a smaller transactional data set using factor-items instead of the original items, thus reducing its dimensionality. The procedure utilized in this paper is based on approximate Boolean matrix decomposition. In this paper, we focus on the role of frequent closed item sets that can be used to determine factor-items. We present the factorization problem, its reduction to Boolean matrix decompositions, experiments with publicly available data sets, and an algorithm for computing decompositions.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 11th International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2011.154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
We address important issues of dimensionality reduction of transactional data sets where the input data consists of lists of transactions, each of them being a finite set of items. The reduction consists in finding a small set of new items, so-called factor-items, which is considerably smaller than the original set of items while comprising full or nearly full information about the original items. Using this type of reduction, the original data set can be represented by a smaller transactional data set using factor-items instead of the original items, thus reducing its dimensionality. The procedure utilized in this paper is based on approximate Boolean matrix decomposition. In this paper, we focus on the role of frequent closed item sets that can be used to determine factor-items. We present the factorization problem, its reduction to Boolean matrix decompositions, experiments with publicly available data sets, and an algorithm for computing decompositions.