Association analysis of significant frequent colossal itemsets mined from high dimensional datasets

2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON). DOI: 10.1109/UPCON.2016.7894662

Manjunath K. Vanahalli, Nagamma Patil


Citations: 2

Abstract

Bioinformatics has given rise to a distinct class of datasets known as high dimensional datasets, which are characterized by a large number of features and a small number of samples. Traditional algorithms spend most of their running time mining a large number of small and mid-size itemsets, which do not carry valuable or significant information. Recent research has focused on mining itemsets of large cardinality, called colossal itemsets, which are significant to many applications, especially in bioinformatics. Existing frequent colossal itemset mining algorithms fail to discover the complete set of significant frequent colossal itemsets, and the colossal itemsets they do mine carry erroneous support information, which degrades association analysis. Mining significant frequent colossal itemsets with accurate support information helps attain highly accurate association analysis. The proposed work presents a novel pre-processing technique and a bottom-up row enumeration algorithm to mine significant frequent colossal itemsets with accurate support information. The pre-processing technique uses the minimum support threshold and the minimum cardinality threshold to prune irrelevant samples and features. Experimental results demonstrate that the proposed algorithm achieves higher accuracy than existing algorithms, and a performance study indicates the efficiency of the pre-processing technique.
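The abstract does not spell out the pre-processing steps, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes two pruning rules that follow directly from the thresholds: a feature occurring in fewer than `min_sup` samples cannot belong to any frequent itemset, and a sample with fewer than `min_card` remaining features cannot contain any colossal itemset. The function name `prune_dataset`, the dictionary layout, and the repeat-until-stable loop are all illustrative assumptions.

```python
from typing import Dict, Set


def prune_dataset(
    rows: Dict[str, Set[str]], min_sup: int, min_card: int
) -> Dict[str, Set[str]]:
    """Drop infrequent features and too-short samples until nothing changes.

    `rows` maps a sample id to its set of features (hypothetical layout).
    """
    rows = {rid: set(items) for rid, items in rows.items()}
    while True:
        # Count the number of samples in which each feature occurs.
        counts: Dict[str, int] = {}
        for items in rows.values():
            for item in items:
                counts[item] = counts.get(item, 0) + 1

        # Assumed rule 1: keep only features meeting the support threshold.
        frequent = {item for item, c in counts.items() if c >= min_sup}
        pruned = {rid: items & frequent for rid, items in rows.items()}

        # Assumed rule 2: keep only samples meeting the cardinality threshold.
        pruned = {
            rid: items for rid, items in pruned.items() if len(items) >= min_card
        }

        # Removing samples can lower feature counts, so repeat until stable.
        if pruned == rows:
            return pruned
        rows = pruned


if __name__ == "__main__":
    data = {
        "r1": {"a", "b", "c", "d"},
        "r2": {"a", "b", "c"},
        "r3": {"a", "b", "e"},
        "r4": {"f"},
    }
    print(prune_dataset(data, min_sup=2, min_card=3))
```

In this toy input, features `d`, `e`, and `f` fall below the support threshold, which in turn leaves samples `r3` and `r4` shorter than the cardinality threshold, so only `r1` and `r2` survive. The loop matters because each removal can invalidate something that previously passed a threshold.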

