{"title":"从高维数据集中挖掘的重要频繁巨项集的关联分析","authors":"Manjunath K. Vanahalli, Nagamma Patil","doi":"10.1109/UPCON.2016.7894662","DOIUrl":null,"url":null,"abstract":"Bioinformatics has contributed to a different form of datasets called as high dimensional datasets. The high dimensional datasets are characterized by a large number of features and a small number of samples. The traditional algorithms expend most of the running time in mining large number of small and mid-size items which does not enclose valuable and significant information. The recent research focused on mining large cardinality itemsets called as colossal itemsets which are significant to many applications, especially in the field of bioinformatics. The existing frequent colossal itemset mining algorithms are unsuccessful in discovering complete set of significant frequent colossal itemsets. The mined colossal itemsets from existing algorithms provide erroneous support information which affects association analysis. Mining significant frequent colossal itemsets with accurate support information helps in attaining a high-level accuracy of association analysis. The proposed work highlights a novel pre-processing technique and bottom-up row enumeration algorithm to mine significant frequent colossal itemsets with accurate support information. A novel pre-processing technique efficiently utilizes minimum support threshold and minimum cardinality threshold to prune irrelevant samples and features. The experiment results demonstrate that the proposed algorithm has high accuracy over existing algorithms. Performance study indicates the efficiency of the pre-processing technique.","PeriodicalId":151809,"journal":{"name":"2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Association analysis of significant frequent colossal itemsets mined from high dimensional datasets\",\"authors\":\"Manjunath K. Vanahalli, Nagamma Patil\",\"doi\":\"10.1109/UPCON.2016.7894662\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bioinformatics has contributed to a different form of datasets called as high dimensional datasets. The high dimensional datasets are characterized by a large number of features and a small number of samples. The traditional algorithms expend most of the running time in mining large number of small and mid-size items which does not enclose valuable and significant information. The recent research focused on mining large cardinality itemsets called as colossal itemsets which are significant to many applications, especially in the field of bioinformatics. The existing frequent colossal itemset mining algorithms are unsuccessful in discovering complete set of significant frequent colossal itemsets. The mined colossal itemsets from existing algorithms provide erroneous support information which affects association analysis. Mining significant frequent colossal itemsets with accurate support information helps in attaining a high-level accuracy of association analysis. The proposed work highlights a novel pre-processing technique and bottom-up row enumeration algorithm to mine significant frequent colossal itemsets with accurate support information. A novel pre-processing technique efficiently utilizes minimum support threshold and minimum cardinality threshold to prune irrelevant samples and features. The experiment results demonstrate that the proposed algorithm has high accuracy over existing algorithms. Performance study indicates the efficiency of the pre-processing technique.\",\"PeriodicalId\":151809,\"journal\":{\"name\":\"2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON)\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UPCON.2016.7894662\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UPCON.2016.7894662","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Association analysis of significant frequent colossal itemsets mined from high dimensional datasets
Bioinformatics has contributed to a different form of datasets called as high dimensional datasets. The high dimensional datasets are characterized by a large number of features and a small number of samples. The traditional algorithms expend most of the running time in mining large number of small and mid-size items which does not enclose valuable and significant information. The recent research focused on mining large cardinality itemsets called as colossal itemsets which are significant to many applications, especially in the field of bioinformatics. The existing frequent colossal itemset mining algorithms are unsuccessful in discovering complete set of significant frequent colossal itemsets. The mined colossal itemsets from existing algorithms provide erroneous support information which affects association analysis. Mining significant frequent colossal itemsets with accurate support information helps in attaining a high-level accuracy of association analysis. The proposed work highlights a novel pre-processing technique and bottom-up row enumeration algorithm to mine significant frequent colossal itemsets with accurate support information. A novel pre-processing technique efficiently utilizes minimum support threshold and minimum cardinality threshold to prune irrelevant samples and features. The experiment results demonstrate that the proposed algorithm has high accuracy over existing algorithms. Performance study indicates the efficiency of the pre-processing technique.