{"title":"Mining association rules: anti-skew algorithms","authors":"Jun-Lin Lin, M. Dunham","doi":"10.1109/ICDE.1998.655811","journal":{"name":"Proceedings 14th International Conference on Data Engineering","volume":"65 1","pages":"0"},"publicationDate":"1998-02-23","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"98","platform":"Semanticscholar","PeriodicalId":264926,"PeriodicalName":"Proceedings 14th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.1998.655811"}
Mining association rules among items in a large database has been recognized as one of the most important data mining problems. All previously proposed approaches to this problem require scanning the entire database at least twice, or almost twice, in the worst case. We propose several techniques that overcome the problem of data skew in basket data. These techniques reduce the maximum number of scans to fewer than two, and in most cases they find all association rules in about one scan. Our algorithms employ prior knowledge, collected during the mining process and/or via sampling, to further reduce the number of candidate itemsets and to identify false candidate itemsets at an earlier stage.
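To make the "two scans" baseline and the effect of data skew concrete, the following is a minimal sketch of a generic partition-style miner. It is not the authors' anti-skew algorithm, which the abstract does not specify. The function names `local_frequent` and `two_scan_frequent`, the `max_size` cap, and the toy baskets are all assumptions made for illustration. The first scan collects itemsets that are frequent within at least one partition; the second scan counts those candidates over the whole database.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_support, max_size=3):
    """Itemsets (up to max_size items) frequent within a single partition."""
    counts = Counter()
    for basket in partition:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(partition)
    return {itemset for itemset, n in counts.items() if n >= threshold}


def two_scan_frequent(baskets, min_support, n_partitions=2, max_size=3):
    """Hypothetical two-scan miner: returns {itemset: global support}."""
    if not baskets:
        return {}
    size = -(-len(baskets) // n_partitions)  # ceiling division
    partitions = [baskets[i:i + size] for i in range(0, len(baskets), size)]

    # Scan 1: a globally frequent itemset must be locally frequent in at
    # least one partition, so the union of local results is a safe
    # (but possibly inflated) candidate set.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_support, max_size)

    # Scan 2: count every candidate against the whole database.
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1

    threshold = min_support * len(baskets)
    return {c: n / len(baskets) for c, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Skewed toy data: "milk" appears only in the first half,
    # "beer" only in the second half.
    baskets = [
        ["bread", "milk"],
        ["bread", "milk"],
        ["bread", "beer"],
        ["bread", "beer"],
    ]
    print(two_scan_frequent(baskets, min_support=0.75))
```

With this toy input and a 75% support threshold, the first scan yields five candidates, but only `("bread",)` survives the second scan. The other four are false candidates produced by skew between the two partitions, the kind of wasted work the abstract says its techniques aim to detect earlier.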