Mining association algorithm with improved threshold based on ROC analysis
M. Kawahara, H. Kawano
2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (IEEE Cat. No.01CH37233), 2001-08-26
DOI: https://doi.org/10.1109/PACRIM.2001.953729
Citations: 4
Abstract
The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive rules that contain noise such as stopwords, so many systems use noise filters to remove it. In order to remove the noise automatically and derive more effective rules, we proposed an algorithm that uses the true positive rate and the false positive rate of derived rules in a database, based on ROC analysis. In this paper, we correct the parameters to improve this extended mining association algorithm. Moreover, we evaluate the performance of our proposed algorithm using an experimental database and show how it can derive effective association rules. We also show that our proposed algorithm can remove stopwords automatically from raw data.
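The abstract does not give the exact parameters or thresholds, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats the antecedent of a rule as a classifier for the consequent and uses the standard ROC definitions of true positive rate and false positive rate. The function names `roc_rates` and `keep_rule`, the `min_gap` threshold, and the toy `docs` data are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Tuple

# Toy keyword sets, one per document (hypothetical data for illustration only).
docs: List[FrozenSet[str]] = [
    frozenset({"the", "mining", "association"}),
    frozenset({"the", "mining", "association"}),
    frozenset({"the", "mining", "rules"}),
    frozenset({"the", "database"}),
    frozenset({"the", "web"}),
    frozenset({"the", "association"}),
]


def roc_rates(transactions: List[FrozenSet[str]],
              antecedent: str,
              consequent: str) -> Tuple[float, float]:
    """Return (TPR, FPR) for the rule antecedent -> consequent.

    Assumption: the antecedent is read as a predictor of the consequent, so
    TPR = |A and C| / |C| and FPR = |A and not C| / |not C|.
    """
    tp = fp = pos = neg = 0
    for t in transactions:
        has_a = antecedent in t
        if consequent in t:
            pos += 1
            tp += int(has_a)
        else:
            neg += 1
            fp += int(has_a)
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


def keep_rule(transactions: List[FrozenSet[str]],
              antecedent: str,
              consequent: str,
              min_gap: float = 0.3) -> bool:
    """Keep a rule only if it lies far enough above the ROC diagonal.

    Assumption: `min_gap` is an illustrative threshold on TPR - FPR,
    not a parameter taken from the paper.
    """
    tpr, fpr = roc_rates(transactions, antecedent, consequent)
    return (tpr - fpr) >= min_gap
```

In this toy data the stopword "the" occurs in every document, so a rule with "the" as antecedent has equal true and false positive rates and sits on the ROC diagonal. `keep_rule(docs, "the", "association")` therefore rejects it, while `keep_rule(docs, "mining", "association")` accepts the more informative rule. This is one plausible way such a criterion could filter stopwords without a hand-built stopword list.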