Mining association algorithm with improved threshold based on ROC analysis

M. Kawahara, H. Kawano
2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (IEEE Cat. No.01CH37233), published 2001-08-26
DOI: 10.1109/PACRIM.2001.953729 (https://doi.org/10.1109/PACRIM.2001.953729)
Citations: 4

Abstract

The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive rules that contain noise such as stopwords, so many systems use noise filters to remove it. To remove such noise automatically and derive more effective rules, we have proposed an algorithm, based on ROC (receiver operating characteristic) analysis, that uses the true positive rate and the false positive rate of the derived rules in a database. In this paper, we correct the parameters of this extended mining association algorithm to improve it. We also evaluate the performance of the proposed algorithm on an experimental database, show how it derives effective association rules, and show that it can remove stopwords automatically from raw data.
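The abstract does not spell out how the true and false positive rates are computed or which threshold is applied, so the following Python sketch shows only one plausible reading, under stated assumptions: each transaction is the set of words in one document, a rule X -> Y is treated as a classifier in which the presence of X predicts the presence of Y, and a rule is kept only when TPR - FPR (Youden's J) reaches a hypothetical `min_gap`. A stopword such as "the" occurs roughly independently of topic words, so its rules fall near the ROC diagonal (TPR close to FPR) and are discarded even when their confidence is high. The function names, thresholds, and toy data are all illustrative assumptions, not the authors' method.

```python
from itertools import permutations
from typing import Iterable, List, NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float
    confidence: float
    tpr: float  # P(antecedent | consequent)
    fpr: float  # P(antecedent | not consequent)


def mine_pair_rules(transactions: Iterable[Iterable[str]],
                    min_support: float = 0.2,
                    min_confidence: float = 0.5) -> List[Rule]:
    """Brute-force single-item rules X -> Y filtered by support and confidence.

    Assumption: each rule is also scored as a classifier in which
    "X present" predicts "Y present", giving one ROC point per rule.
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))
    rules = []
    for x, y in permutations(items, 2):
        tp = sum(1 for t in sets if x in t and y in t)      # X and Y
        fp = sum(1 for t in sets if x in t and y not in t)  # X without Y
        fn = sum(1 for t in sets if x not in t and y in t)  # Y without X
        tn = n - tp - fp - fn                               # neither
        # ROC rates are undefined when Y occurs in none or all transactions.
        if tp + fn == 0 or fp + tn == 0:
            continue
        support = tp / n
        if support < min_support or tp + fp == 0:
            continue
        confidence = tp / (tp + fp)
        if confidence < min_confidence:
            continue
        rules.append(Rule(x, y, support, confidence,
                          tpr=tp / (tp + fn), fpr=fp / (fp + tn)))
    return rules


def roc_filter(rules: List[Rule], min_gap: float = 0.5) -> List[Rule]:
    """Keep rules whose ROC point lies well above the diagonal.

    Hypothetical criterion: TPR - FPR (Youden's J) >= min_gap.
    """
    return [r for r in rules if r.tpr - r.fpr >= min_gap]


# Hypothetical toy corpus: "the" appears independently of the two topics.
TRANSACTIONS = [
    {"the", "data", "mining"},
    {"the", "data", "mining"},
    {"the", "data"},
    {"data", "mining"},
    {"the", "web", "navigation"},
    {"the", "web", "navigation"},
    {"the", "web"},
    {"web", "navigation"},
]

if __name__ == "__main__":
    for r in roc_filter(mine_pair_rules(TRANSACTIONS)):
        print(f"{r.antecedent} -> {r.consequent}: conf={r.confidence:.2f}, "
              f"TPR={r.tpr:.2f}, FPR={r.fpr:.2f}")
```

On the toy data, `the -> data` passes the confidence threshold (exactly 0.5) but has TPR = FPR = 0.75, so the ROC filter drops it, while `mining -> data` (TPR 0.75, FPR 0) survives. Support and confidence alone cannot make this distinction, which is the motivation the abstract gives for adding ROC rates.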