A novel decomposition algorithm for binary datatables: Encouraging results on discrimination tasks

M. Cadot, A. Lelu
2010 Fourth International Conference on Research Challenges in Information Science (RCIS), published 2010-05-19
DOI: 10.1109/RCIS.2010.5507364 (https://doi.org/10.1109/RCIS.2010.5507364)

Abstract

We present an algorithm that decomposes any binary datatable into a set of “sufficient itemsets”, i.e. a non-redundant list of itemsets from which the whole table can be reconstructed up to a permutation of its rows. To do so, we replace the “support” threshold criterion of the well-known Apriori algorithm with a “number of liberties”: the liberty count expresses how tightly a (k+1)-level itemset is constrained by its k-level “parents”, and the level-wise exploration continues until the level at which the situation becomes frozen. Our algorithm is symmetric: our itemsets take into account the absence of items as well as their presence. Conversely, we present a method for reconstituting the original data from our exact MIDOVA representation. We illustrate these points on the Breast Cancer and Mushroom datasets from the UCI Repository. We validate our approach by deriving a classification method from it and applying it to three discrimination problems drawn from the same repository.
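The abstract does not define the liberty count precisely, so the following Python sketch is only one plausible reading, not the authors' MIDOVA implementation. It assumes that the number of liberties of an itemset is the width of the interval of supports that remain compatible with the supports of all its proper subsets, computed with the standard inclusion-exclusion bounds. The function names `support`, `support_bounds` and `liberties` are hypothetical, and the sketch only handles the presence of items, not the symmetric treatment of absences described above.

```python
from itertools import combinations


def support(rows, itemset):
    """Number of rows (sets of items) that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for row in rows if target <= row)


def support_bounds(rows, itemset):
    """Interval [lower, upper] for the support of `itemset`, deduced only
    from the supports of its proper subsets (inclusion-exclusion bounds).
    Assumption: this interval is what the 'parents' constrain."""
    items = frozenset(itemset)
    lower, upper = 0, len(rows)
    for size in range(len(items)):  # every proper subset X of the itemset
        for x in combinations(sorted(items), size):
            x = frozenset(x)
            rest = items - x
            delta = 0
            # Sum over every J with X <= J < itemset.
            for extra_size in range(len(rest)):
                for extra in combinations(sorted(rest), extra_size):
                    j = x | frozenset(extra)
                    missing = len(items) - len(j)
                    sign = 1 if missing % 2 == 1 else -1
                    delta += sign * support(rows, j)
            if len(rest) % 2 == 1:
                upper = min(upper, delta)  # odd gap gives an upper bound
            else:
                lower = max(lower, delta)  # even gap gives a lower bound
    return lower, upper


def liberties(rows, itemset):
    """Hypothetical liberty count: width of the admissible support interval.
    A value of 0 means the support is fully determined ('frozen')."""
    lower, upper = support_bounds(rows, itemset)
    return upper - lower


# Toy binary tables, each row given as the set of items present in it.
free_table = [{"a", "b"}, {"a"}, {"b"}, {"a", "b", "c"}]
frozen_table = [{"a", "b"}, {"a", "b"}, {"a"}]

print(support_bounds(free_table, {"a", "b"}), liberties(free_table, {"a", "b"}))
print(support_bounds(frozen_table, {"a", "b"}), liberties(frozen_table, {"a", "b"}))
```

In the second table the supports of `a` and `b` alone leave no freedom for the pair `{a, b}`, so under this reading there would be nothing to gain from storing it or exploring its supersets.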