Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting

IF 2.5 4区 综合性期刊 Q2 CHEMISTRY, MULTIDISCIPLINARY
Claudio Carpineto, Giovanni Romano
{"title":"Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting","authors":"Claudio Carpineto, Giovanni Romano","doi":"10.3390/app131810050","DOIUrl":null,"url":null,"abstract":"Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.","PeriodicalId":48760,"journal":{"name":"Applied Sciences-Basel","volume":" ","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences-Basel","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/app131810050","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.
双约束一致性聚类及其在网上防伪中的应用
半监督一致性聚类是一种很有前途的策略,可以补偿聚类的主观性及其对设计因素的敏感性,最近提出了各种技术来集成领域知识和多个聚类分区。在本文中,我们提出了一种双重利用领域知识的新方法,即构建初始分区,以及将它们组合在一起。特别是,我们展示了如何将必须链接和不能链接的约束建模和集成到通用一致性聚类(CC)框架的目标函数中,该框架最大限度地提高了一致性分区和输入分区之间的相似性,而输入分区又被相同的约束所丰富。此外,借用函数依赖理论,集成框架利用演绎闭包和最小覆盖的概念,充分利用约束之间的逻辑含义。使用标准的UCI基准,我们发现所得到的算法(称为CCC双约束一致性聚类)在组合基本约束分区方面比普通CC更有效,平均性能提高了5.54%。然后我们认为CCC特别适合分析假冒电子商务网站,因为可以通过利用特定的领域功能来获得限制,并展示其检测联盟营销计划的潜力。总之,我们的实验表明,CCC使聚类过程更加稳健,能够承受聚类算法、数据集和特征的变化,平均性能显著提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Applied Sciences-Basel
Applied Sciences-Basel CHEMISTRY, MULTIDISCIPLINARYMATERIALS SCIE-MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore
5.30
自引率
11.10%
发文量
10882
期刊介绍: Applied Sciences (ISSN 2076-3417) provides an advanced forum on all aspects of applied natural sciences. It publishes reviews, research papers and communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files and software regarding the full details of the calculation or experimental procedure, if unable to be published in a normal way, can be deposited as supplementary electronic material.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信