Finding Rule-Interpretable Non-Negative Data Representation

IF 8.9, Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-02-13 DOI:10.1109/TKDE.2025.3538327

Matej Mihelčić;Pauli Miettinen

Volume 37, Issue 5, pages 2538-2549. Full text: https://ieeexplore.ieee.org/document/10887020/

Citations: 0

Abstract

Finding Rule-Interpretable Non-Negative Data Representation

Non-negative Matrix Factorization (NMF) is an intensively used technique for obtaining parts-based, lower dimensional and non-negative representation. Researchers in biology, medicine, pharmacy and other fields often prefer NMF over other dimensionality reduction approaches (such as PCA) because the non-negativity of the approach naturally fits the characteristics of the domain problem and its results are easier to analyze and understand. Despite these advantages, obtaining exact characterization and interpretation of the NMF’s latent factors can still be difficult due to their numerical nature. Rule-based approaches, such as rule mining, conceptual clustering, subgroup discovery and redescription mining, are often considered more interpretable but lack lower-dimensional representation of the data. We present a version of the NMF approach that merges rule-based descriptions with advantages of part-based representation offered by the NMF. Given the numerical input data with non-negative entries and a set of rules with high entity coverage, the approach creates the lower-dimensional non-negative representation of the input data in such a way that its factors are described by the appropriate subset of the input rules. In addition to revealing important attributes for latent factors, their interaction and value ranges, this approach allows performing focused embedding potentially using multiple overlapping target labels.
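To make the two ingredients in the abstract concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it runs plain NMF with the standard Lee-Seung multiplicative updates and then, as a separate post-hoc step, checks which interval rule best matches the entities active in each factor. The function names (`nmf`, `rule_support`, `jaccard`), the rule format (column index mapped to an inclusive value range), the activity threshold, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Plain NMF, X ~ W @ H with W, H >= 0, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps both factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rule_support(X, rule):
    """Boolean mask of entities (rows) that satisfy every condition of a rule.

    `rule` is a hypothetical format: {column_index: (low, high)}, inclusive.
    """
    mask = np.ones(X.shape[0], dtype=bool)
    for col, (low, high) in rule.items():
        mask &= (X[:, col] >= low) & (X[:, col] <= high)
    return mask

def jaccard(a, b):
    """Jaccard similarity of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

# Toy non-negative data: two groups of entities using disjoint attributes.
X = np.array([
    [4.0, 4.0, 0.0, 0.0],
    [3.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 3.0, 3.0],
])

# Two illustrative rules, each an interval condition on one attribute.
rules = {
    "r1": {0: (3.0, 4.0)},
    "r2": {2: (2.0, 3.0)},
}

W, H = nmf(X, k=2)

for f in range(W.shape[1]):
    # Entities counted as "active" in factor f (threshold is an assumption).
    active = W[:, f] > 0.25 * W[:, f].max()
    best = max(rules, key=lambda r: jaccard(active, rule_support(X, rules[r])))
    print(f"factor {f}: best matching rule {best}")
```

In this sketch the rules play no part in the factorization itself; they are only compared with the factors afterwards. The approach described in the abstract instead builds the representation so that its factors are described by a subset of the input rules.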


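Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months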

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.