Algorithms and Applications to Weighted Rank-one Binary Matrix Factorization

Impact factor 3.6; JCR Q2 (Computer Science, Information Systems)

ACM Transactions on Management Information Systems, Vol. 11, No. 2. Published 2020-05-01. DOI: 10.1145/3386599

Haibing Lu, X I Chen, Junmin Shi, Jaideep Vaidya, Vijayalakshmi Atluri, Yuan Hong, Wei Huang


Citations: 8

Abstract


Many applications use data that are better represented in binary matrix form, such as click-stream data, market basket data, document-term data, user-permission data in access control, and others. Matrix factorization methods are widely used tools for analyzing high-dimensional data, as they automatically extract sparse and meaningful features from data vectors. However, existing matrix factorization methods do not work well for binary data. One crucial limitation is interpretability: many matrix factorization methods decompose an input matrix into matrices with fractional or even negative components, which are hard to interpret in many real settings. Some methods, such as binary matrix factorization, do restrict the decomposed matrices to binary values. However, these models are not flexible enough to accommodate some data analysis tasks, such as trading off summary size against quality or distinguishing between different types of approximation errors. To address these issues, this article presents weighted rank-one binary matrix factorization, which approximates a binary matrix by the product of two binary vectors, with parameters controlling different types of approximation errors. By running weighted rank-one binary matrix factorization systematically, one can effectively perform various binary data analysis tasks, such as compression, clustering, and pattern discovery. The theoretical properties of weighted rank-one binary matrix factorization are investigated, and its connections to problems in other research domains are examined. As weighted rank-one binary matrix factorization is NP-hard in general, efficient and effective algorithms are presented. Extensive studies on applications of weighted rank-one binary matrix factorization are also conducted.
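The abstract states the problem only at a high level. As an illustration, one plausible way to formalise it (an assumption here, not necessarily the paper's exact objective) is: given a binary m×n matrix A and weights w_fn, w_fp, find binary vectors x ∈ {0,1}^m and y ∈ {0,1}^n minimising w_fn · #{(i,j): A_ij = 1, x_i·y_j = 0} + w_fp · #{(i,j): A_ij = 0, x_i·y_j = 1}, so that w_fn prices uncovered 1-entries and w_fp prices false 1-entries. The Python sketch uses a generic alternating heuristic: with y fixed, each x_i can be chosen independently, and vice versa. It is not the authors' algorithm, and the names `weighted_cost`, `best_response`, `rank_one_bmf`, `w_fn`, `w_fp` and the densest-row initialisation are hypothetical.

```python
import numpy as np


def weighted_cost(A, x, y, w_fn, w_fp):
    """Weighted error of approximating binary matrix A by the outer product x y^T.

    w_fn penalises 1-entries of A that the approximation misses;
    w_fp penalises 0-entries of A that the approximation sets to 1.
    """
    approx = np.outer(x, y)
    false_neg = int(np.sum((A == 1) & (approx == 0)))
    false_pos = int(np.sum((A == 0) & (approx == 1)))
    return w_fn * false_neg + w_fp * false_pos


def best_response(A, v, w_fn, w_fp):
    """Optimal binary u for a fixed v; each u_i is chosen independently.

    Turning u_i on covers row i on supp(v): it saves w_fn per 1-entry there
    and costs w_fp per 0-entry there, so u_i = 1 iff the saving is larger.
    """
    ones = A @ v              # 1-entries of each row inside supp(v)
    zeros = v.sum() - ones    # 0-entries of each row inside supp(v)
    return (w_fn * ones > w_fp * zeros).astype(int)


def rank_one_bmf(A, w_fn=1.0, w_fp=1.0, max_iter=100):
    """Alternating heuristic; returns (x, y, cost). Not guaranteed optimal."""
    A = np.asarray(A, dtype=int)
    # Hypothetical initialisation: seed y with the densest row of A.
    y = A[np.argmax(A.sum(axis=1))].copy()
    x = best_response(A, y, w_fn, w_fp)
    cost = weighted_cost(A, x, y, w_fn, w_fp)
    for _ in range(max_iter):
        y = best_response(A.T, x, w_fn, w_fp)  # optimal y for the current x
        x = best_response(A, y, w_fn, w_fp)    # optimal x for the new y
        new_cost = weighted_cost(A, x, y, w_fn, w_fp)
        if new_cost >= cost:  # each half-step never increases the cost
            break
        cost = new_cost
    return x, y, cost
```

For example, on A = [[1,1,1],[1,1,1],[1,1,0]] with w_fn = w_fp = 1 the sketch returns x = [1,1,1], y = [1,1,1] at cost 1 (accepting one false 1), while with w_fp = 5 it returns x = [1,1,0], y = [1,1,1] at cost 2 (leaving row 2 uncovered instead). Because the problem is NP-hard in general, a heuristic like this can stop at a local optimum; it only guarantees that the weighted cost never increases between iterations.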


Source journal

ACM Transactions on Management Information Systems (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

6.30

Self-citation rate

20.00%

Articles published