Insightful Dimensionality Reduction with Very Low Rank Variable Subsets

Bruno Ordozgoiti, Sachith Pai, M. Kołczyńska
Proceedings of the Web Conference 2021, published 2021-04-19
DOI: 10.1145/3442381.3450067 (https://doi.org/10.1145/3442381.3450067)
Citations: 1

Abstract

Insightful Dimensionality Reduction with Very Low Rank Variable Subsets
Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets. In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor? We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable. We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines.
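To make the phrase "submatrices close to rank one" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that closeness is measured by the share of a column subset's squared Frobenius norm left unexplained by its best rank-one approximation (computed from singular values). The function name `rank_one_residual` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def rank_one_residual(A, cols):
    """Share of the squared Frobenius norm of A[:, cols] that is NOT
    captured by its best rank-one approximation (0 means exactly rank one).

    Illustrative measure only; the paper's own objectives may differ.
    """
    sub = A[:, list(cols)]
    # Singular values in descending order; by the Eckart-Young theorem the
    # best rank-one approximation captures exactly s[0]**2 of the total.
    s = np.linalg.svd(sub, compute_uv=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 0.0
    return 1.0 - float(s[0] ** 2) / total


# Synthetic data (an assumption for the example): three columns driven by a
# single hidden factor plus small noise, followed by three pure-noise columns.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
loadings = np.array([[1.0, -2.0, 0.5]])
planted = factor @ loadings + 0.01 * rng.normal(size=(200, 3))
noise = rng.normal(size=(200, 3))
A = np.hstack([planted, noise])

planted_score = rank_one_residual(A, [0, 1, 2])  # single-factor columns
noise_score = rank_one_residual(A, [3, 4, 5])    # unrelated columns
```

In this sketch the single-factor columns yield a residual near zero, while the unrelated columns yield a much larger one. Searching over all column subsets for low-residual groups is the hard part that the abstract's complexity results and approximation algorithms address; this snippet only scores a given subset.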