多元响应回归中的聚类检测

IF 4.4 2区 数学 Q1 STATISTICS & PROBABILITY
Bradley S. Price, Corban Allenbrand, Ben Sherwood
{"title":"多元响应回归中的聚类检测","authors":"Bradley S. Price, Corban Allenbrand, Ben Sherwood","doi":"10.1002/wics.1551","DOIUrl":null,"url":null,"abstract":"Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed on how to utilize shared information about responses with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas coupled with the rise of large data sets (“big data”) has generated interest in how to make the computations more efficient, but also to develop methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity between responses is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for complexity of relationships of a large number of responses. With this flexibility, comes additional challenges such as how to identify these clusters of responses, model selection, and the development of more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state of the art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights on the computational challenges associated with this framework. Specifically we present a simulation study that discusses the challenges with model selection when detecting clusters of responses of interest. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2021-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1551","citationCount":"3","resultStr":"{\"title\":\"Detecting clusters in multivariate response regression\",\"authors\":\"Bradley S. Price, Corban Allenbrand, Ben Sherwood\",\"doi\":\"10.1002/wics.1551\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed on how to utilize shared information about responses with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas coupled with the rise of large data sets (“big data”) has generated interest in how to make the computations more efficient, but also to develop methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity between responses is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for complexity of relationships of a large number of responses. With this flexibility, comes additional challenges such as how to identify these clusters of responses, model selection, and the development of more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state of the art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights on the computational challenges associated with this framework. Specifically we present a simulation study that discusses the challenges with model selection when detecting clusters of responses of interest. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.\",\"PeriodicalId\":47779,\"journal\":{\"name\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2021-02-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1002/wics.1551\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1002/wics.1551\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wiley Interdisciplinary Reviews-Computational Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/wics.1551","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 3

摘要

多元回归也可以作为一个多任务机器学习问题,用于更好地理解基于给定输入集的多个输出。关于如何利用关于反应的共享信息,已经提出了许多方法,这些方法在经济学、基因组学、先进制造业和精准医学等领域都有应用。对这些领域的兴趣,加上大型数据集(“大数据”)的兴起,产生了人们对如何提高计算效率的兴趣,同时也产生了对开发解释响应之间可能存在的异质性的方法的兴趣。利用反应之间这种异质性的一种方法是使用检测相关反应的组(也称为集群)的方法。这些方法提供了一个可以提高计算速度并考虑大量响应关系复杂性的框架。这种灵活性带来了额外的挑战,如如何识别这些响应集群、模型选择,以及开发更复杂的算法,将监督和非监督学习文献中的概念结合起来。我们探索了当前最先进的方法,提出了一个框架来更好地理解利用或检测响应集群的方法,并提供了与该框架相关的计算挑战的见解。具体而言,我们提出了一项模拟研究,讨论了在检测感兴趣的响应集群时模型选择的挑战。我们还评论了研究和从业者群体感兴趣的扩展和开放问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Detecting clusters in multivariate response regression
Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed on how to utilize shared information about responses with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas coupled with the rise of large data sets (“big data”) has generated interest in how to make the computations more efficient, but also to develop methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity between responses is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for complexity of relationships of a large number of responses. With this flexibility, comes additional challenges such as how to identify these clusters of responses, model selection, and the development of more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state of the art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights on the computational challenges associated with this framework. Specifically we present a simulation study that discusses the challenges with model selection when detecting clusters of responses of interest. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
6.20
自引率
0.00%
发文量
31
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信