Detecting clusters in multivariate response regression

IF 5.4 | Zone 2 (Mathematics) | JCR Q1 STATISTICS & PROBABILITY

Wiley Interdisciplinary Reviews-Computational Statistics | Pub Date: 2021-02-03 | DOI: 10.1002/wics.1551

Bradley S. Price, Corban Allenbrand, Ben Sherwood


Citation count: 3

Abstract

Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed for utilizing shared information about responses, with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas, coupled with the rise of large data sets ("big data"), has generated interest both in making computations more efficient and in developing methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for the complexity of the relationships among a large number of responses. With this flexibility come additional challenges, such as how to identify these clusters of responses, how to perform model selection, and how to develop more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state-of-the-art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights into the computational challenges associated with this framework. Specifically, we present a simulation study that discusses the challenges of model selection when detecting clusters of responses of interest. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.
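The general idea of detecting clusters of related responses can be sketched with a deliberately simple two-stage procedure. This is a minimal illustration under stated assumptions, not the authors' framework: it swaps in a generic approach that fits ordinary least squares to each response separately (the supervised step) and then groups the estimated coefficient vectors with k-means (the unsupervised step). The simulated data, the two true coefficient vectors, and the helper names (`ols`, `kmeans`, `simulate`) are hypothetical.

```python
import random


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system A x = b
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    # least-squares coefficients (no intercept) via the normal equations X^T X b = X^T y
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def assign(points, centers):
    # index of the nearest center for every point
    return [min(range(len(centers)), key=lambda c: sqdist(pt, centers[c])) for pt in points]


def kmeans(points, k, iters=50):
    # deterministic farthest-point initialization, then Lloyd iterations
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda pt: min(sqdist(pt, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    labels = assign(points, centers)
    wcss = sum(sqdist(pt, centers[lab]) for pt, lab in zip(points, labels))
    return labels, wcss


def simulate(n=200, seed=1):
    # hypothetical data: responses 0-2 share one coefficient vector, responses 3-5 another
    rng = random.Random(seed)
    true_betas = [[2.0, -1.0]] * 3 + [[-1.0, 3.0]] * 3
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    Y = [[sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, 0.1)
          for beta in true_betas] for row in X]
    return X, Y


X, Y = simulate()
q = len(Y[0])
# Stage 1 (supervised): one least-squares fit per response column
B = [ols(X, [y[j] for y in Y]) for j in range(q)]
# Stage 2 (unsupervised): group responses whose coefficient vectors are close
labels, wcss2 = kmeans(B, 2)
_, wcss1 = kmeans(B, 1)
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(round(wcss1, 2), round(wcss2, 4))
```

Comparing `wcss1` and `wcss2` shows the within-cluster sum of squares dropping sharply from one to two clusters. Because this criterion typically keeps shrinking as the number of clusters grows, it cannot by itself settle the model-selection question raised above; a real method would also need a principled way to choose the number of clusters.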


Source journal: Wiley Interdisciplinary Reviews-Computational Statistics (STATISTICS & PROBABILITY)

CiteScore: 6.20

Self-citation rate: 0.00%