Journal of machine learning research : JMLR: latest articles

Hoeffding's inequality for general Markov chains with its applications to statistical learning.

IF 6

Journal of machine learning research : JMLR Pub Date : 2021-08-01

Jianqing Fan, Bai Jiang, Qiang Sun

Citations: 0

Bayesian time-aligned factor analysis of paired multivariate time series.

IF 6

Journal of machine learning research : JMLR Pub Date : 2021-01-01

Arkaprava Roy, Jana Schaich Borg, David B Dunson

Citations: 0

Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data.

IF 6

Journal of machine learning research : JMLR Pub Date : 2021-01-01

Minjie Wang, Genevera I Allen

Abstract: In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that enjoys strong empirical performance and inherits the mathematical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data view that are best for determining the groups, often leading to improved integrative clustering. To solve our problem, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570363/pdf/

Citations: 0

Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation.

Journal of machine learning research : JMLR Pub Date : 2021-01-01 Epub Date: 2021-02-01

Melkior Ornik, Ufuk Topcu

Abstract: This paper proposes a formal approach to online learning and planning for agents operating in a priori unknown, time-varying environments. The proposed method computes the maximally likely model of the environment, given the observations about the environment made by an agent earlier in the system run and assuming knowledge of a bound on the maximal rate of change of system dynamics. Such an approach generalizes the estimation method commonly used in learning algorithms for unknown Markov decision processes with time-invariant transition probabilities, but is also able to quickly and correctly identify the system dynamics following a change. Based on the proposed method, we generalize the exploration bonuses used in learning for time-invariant Markov decision processes by introducing a notion of uncertainty in a learned time-varying model, and develop a control policy for time-varying Markov decision processes based on the exploitation and exploration trade-off. We demonstrate the proposed methods on four numerical examples: a patrolling task with a change in system dynamics, a two-state MDP with periodically changing outcomes of actions, a wind flow estimation task, and a multi-armed bandit problem with periodically changing probabilities of different rewards.

Pages: 1-40

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8739185/pdf/

Citations: 0

Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach.

IF 6

Journal of machine learning research : JMLR Pub Date : 2021-01-01

Zhe Fei, Yi Li

Abstract: The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into prevention strategies or treatment decisions for both patients and physicians. High dimensional inference, including confidence intervals and hypothesis testing, has sparked much interest. While much work has been done in the linear regression setting, there is a lack of literature on inference for high dimensional generalized linear models. We propose a novel and computationally feasible method, which accommodates a variety of outcome types, including normal, binomial, and Poisson data. We use a "splitting and smoothing" approach, which splits samples into two parts, performs variable selection using one part and conducts partial regression with the other part. Averaging the estimates over multiple random splits, we obtain the smoothed estimates, which are numerically stable. We show that the estimates are consistent, asymptotically normal, and construct confidence intervals with proper coverage probabilities for all predictors. We examine the finite sample performance of our method by comparing it with the existing methods and applying it to analyze a lung cancer cohort study.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442657/pdf/

Citations: 0

The flare package for high dimensional linear regression and precision matrix estimation in R

Journal of machine learning research : JMLR Pub Date : 2020-06-27 DOI: 10.5555/2789272.2789290

Xingguo Li, T. Zhao, Xiaoming Yuan, Han Liu

Citations: 70

Generalized Score Matching for Non-Negative Data.

IF 6

Journal of machine learning research : JMLR Pub Date : 2019-04-01

Shiqing Yu, Mathias Drton, Ali Shojaie

Citations: 0

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.

IF 6

Journal of machine learning research : JMLR Pub Date : 2019-01-01

Aaron Fisher, Cynthia Rudin, Francesca Dominici

Abstract: Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model f(x) = x^T β with a fixed coefficient vector β) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323609/pdf/nihms-1670270.pdf
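
The permutation-based VI estimate mentioned in this abstract can be illustrated with a minimal sketch for a single fixed model. This is not the paper's MCR estimator: it only measures how much one model's mean squared error grows when one column is shuffled, and the names `mse` and `permutation_importance`, the toy data, and the choice of squared-error loss are all assumptions made for illustration.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one feature column.

    Hypothetical helper: a difference-based importance for ONE model,
    not the range over a model class that MCR describes.
    """
    rng = random.Random(seed)
    base = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with the shuffled values in the chosen column.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - base)
    return sum(increases) / n_repeats


# Toy data (assumed): the target depends on feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] for row in X]


def model(row):
    # A fixed linear model that ignores feature 1 entirely.
    return 2.0 * row[0]


vi0 = permutation_importance(model, X, y, column=0)
vi1 = permutation_importance(model, X, y, column=1)
```

Here `vi0` is positive because shuffling feature 0 breaks its link to the target, while `vi1` is zero because this model never reads feature 1. A different well-fitting model could rely on the features differently, which is the gap that a class-level measure such as MCR is meant to describe.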

Citations: 0

Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

IF 6

Journal of machine learning research : JMLR Pub Date : 2019-01-01

Chengchun Shi, Wenbin Lu, Rui Song

Abstract: Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed a RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish its rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistencies when the number of relations is either bounded or diverges at a proper rate of the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980192/pdf/

Citations: 0

Nonuniformity of P-values Can Occur Early in Diverging Dimensions.

IF 6

Journal of machine learning research : JMLR Pub Date : 2019-01-01

Yingying Fan, Emre Demirkaya, Jinchi Lv

Citations: 0