Journal of Machine Learning Research最新文献_第10页

Guarding against Spurious Discoveries in High Dimensions. 在高维中防范虚假的发现。

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01

Jianqing Fan, Wen-Xin Zhou

{"title":"Guarding against Spurious Discoveries in High Dimensions.","authors":"Jianqing Fan, Wen-Xin Zhou","doi":"","DOIUrl":"","url":null,"abstract":"Many data-mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis due to enormous possibilities of such selections. How can we know statistically our discoveries better than those by chance? In this paper, we define a measure of goodness of spurious fit, which shows how good a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. It coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of such goodness of spurious fit for generalized linear models and L1-regression. Such an asymptotic distribution depends on the sample size, ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which considers only candidate models with goodness of fits better than those by spurious fits. The theory and method are convincingly illustrated by simulated examples and an application to the binary outcomes from German Neuroblastoma Trials.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603346/pdf/nihms842539.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35535102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Structure discovery in Bayesian networks by sampling partial orders 抽样偏阶贝叶斯网络的结构发现

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 DOI: 10.5555/2946645.2946702

NiinimäkiTeppo, ParviainenPekka, KoivistoMikko

引用次数: 1

Choice of V for V-fold cross-validation in least-squares density estimation 最小二乘密度估计中V-fold交叉验证V值的选择

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 DOI: 10.5555/2946645.3053490

ArlotSylvain, LerasleMatthieu

引用次数: 0

MOCCA: Mirrored Convex/Concave Optimization for Nonconvex Composite Functions. 非凸复合函数的镜像凸/凹优化。

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01

Rina Foygel Barber, Emil Y Sidky

{"title":"MOCCA: Mirrored Convex/Concave Optimization for Nonconvex Composite Functions.","authors":"Rina Foygel Barber, Emil Y Sidky","doi":"","DOIUrl":"","url":null,"abstract":"Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. In this paper, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms are nonconvex, existing alternating minimization techniques may fail to converge; other types of existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, this algorithm can handle a range of nonconvex composite optimization problems, and offers theoretical guarantees for convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 144","pages":"1-51"},"PeriodicalIF":6.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789814/pdf/nihms870482.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35785739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The optimal sample complexity OF PAC learning PAC学习的最优样本复杂度

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 DOI: 10.5555/2946645.2946683

HannekeSteve

引用次数: 7

Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes. 支持向量危险机：一个计算过程框架，用于学习审查结果的风险评分。

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 Epub Date: 2016-08-01

Yuanjia Wang, Tianle Chen, Donglin Zeng

{"title":"Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes.","authors":"Yuanjia Wang, Tianle Chen, Donglin Zeng","doi":"","DOIUrl":"","url":null,"abstract":"Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for different at risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating covariate-specific hazard function from population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze two real world biomedical study data where we use clinical markers and neuroimaging biomarkers to predict age-at-onset of a disease, and demonstrate superiority of SVHM in distinguishing high risk versus low risk subjects.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210213/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71434774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Gradients weights improve regression and classification 梯度权重改进了回归和分类

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 DOI: 10.5555/2946645.2946667

KpotufeSamory, BoulariasAbdeslam, SchultzThomas, KimKyoungok

引用次数: 2

Fused lasso approach in regression coefficients clustering 回归系数聚类的融合套索方法

IF 6 3区计算机科学

Journal of Machine Learning Research Pub Date : 2016-01-01 DOI: 10.5555/2946645.3007066

TangLu

引用次数: 4

Input output kernel regression 输入输出核回归