Journal of Machine Learning Research — Latest Articles

Guarding against Spurious Discoveries in High Dimensions
Journal of Machine Learning Research, Vol. 17 · Published 2016-01-01 · IF 6.0 · CAS Tier 3 (Computer Science)
Jianqing Fan, Wen-Xin Zhou
Abstract: Many data-mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis because of the enormous number of possible selections. How can we know, statistically, that our discoveries are better than those obtained by chance? In this paper, we define a measure of goodness of spurious fit, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. The measure coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of this goodness of spurious fit for generalized linear models and L1-regression; the distribution depends on the sample size, the ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which then considers only candidate models whose goodness of fit exceeds that of spurious fits. The theory and method are convincingly illustrated by simulated examples and an application to the binary outcomes from the German Neuroblastoma Trials.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603346/pdf/nihms842539.pdf
Citations: 0
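The phenomenon the paper guards against is easy to demonstrate: the maximum absolute correlation between a pure-noise response and pure-noise covariates grows with the ambient dimension. A minimal simulation sketch (function and variable names are ours, not the paper's; this is not the LAMM algorithm):

```python
import math
import random

def running_max_abs_corr(n, d_checkpoints, rng):
    """Max absolute sample correlation between a pure-noise response and the
    first d pure-noise covariates, recorded at each checkpoint d."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    ynorm = math.sqrt(sum(v * v for v in yc))
    best, out, d = 0.0, {}, 0
    for target in sorted(d_checkpoints):
        while d < target:
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            xbar = sum(x) / n
            xc = [v - xbar for v in x]
            xnorm = math.sqrt(sum(a * a for a in xc))
            r = abs(sum(a * b for a, b in zip(xc, yc))) / (xnorm * ynorm)
            best = max(best, r)
            d += 1
        out[target] = best
    return out

rng = random.Random(0)
# With n = 50 observations, the best spurious correlation over 5000 null
# covariates is markedly larger than over the first 10 of them.
maxcorr = running_max_abs_corr(n=50, d_checkpoints=[10, 5000], rng=rng)
```

This is exactly why the paper's null benchmark depends on the ambient dimension: a fit that looks impressive at d = 10 may be entirely unremarkable at d = 5000.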
Structure discovery in Bayesian networks by sampling partial orders
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.2946702 · IF 6.0 · CAS Tier 3 (Computer Science)
Teppo Niinimäki, Pekka Parviainen, Mikko Koivisto
Abstract (truncated in source): We present methods based on Metropolis-coupled Markov chain Monte Carlo (MC3) and annealed importance sampling (AIS) for estimating the posterior distribution of Bayesian networks. The methods draw...
Citations: 1
Choice of V for V-fold cross-validation in least-squares density estimation
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.3053490 · IF 6.0 · CAS Tier 3 (Computer Science)
Sylvain Arlot, Matthieu Lerasle
Abstract (truncated in source): This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares...
Citations: 0
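The criterion being cross-validated here is the least-squares density risk, integral(f_hat^2) - 2*E[f_hat(X)], whose second term can be estimated on held-out folds. A sketch of the V-fold version for a histogram estimator (our own toy setup, not the paper's code; the paper's subject is the theory behind choosing V):

```python
import random

def hist_density(data, m):
    """Histogram density estimate on [0, 1] with m equal-width bins.
    Returns bin heights; the estimate is f_hat(x) = heights[bin(x)]."""
    n = len(data)
    counts = [0] * m
    for x in data:
        counts[min(int(x * m), m - 1)] += 1
    return [c * m / n for c in counts]  # height = (count/n) / bin width

def ls_risk_vfold(data, m, V):
    """V-fold CV estimate of integral(f_hat^2) - 2*E[f_hat(X)]
    for an m-bin histogram (smaller is better)."""
    folds = [data[v::V] for v in range(V)]
    risks = []
    for v in range(V):
        train = [x for u in range(V) if u != v for x in folds[u]]
        test = folds[v]
        h = hist_density(train, m)
        integral_sq = sum(hi * hi for hi in h) / m  # piecewise-constant integral
        mean_fit = sum(h[min(int(x * m), m - 1)] for x in test) / len(test)
        risks.append(integral_sq - 2.0 * mean_fit)
    return sum(risks) / V

rng = random.Random(1)
# Data concentrated around 0.5; CV should reject both the constant fit (m=1)
# and the badly overfit histogram (m=1000).
data = [min(max(rng.gauss(0.5, 0.1), 0.0), 1.0) for _ in range(500)]
scores = {m: ls_risk_vfold(data, m, V=5) for m in (1, 5, 1000)}
best_m = min(scores, key=scores.get)
```

Note that for m = 1 the estimate is the uniform density, so the criterion is exactly 1 - 2 = -1 regardless of the data; the interesting trade-off is between the moderate and the overfit bin counts.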
MOCCA: Mirrored Convex/Concave Optimization for Nonconvex Composite Functions
Journal of Machine Learning Research 17(144):1-51 · Published 2016-01-01 · IF 6.0 · CAS Tier 3 (Computer Science)
Rina Foygel Barber, Emil Y. Sidky
Abstract: Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. In this paper, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms is nonconvex, existing alternating minimization techniques may fail to converge; other existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, the algorithm can handle a range of nonconvex composite optimization problems and offers theoretical guarantees of convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789814/pdf/nihms870482.pdf
Citations: 0
The optimal sample complexity of PAC learning
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.2946683 · IF 6.0 · CAS Tier 3 (Computer Science)
Steve Hanneke
Abstract (truncated in source): This work establishes a new upper bound on the number of samples sufficient for PAC learning in the realizable case. The bound matches known lower bounds up to numerical constant factors. This solv...
Citations: 7
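The improvement is the removal of a log(1/epsilon) factor: earlier realizable-case upper bounds scale as (1/eps)(d*log(1/eps) + log(1/delta)), whereas the bound in this paper scales as (1/eps)(d + log(1/delta)), matching the known lower bound. A small numerical comparison of the two bound shapes (constants set to 1 purely for illustration; real bounds carry different constants):

```python
import math

def classical_bound_shape(eps, delta, d, c=1.0):
    """Earlier upper-bound shape: O((1/eps) * (d*log(1/eps) + log(1/delta)))."""
    return c / eps * (d * math.log(1.0 / eps) + math.log(1.0 / delta))

def logfree_bound_shape(eps, delta, d, c=1.0):
    """Log-factor-free shape: O((1/eps) * (d + log(1/delta)))."""
    return c / eps * (d + math.log(1.0 / delta))

d, delta = 10, 0.05
# Ratio between the two shapes grows (logarithmically) as eps shrinks.
gap = {eps: classical_bound_shape(eps, delta, d) / logfree_bound_shape(eps, delta, d)
       for eps in (0.1, 0.01, 0.001)}
```

The ratio grows without bound as eps decreases, which is why removing the log factor closes the gap to the lower bound rather than merely improving constants.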
Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes
Journal of Machine Learning Research, Vol. 17 · Published 2016-01-01 (Epub 2016-08-01) · IF 6.0 · CAS Tier 3 (Computer Science)
Yuanjia Wang, Tianle Chen, Donglin Zeng
Abstract: Learning risk scores to predict dichotomous or continuous outcomes with machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for the different at-risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using the kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood, formally show that SVHM is optimal in discriminating the covariate-specific hazard function from the population-average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared with existing machine learning methods and standard conventional approaches. Finally, we analyze data from two real-world biomedical studies in which clinical markers and neuroimaging biomarkers are used to predict age at onset of a disease, and demonstrate the superiority of SVHM in distinguishing high-risk from low-risk subjects.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210213/pdf/
Citations: 0
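The counting-process representation behind SVHM can be sketched as a data expansion: at each observed event time, every subject still at risk contributes a classification sample. The helper name and record layout below are our own illustration of that idea, not the paper's implementation (which fits kernel SVMs with a time-varying offset on top of such samples):

```python
def expand_to_risk_sets(times, events):
    """Expand right-censored data (observed time, event indicator) into
    counting-process classification samples: at each distinct observed event
    time t, every subject still at risk (time >= t) contributes one sample,
    labeled +1 if its event occurs exactly at t and -1 otherwise."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    samples = []
    for t in event_times:
        for i, (ti, ei) in enumerate(zip(times, events)):
            if ti >= t:  # subject i is still at risk just before time t
                label = 1 if (ei == 1 and ti == t) else -1
                samples.append((i, t, label))
    return samples

# Four subjects: event at t=2, censored at t=3, event at t=5, censored at t=1.
# Risk set at t=2 is {0, 1, 2}; risk set at t=5 is {2}.
samples = expand_to_risk_sets([2, 3, 5, 1], [1, 0, 1, 0])
```

A censored subject (such as subject 3 above, censored at t=1) simply drops out of later risk sets instead of requiring inverse-probability weights, which is the structural advantage the abstract points to.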
Gradients weights improve regression and classification
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.2946667 · IF 6.0 · CAS Tier 3 (Computer Science)
Samory Kpotufe, Abdeslam Boularias, Thomas Schultz, Kyoungok Kim
Abstract (truncated in source): In regression problems over R^d, the unknown function f often varies more in some coordinates than in others. We show that weighting each coordinate i according to an estimate of the variation of f...
Citations: 2
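The idea can be sketched end to end: estimate per-coordinate variation of f by finite differences of a pilot kNN fit, then use those magnitudes as metric weights in a second kNN pass. Everything below (names, the finite-difference estimator, the toy target 5*x1 with an irrelevant second coordinate) is our simplified illustration of the approach, not the authors' code:

```python
import random

def knn_predict(X, y, q, k, w):
    """k-nearest-neighbour regression with coordinate weights w in the metric."""
    order = sorted(range(len(X)),
                   key=lambda i: sum(wj * (X[i][j] - q[j]) ** 2
                                     for j, wj in enumerate(w)))
    return sum(y[i] for i in order[:k]) / k

rng = random.Random(2)
n, k, h = 400, 10, 0.1
X = [[rng.random(), rng.random()] for _ in range(n)]
y = [5.0 * x[0] + rng.gauss(0.0, 0.1) for x in X]  # coordinate 2 is irrelevant

# Gradient weights: average central-finite-difference magnitude of the
# unweighted pilot fit along each coordinate.
unit = [1.0, 1.0]
w = []
for j in range(2):
    diffs = []
    for q in X[:100]:
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        diffs.append(abs(knn_predict(X, y, qp, k, unit) -
                         knn_predict(X, y, qm, k, unit)) / (2 * h))
    w.append(sum(diffs) / len(diffs))

Xtest = [[rng.random(), rng.random()] for _ in range(200)]
def mse(weights):
    return sum((knn_predict(X, y, q, k, weights) - 5.0 * q[0]) ** 2
               for q in Xtest) / len(Xtest)
mse_unweighted, mse_weighted = mse(unit), mse(w)
```

Because f varies strongly in x1 and not at all in x2, the learned weights stretch neighbourhoods along the irrelevant coordinate, which is exactly the effect the paper exploits.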
Fused lasso approach in regression coefficients clustering
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.3007066 · IF 6.0 · CAS Tier 3 (Computer Science)
Lu Tang
Abstract (truncated in source): As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major chall...
Citations: 4
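The clustering mechanism of the fused penalty is visible already in the two-coefficient case, which has a closed form: minimizing (y1-b1)^2 + (y2-b2)^2 + lam*|b1-b2| keeps the data mean and soft-thresholds the half-difference, so the two coefficients fuse into one cluster exactly when |y1-y2| <= lam. A worked sketch (our own illustration of the penalty, not the paper's multi-study estimator):

```python
def fused_pair(y1, y2, lam):
    """Exact minimizer of (y1-b1)^2 + (y2-b2)^2 + lam*|b1-b2|.
    Writing b1 = m + t, b2 = m - t, stationarity forces m = (y1+y2)/2, and the
    objective in t is 2*(d-t)^2 + 2*lam*|t| with d = (y1-y2)/2, whose minimizer
    is the soft-threshold t = sign(d) * max(|d| - lam/2, 0)."""
    m = (y1 + y2) / 2.0
    d = (y1 - y2) / 2.0
    t = max(abs(d) - lam / 2.0, 0.0)
    t = t if d >= 0 else -t
    return (m + t, m - t)

shrunk = fused_pair(1.0, 3.0, 1.0)  # difference shrunk from 2 to 1
fused = fused_pair(1.0, 3.0, 4.0)   # penalty large enough: coefficients fuse
```

With many coefficients the same penalty is applied to all pairwise (or successive) differences, so groups of similar regression coefficients collapse to a common value, which is the clustering effect the title refers to.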
Input output kernel regression
Journal of Machine Learning Research · Published 2016-01-01 · DOI: 10.5555/2946645.3053458 · IF 6.0 · CAS Tier 3 (Computer Science)
Céline Brouard, Marie Szafranski, Florence d'Alché-Buc
Abstract (truncated in source): In this paper, we introduce a novel approach, called Input Output Kernel Regression (IOKR), for learning mappings between structured inputs and structured outputs. The approach belongs to the famil...
Citations: 1
A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces
Journal of Machine Learning Research 17(16):1-26 · Published 2016-01-01 · IF 6.0 · CAS Tier 3 (Computer Science)
Xiang Zhang, Yichao Wu, Lan Wang, Runze Li
Abstract: Information criteria have been widely used in model selection and shown to possess nice theoretical properties. For classification, Claeskens et al. (2008) proposed a support vector machine information criterion for feature selection and provided encouraging numerical evidence, but no theoretical justification was given there. This work aims to fill that gap and provide theoretical justification for the support vector machine information criterion in both fixed and diverging model spaces. We first derive a uniform convergence rate for the support vector machine solution and then show that a modification of the support vector machine information criterion achieves model selection consistency even when the number of features diverges at an exponential rate of the sample size. This consistency result can further be applied to selecting the optimal tuning parameter for various penalized support vector machine methods. Finite-sample performance of the proposed information criterion is investigated using Monte Carlo studies and a real-world gene selection problem.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4883123/pdf/nihms733772.pdf
Citations: 0
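The general shape of an SVM information criterion is scaled in-sample hinge loss plus a complexity penalty in the number of selected features. The sketch below uses a BIC-type penalty |S|*log(n) and a plain subgradient-descent linear SVM; the penalty form, the fitting routine, and all names are our illustrative assumptions, not the criterion or code from the paper:

```python
import math
import random

def fit_linear_svm(X, y, lam=0.1, epochs=200, lr=0.1):
    """Full-batch subgradient descent on lam/2*||w||^2 + mean hinge loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

def hinge_loss(X, y, w, b):
    return sum(max(0.0, 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
               for xi, yi in zip(X, y)) / len(X)

rng = random.Random(3)
n = 200
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [1 if xi[0] > 0 else -1 for xi in X]  # only feature 0 is informative

def svm_ic(features):
    """Sketch criterion: n * hinge loss of the refitted SVM + |S| * log(n)."""
    Xs = [[xi[j] for j in features] for xi in X]
    w, b = fit_linear_svm(Xs, y)
    return n * hinge_loss(Xs, y, w, b) + len(features) * math.log(n)

ic = {S: svm_ic(S) for S in [(0,), (1,), (0, 1)]}
```

The informative subset attains a much lower criterion value than the pure-noise one; the paper's contribution is proving that a suitably modified criterion of this kind remains selection-consistent even with exponentially many candidate features.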