Title: Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Authors: Anant Raj, Melih Barsbey, M. Gürbüzbalaban, Lingjiong Zhu, Umut Simsekli
Venue: International Conference on Algorithmic Learning Theory. Published 2022-06-02.
DOI: 10.48550/arXiv.2206.01274 (https://doi.org/10.48550/arXiv.2206.01274)
Abstract: Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has been empirically illustrated that the relation between heavy tails and generalization might not always be monotonic in practice, contrary to the conclusions of existing theory. In this study, we establish novel links between the tail behavior and generalization properties of stochastic gradient descent (SGD), through the lens of algorithmic stability. We consider a quadratic optimization problem and use a heavy-tailed stochastic differential equation (and its Euler discretization) as a proxy for modeling the heavy-tailed behavior emerging in SGD. We then prove uniform stability bounds, which reveal the following outcomes: (i) Without making any exotic assumptions, we show that SGD will not be stable if the stability is measured with the squared loss $x \mapsto x^2$, whereas it in turn becomes stable if the stability is instead measured with a surrogate loss $x \mapsto |x|^p$ for some $p < 2$. (ii) Depending on the variance of the data, there exists a "threshold of heavy-tailedness" such that the generalization error decreases as the tails become heavier, as long as the tails are lighter than this threshold. This suggests that the relation between heavy tails and generalization is not globally monotonic. (iii) We prove matching lower bounds on uniform stability, implying that our bounds are tight in terms of the heaviness of the tails. We support our theory with synthetic and real neural network experiments.

{"title":"Tournaments, Johnson Graphs, and NC-Teaching","authors":"H. Simon","doi":"10.48550/arXiv.2205.02792","DOIUrl":"https://doi.org/10.48550/arXiv.2205.02792","url":null,"abstract":"Quite recently a teaching model, called\"No-Clash Teaching\"or simply\"NC-Teaching\", had been suggested that is provably optimal in the following strong sense. First, it satisfies Goldman and Matthias' collusion-freeness condition. Second, the NC-teaching dimension (= NCTD) is smaller than or equal to the teaching dimension with respect to any other collusion-free teaching model. It has also been shown that any concept class which has NC-teaching dimension $d$ and is defined over a domain of size $n$ can have at most $2^d binom{n}{d}$ concepts. The main results in this paper are as follows. First, we characterize the maximum concept classes of NC-teaching dimension $1$ as classes which are induced by tournaments (= complete oriented graphs) in a very natural way. Second, we show that there exists a family $(cC_n)_{nge1}$ of concept classes such that the well known recursive teaching dimension (= RTD) of $cC_n$ grows logarithmically in $n = |cC_n|$ while, for every $nge1$, the NC-teaching dimension of $cC_n$ equals $1$. Since the recursive teaching dimension of a finite concept class $cC$ is generally bounded $log|cC|$, the family $(cC_n)_{nge1}$ separates RTD from NCTD in the most striking way. The proof of existence of the family $(cC_n)_{nge1}$ makes use of the probabilistic method and random tournaments. Third, we improve the afore-mentioned upper bound $2^dbinom{n}{d}$ by a factor of order $sqrt{d}$. The verification of the superior bound makes use of Johnson graphs and maximum subgraphs not containing large narrow cliques.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129930101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implicit Parameter-free Online Learning with Truncated Linear Models","authors":"Keyi Chen, Ashok Cutkosky, Francesco Orabona","doi":"10.48550/arXiv.2203.10327","DOIUrl":"https://doi.org/10.48550/arXiv.2203.10327","url":null,"abstract":"Parameter-free algorithms are online learning algorithms that do not require setting learning rates. They achieve optimal regret with respect to the distance between the initial point and any competitor. Yet, parameter-free algorithms do not take into account the geometry of the losses. Recently, in the stochastic optimization literature, it has been proposed to instead use truncated linear lower bounds, which produce better performance by more closely modeling the losses. In particular, truncated linear models greatly reduce the problem of overshooting the minimum of the loss function. Unfortunately, truncated linear models cannot be used with parameter-free algorithms because the updates become very expensive to compute. In this paper, we propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an\"implicit\"flavor. Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties. We also conduct an empirical study demonstrating the practical utility of our algorithms.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131508395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Efficient and Optimal Fixed-Time Regret with Two Experts
Authors: L. Greenstreet, Nicholas J. A. Harvey, Victor S. Portella
Venue: International Conference on Algorithmic Learning Theory. Published 2022-03-15.
DOI: 10.48550/arXiv.2203.07577 (https://doi.org/10.48550/arXiv.2203.07577)
Abstract: Prediction with expert advice is a foundational problem in online learning. In instances with $T$ rounds and $n$ experts, the classical Multiplicative Weights Update method suffers at most $\sqrt{(T/2)\ln n}$ regret when $T$ is known beforehand. Moreover, this is asymptotically optimal when both $T$ and $n$ grow to infinity. However, when the number of experts $n$ is small/fixed, algorithms with better regret guarantees exist. In 1967, Cover gave a dynamic programming algorithm for the two-experts problem restricted to $\{0,1\}$ costs that suffers at most $\sqrt{T/(2\pi)} + O(1)$ regret with $O(T^2)$ pre-processing time. In this work, we propose an optimal algorithm for prediction with two experts' advice that works even for costs in $[0,1]$ and with $O(1)$ processing time per turn. Our algorithm builds on recent work on the experts problem based on techniques and tools from stochastic calculus.

{"title":"Metric Entropy Duality and the Sample Complexity of Outcome Indistinguishability","authors":"Lunjia Hu, Charlotte Peale, Omer Reingold","doi":"10.48550/arXiv.2203.04536","DOIUrl":"https://doi.org/10.48550/arXiv.2203.04536","url":null,"abstract":"We give the first sample complexity characterizations for outcome indistinguishability, a theoretical framework of machine learning recently introduced by Dwork, Kim, Reingold, Rothblum, and Yona (STOC 2021). In outcome indistinguishability, the goal of the learner is to output a predictor that cannot be distinguished from the target predictor by a class $D$ of distinguishers examining the outcomes generated according to the predictors' predictions. In the distribution-specific and realizable setting where the learner is given the data distribution together with a predictor class $P$ containing the target predictor, we show that the sample complexity of outcome indistinguishability is characterized by the metric entropy of $P$ w.r.t. the dual Minkowski norm defined by $D$, and equivalently by the metric entropy of $D$ w.r.t. the dual Minkowski norm defined by $P$. This equivalence makes an intriguing connection to the long-standing metric entropy duality conjecture in convex geometry. Our sample complexity characterization implies a variant of metric entropy duality, which we show is nearly tight. In the distribution-free setting, we focus on the case considered by Dwork et al. where $P$ contains all possible predictors, hence the sample complexity only depends on $D$. In this setting, we show that the sample complexity of outcome indistinguishability is characterized by the fat-shattering dimension of $D$. We also show a strong sample complexity separation between realizable and agnostic outcome indistinguishability in both the distribution-free and the distribution-specific settings. This is in contrast to distribution-free (resp. distribution-specific) PAC learning where the sample complexity in both the realizable and the agnostic settings can be characterized by the VC dimension (resp. metric entropy).","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132145881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Leveraging Initial Hints for Free in Stochastic Linear Bandits
Authors: Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Qiuyi Zhang
Venue: International Conference on Algorithmic Learning Theory. Published 2022-03-08.
DOI: 10.48550/arXiv.2203.04274 (https://doi.org/10.48550/arXiv.2203.04274)
Abstract: We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action. We present a novel algorithm for stochastic linear bandits that uses this hint to improve its regret to $\tilde{O}(\sqrt{T})$ when the hint is accurate, while maintaining a minimax-optimal $\tilde{O}(d\sqrt{T})$ regret independent of the quality of the hint. Furthermore, we provide a Pareto frontier of tight tradeoffs between best-case and worst-case regret, with matching lower bounds. Perhaps surprisingly, our work shows that leveraging a hint yields provable gains without sacrificing worst-case performance, implying that our algorithm adapts to the quality of the hint for free. We also provide an extension of our algorithm to the case of $m$ initial hints, showing that we can achieve a $\tilde{O}(m^{2/3}\sqrt{T})$ regret.

{"title":"Adversarially Robust Learning with Tolerance","authors":"H. Ashtiani, Vinayak Pathak, Ruth Urner","doi":"10.48550/arXiv.2203.00849","DOIUrl":"https://doi.org/10.48550/arXiv.2203.00849","url":null,"abstract":"We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+gamma)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely-used ``perturb-and-smooth'' approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $mathcal{H}$ with VC-dimension $v$ in the $gamma$-tolerant adversarial setting with $Oleft(frac{v(1+1/gamma)^{O(d)}}{varepsilon}right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $widetilde{O}left(frac{d.vlog(1+1/gamma)}{varepsilon^2}right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm, and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting where there is no known sample complexity upper bound that depend polynomially on the VC-dimension.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129477997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"La educación ambiental en los medios televisivos. Estudio de caso: Oromar TV","authors":"Erik Alexander Cumba Castro","doi":"10.17163/alt.v15n1.2020.10","DOIUrl":"https://doi.org/10.17163/alt.v15n1.2020.10","url":null,"abstract":"The current research article has the purpose of analyzing environmental education in television media in the province of Manabi. For which, it was decided to take the Oromar TV channel as a case study. This with the objective of measuring the social impact caused by the mass media in regard to the awareness and care of the environment in this province. In addition to examining the production of training content focused on environmental education within the programming of this channel. The methodology that was applied for the investigation is of qualitative type, so that the technique of documentary analysis was used for the revision of the programming of the Oromar TV channel, this was carried out in a sample period of two months. The results obtained show that there are shortcomings in the programming of the Oromar TV channel, due to the scarce productions of educational content. Therefore, it is concluded that, in the absence of an increase in training television programs, and total absence of specialized productions in the area of environmental education in the Oromar TV channel, that could cause a lack of knowledge in the television audience in regard to in matters of prevention and care of the environment in the province of Manabi.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133087933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic Complexity of Partial Learning","authors":"Sanjay Jain, E. Kinber","doi":"10.1007/978-3-319-46379-7_12","DOIUrl":"https://doi.org/10.1007/978-3-319-46379-7_12","url":null,"abstract":"","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129640349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Pattern Languages over Groups","authors":"R. Hölzl, Sanjay Jain, F. Stephan","doi":"10.1007/978-3-319-46379-7_13","DOIUrl":"https://doi.org/10.1007/978-3-319-46379-7_13","url":null,"abstract":"","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128962034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}