{"title":"Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis","authors":"Hao Li, Dong Liang, Zheng Xie","doi":"arxiv-2409.06329","DOIUrl":"https://doi.org/arxiv-2409.06329","url":null,"abstract":"Meta-learning is characterized by its ability to learn how to learn, enabling\u0000the adaptation of learning strategies across different tasks. Recent research\u0000introduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown\u0000prior distribution sampled from a meta-prior by interacting with bandit\u0000instances drawn from it. However, its analysis was limited to Gaussian bandit.\u0000The contextual multi-armed bandit framework is an extension of the Gaussian\u0000Bandit, which challenges agent to utilize context vectors to predict the most\u0000valuable arms, optimally balancing exploration and exploitation to minimize\u0000regret over time. This paper introduces Meta-TSLB algorithm, a modified Meta-TS\u0000for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an\u0000$ Oleft( left( m+log left( m right) right) sqrt{nlog left( n right)}\u0000right)$ bound on its Bayes regret, in which $m$ represents the number of\u0000bandit instances, and $n$ the number of rounds of Thompson Sampling.\u0000Additionally, our work complements the analysis of Meta-TS for linear\u0000contextual bandits. The performance of Meta-TSLB is evaluated experimentally\u0000under different settings, and we experimente and analyze the generalization\u0000capability of Meta-TSLB, showcasing its potential to adapt to unseen instances.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLMs Will Always Hallucinate, and We Need to Live With This","authors":"Sourav Banerjee, Ayushi Agarwal, Saloni Singla","doi":"arxiv-2409.05746","DOIUrl":"https://doi.org/arxiv-2409.05746","url":null,"abstract":"As Large Language Models become more ubiquitous across domains, it becomes\u0000important to examine their inherent limitations critically. This work argues\u0000that hallucinations in language models are not just occasional errors but an\u0000inevitable feature of these systems. We demonstrate that hallucinations stem\u0000from the fundamental mathematical and logical structure of LLMs. It is,\u0000therefore, impossible to eliminate them through architectural improvements,\u0000dataset enhancements, or fact-checking mechanisms. Our analysis draws on\u0000computational theory and Godel's First Incompleteness Theorem, which references\u0000the undecidability of problems like the Halting, Emptiness, and Acceptance\u0000Problems. We demonstrate that every stage of the LLM process-from training data\u0000compilation to fact retrieval, intent classification, and text generation-will\u0000have a non-zero probability of producing hallucinations. This work introduces\u0000the concept of Structural Hallucination as an intrinsic nature of these\u0000systems. By establishing the mathematical certainty of hallucinations, we\u0000challenge the prevailing notion that they can be fully mitigated.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity","authors":"Dongyue Li, Aneesh Sharma, Hongyang R. Zhang","doi":"arxiv-2409.06091","DOIUrl":"https://doi.org/arxiv-2409.06091","url":null,"abstract":"Multitask learning is a widely used paradigm for training models on diverse\u0000tasks, with applications ranging from graph neural networks to language model\u0000fine-tuning. Since tasks may interfere with each other, a key notion for\u0000modeling their relationships is task affinity. This includes pairwise task\u0000affinity, computed among pairs of tasks, and higher-order affinity, computed\u0000among subsets of tasks. Naively computing either of them requires repeatedly\u0000training on data from various task combinations, which is computationally\u0000intensive. We present a new algorithm Grad-TAG that can estimate task\u0000affinities without this repeated training. The key idea of Grad-TAG is to train a \"base\" model for all tasks and then\u0000use a linearization technique to estimate the loss of the model for a specific\u0000task combination. The linearization works by computing a gradient-based\u0000approximation of the loss, using low-dimensional projections of gradients as\u0000features in a logistic regression to predict labels for the task combination.\u0000We show that the linearized model can provably approximate the loss when the\u0000gradient-based approximation is accurate, and also empirically verify that on\u0000several large models. Then, given the estimated task affinity, we design a\u0000semi-definite program for clustering similar tasks by maximizing the average\u0000density of clusters. We evaluate Grad-TAG's performance across seven datasets, including\u0000multi-label classification on graphs, and instruction fine-tuning of language\u0000models. Our task affinity estimates are within 2.7% distance to the true\u0000affinities while needing only 3% of FLOPs in full training. On our largest\u0000graph with 21M edges and 500 labeling tasks, our algorithm delivers estimates\u0000within 5% distance to the true affinities, using only 112 GPU hours. Our\u0000results show that Grad-TAG achieves excellent performance and runtime tradeoffs\u0000compared to existing approaches.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Pretraining Data Using Perplexity Correlations","authors":"Tristan Thrush, Christopher Potts, Tatsunori Hashimoto","doi":"arxiv-2409.05816","DOIUrl":"https://doi.org/arxiv-2409.05816","url":null,"abstract":"Quality pretraining data is often seen as the key to high-performance\u0000language models. However, progress in understanding pretraining data has been\u0000slow due to the costly pretraining runs required for data selection\u0000experiments. We present a framework that avoids these costs and selects\u0000high-quality pretraining data without any LLM training of our own. Our work is\u0000based on a simple observation: LLM losses on many pretraining texts are\u0000correlated with downstream benchmark performance, and selecting\u0000high-correlation documents is an effective pretraining data selection method.\u0000We build a new statistical framework for data selection centered around\u0000estimates of perplexity-benchmark correlations and perform data selection using\u0000a sample of 90 LLMs taken from the Open LLM Leaderboard on texts from tens of\u0000thousands of web domains. In controlled pretraining experiments at the 160M\u0000parameter scale on 8 benchmarks, our approach outperforms DSIR on every\u0000benchmark, while matching the best data selector found in DataComp-LM, a\u0000hand-engineered bigram classifier.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified Neural Network Scaling Laws and Scale-time Equivalence","authors":"Akhilan Boopathy, Ila Fiete","doi":"arxiv-2409.05782","DOIUrl":"https://doi.org/arxiv-2409.05782","url":null,"abstract":"As neural networks continue to grow in size but datasets might not, it is\u0000vital to understand how much performance improvement can be expected: is it\u0000more important to scale network size or data volume? Thus, neural network\u0000scaling laws, which characterize how test error varies with network size and\u0000data volume, have become increasingly important. However, existing scaling laws\u0000are often applicable only in limited regimes and often do not incorporate or\u0000predict well-known phenomena such as double descent. Here, we present a novel\u0000theoretical characterization of how three factors -- model size, training time,\u0000and data volume -- interact to determine the performance of deep neural\u0000networks. We first establish a theoretical and empirical equivalence between\u0000scaling the size of a neural network and increasing its training time\u0000proportionally. Scale-time equivalence challenges the current practice, wherein\u0000large models are trained for small durations, and suggests that smaller models\u0000trained over extended periods could match their efficacy. It also leads to a\u0000novel method for predicting the performance of large-scale networks from\u0000small-scale networks trained for extended epochs, and vice versa. We next\u0000combine scale-time equivalence with a linear model analysis of double descent\u0000to obtain a unified theoretical scaling law, which we confirm with experiments\u0000across vision benchmarks and network architectures. These laws explain several\u0000previously unexplained phenomena: reduced data requirements for generalization\u0000in larger models, heightened sensitivity to label noise in overparameterized\u0000models, and instances where increasing model scale does not necessarily enhance\u0000performance. Our findings hold significant implications for the practical\u0000deployment of neural networks, offering a more accessible and efficient path to\u0000training and fine-tuning large models.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Non-adaptive Group Testing under Errors in Group Membership Specifications","authors":"Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson, Ajit Rajwade","doi":"arxiv-2409.05345","DOIUrl":"https://doi.org/arxiv-2409.05345","url":null,"abstract":"Given $p$ samples, each of which may or may not be defective, group testing\u0000(GT) aims to determine their defect status by performing tests on $n < p$\u0000`groups', where a group is formed by mixing a subset of the $p$ samples.\u0000Assuming that the number of defective samples is very small compared to $p$, GT\u0000algorithms have provided excellent recovery of the status of all $p$ samples\u0000with even a small number of groups. Most existing methods, however, assume that\u0000the group memberships are accurately specified. This assumption may not always\u0000be true in all applications, due to various resource constraints. Such errors\u0000could occur, eg, when a technician, preparing the groups in a laboratory,\u0000unknowingly mixes together an incorrect subset of samples as compared to what\u0000was specified. We develop a new GT method, the Debiased Robust Lasso Test\u0000Method (DRLT), that handles such group membership specification errors. The\u0000proposed DRLT method is based on an approach to debias, or reduce the inherent\u0000bias in, estimates produced by Lasso, a popular and effective sparse regression\u0000technique. We also provide theoretical upper bounds on the reconstruction error\u0000produced by our estimator. Our approach is then combined with two carefully\u0000designed hypothesis tests respectively for (i) the identification of defective\u0000samples in the presence of errors in group membership specifications, and (ii)\u0000the identification of groups with erroneous membership specifications. The DRLT\u0000approach extends the literature on bias mitigation of statistical estimators\u0000such as the LASSO, to handle the important case when some of the measurements\u0000contain outliers, due to factors such as group membership specification errors.\u0000We present numerical results which show that our approach outperforms several\u0000baselines and robust regression techniques for identification of defective\u0000samples as well as erroneously specified groups.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K-Fold Causal BART for CATE Estimation","authors":"Hugo Gobato Souto, Francisco Louzada Neto","doi":"arxiv-2409.05665","DOIUrl":"https://doi.org/arxiv-2409.05665","url":null,"abstract":"This research aims to propose and evaluate a novel model named K-Fold Causal\u0000Bayesian Additive Regression Trees (K-Fold Causal BART) for improved estimation\u0000of Average Treatment Effects (ATE) and Conditional Average Treatment Effects\u0000(CATE). The study employs synthetic and semi-synthetic datasets, including the\u0000widely recognized Infant Health and Development Program (IHDP) benchmark\u0000dataset, to validate the model's performance. Despite promising results in\u0000synthetic scenarios, the IHDP dataset reveals that the proposed model is not\u0000state-of-the-art for ATE and CATE estimation. Nonetheless, the research\u0000provides several novel insights: 1. The ps-BART model is likely the preferred\u0000choice for CATE and ATE estimation due to better generalization compared to the\u0000other benchmark models - including the Bayesian Causal Forest (BCF) model,\u0000which is considered by many the current best model for CATE estimation, 2. The\u0000BCF model's performance deteriorates significantly with increasing treatment\u0000effect heterogeneity, while the ps-BART model remains robust, 3. Models tend to\u0000be overconfident in CATE uncertainty quantification when treatment effect\u0000heterogeneity is low, 4. A second K-Fold method is unnecessary for avoiding\u0000overfitting in CATE estimation, as it adds computational costs without\u0000improving performance, 5. Detailed analysis reveals the importance of\u0000understanding dataset characteristics and using nuanced evaluation methods, 6.\u0000The conclusion of Curth et al. (2021) that indirect strategies for CATE\u0000estimation are superior for the IHDP dataset is contradicted by the results of\u0000this research. These findings challenge existing assumptions and suggest\u0000directions for future research to enhance causal inference methodologies.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting","authors":"Gianmarco Genalti, Marco Mussi, Nicola Gatti, Marcello Restelli, Matteo Castiglioni, Alberto Maria Metelli","doi":"arxiv-2409.05980","DOIUrl":"https://doi.org/arxiv-2409.05980","url":null,"abstract":"Rested and Restless Bandits are two well-known bandit settings that are\u0000useful to model real-world sequential decision-making problems in which the\u0000expected reward of an arm evolves over time due to the actions we perform or\u0000due to the nature. In this work, we propose Graph-Triggered Bandits (GTBs), a\u0000unifying framework to generalize and extend rested and restless bandits. In\u0000this setting, the evolution of the arms' expected rewards is governed by a\u0000graph defined over the arms. An edge connecting a pair of arms $(i,j)$\u0000represents the fact that a pull of arm $i$ triggers the evolution of arm $j$,\u0000and vice versa. Interestingly, rested and restless bandits are both special\u0000cases of our model for some suitable (degenerated) graph. As relevant case\u0000studies for this setting, we focus on two specific types of monotonic bandits:\u0000rising, where the expected reward of an arm grows as the number of triggers\u0000increases, and rotting, where the opposite behavior occurs. For these cases, we\u0000study the optimal policies. We provide suitable algorithms for all scenarios\u0000and discuss their theoretical guarantees, highlighting the complexity of the\u0000learning problem concerning instance-dependent terms that encode specific\u0000properties of the underlying graph structure.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Tree Probability Estimation with Stochastic Optimization and Variance Reduction","authors":"Tianyu Xie, Musu Yuan, Minghua Deng, Cheng Zhang","doi":"arxiv-2409.05282","DOIUrl":"https://doi.org/arxiv-2409.05282","url":null,"abstract":"Probability estimation of tree topologies is one of the fundamental tasks in\u0000phylogenetic inference. The recently proposed subsplit Bayesian networks (SBNs)\u0000provide a powerful probabilistic graphical model for tree topology probability\u0000estimation by properly leveraging the hierarchical structure of phylogenetic\u0000trees. However, the expectation maximization (EM) method currently used for\u0000learning SBN parameters does not scale up to large data sets. In this paper, we\u0000introduce several computationally efficient methods for training SBNs and show\u0000that variance reduction could be the key for better performance. Furthermore,\u0000we also introduce the variance reduction technique to improve the optimization\u0000of SBN parameters for variational Bayesian phylogenetic inference (VBPI).\u0000Extensive synthetic and real data experiments demonstrate that our methods\u0000outperform previous baseline methods on the tasks of tree topology probability\u0000estimation as well as Bayesian phylogenetic inference using SBNs.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking Neural Network Scaling Laws with Modularity","authors":"Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete","doi":"arxiv-2409.05780","DOIUrl":"https://doi.org/arxiv-2409.05780","url":null,"abstract":"Modular neural networks outperform nonmodular neural networks on tasks\u0000ranging from visual question answering to robotics. These performance\u0000improvements are thought to be due to modular networks' superior ability to\u0000model the compositional and combinatorial structure of real-world problems.\u0000However, a theoretical explanation of how modularity improves generalizability,\u0000and how to leverage task modularity while training networks remains elusive.\u0000Using recent theoretical progress in explaining neural network generalization,\u0000we investigate how the amount of training data required to generalize on a task\u0000varies with the intrinsic dimensionality of a task's input. We show\u0000theoretically that when applied to modularly structured tasks, while nonmodular\u0000networks require an exponential number of samples with task dimensionality,\u0000modular networks' sample complexity is independent of task dimensionality:\u0000modular networks can generalize in high dimensions. We then develop a novel\u0000learning rule for modular networks to exploit this advantage and empirically\u0000show the improved generalization of the rule, both in- and out-of-distribution,\u0000on high-dimensional, modular tasks.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}