arXiv - CS - Machine Learning: Latest Papers

Machine Learning for Public Good: Predicting Urban Crime Patterns to Enhance Community Safety
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.10838
Sia Gupta, Simeon Sayer
{"title":"Machine Learning for Public Good: Predicting Urban Crime Patterns to Enhance Community Safety","authors":"Sia Gupta, Simeon Sayer","doi":"arxiv-2409.10838","DOIUrl":"https://doi.org/arxiv-2409.10838","url":null,"abstract":"In recent years, urban safety has become a paramount concern for city\u0000planners and law enforcement agencies. Accurate prediction of likely crime\u0000occurrences can significantly enhance preventive measures and resource\u0000allocation. However, many law enforcement departments lack the tools to analyze\u0000and apply advanced AI and ML techniques that can support city planners, watch\u0000programs, and safety leaders to take proactive steps towards overall community\u0000safety. This paper explores the effectiveness of ML techniques to predict spatial and\u0000temporal patterns of crimes in urban areas. Leveraging police dispatch call\u0000data from San Jose, CA, the research goal is to achieve a high degree of\u0000accuracy in categorizing calls into priority levels particularly for more\u0000dangerous situations that require an immediate law enforcement response. This\u0000categorization is informed by the time, place, and nature of the call. The\u0000research steps include data extraction, preprocessing, feature engineering,\u0000exploratory data analysis, implementation, optimization and tuning of different\u0000supervised machine learning models and neural networks. The accuracy and\u0000precision are examined for different models and features at varying granularity\u0000of crime categories and location precision. The results demonstrate that when compared to a variety of other models,\u0000Random Forest classification models are most effective in identifying dangerous\u0000situations and their corresponding priority levels with high accuracy (Accuracy\u0000= 85%, AUC = 0.92) at a local level while ensuring a minimum amount of false\u0000negatives. While further research and data gathering is needed to include other\u0000social and economic factors, these results provide valuable insights for law\u0000enforcement agencies to optimize resources, develop proactive deployment\u0000approaches, and adjust response patterns to enhance overall public safety\u0000outcomes in an unbiased way.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
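The Random Forest pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code; the dispatch-call fields (hour, day_of_week, lat, lng, call_type) and the toy data are hypothetical placeholders for the time, place, and nature of each call.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dispatch-call features: time, place, and (encoded) nature of the call.
calls = pd.DataFrame({
    "hour":        [2, 14, 23, 9, 1, 16, 22, 11],
    "day_of_week": [5, 1, 6, 2, 6, 3, 5, 0],
    "lat":         [37.33, 37.34, 37.32, 37.35, 37.31, 37.36, 37.33, 37.30],
    "lng":         [-121.89, -121.88, -121.90, -121.87, -121.91, -121.86, -121.88, -121.92],
    "call_type":   [3, 1, 4, 0, 3, 2, 4, 1],
    "priority":    [1, 0, 1, 0, 1, 0, 1, 0],   # 1 = needs immediate response
})
X, y = calls.drop(columns="priority"), calls["priority"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```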
Towards Time Series Reasoning with LLMs
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.11376
Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren
{"title":"Towards Time Series Reasoning with LLMs","authors":"Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren","doi":"arxiv-2409.11376","DOIUrl":"https://doi.org/arxiv-2409.11376","url":null,"abstract":"Multi-modal large language models (MLLMs) have enabled numerous advances in\u0000understanding and reasoning in domains like vision, but we have not yet seen\u0000this broad success for time-series. Although prior works on time-series MLLMs\u0000have shown promising performance in time-series forecasting, very few works\u0000show how an LLM could be used for time-series reasoning in natural language. We\u0000propose a novel multi-modal time-series LLM approach that learns generalizable\u0000information across various domains with powerful zero-shot performance. First,\u0000we train a lightweight time-series encoder on top of an LLM to directly extract\u0000time-series information. Then, we fine-tune our model with chain-of-thought\u0000augmented time-series tasks to encourage the model to generate reasoning paths.\u0000We show that our model learns a latent representation that reflects specific\u0000time-series features (e.g. slope, frequency), as well as outperforming GPT-4o\u0000on a set of zero-shot reasoning tasks on a variety of domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
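A minimal PyTorch sketch of the "lightweight time-series encoder on top of an LLM" idea: patch the series and project each patch into the LLM's embedding space as soft tokens. The patch length, hidden width, and embedding dimension here are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TimeSeriesEncoder(nn.Module):
    """Patch a univariate series and project patches into an LLM's embedding space."""
    def __init__(self, patch_len: int = 16, d_llm: int = 4096):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Sequential(
            nn.Linear(patch_len, 256), nn.GELU(), nn.Linear(256, d_llm)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length); split into non-overlapping patches
        b, n = x.shape
        x = x[:, : n - n % self.patch_len].reshape(b, -1, self.patch_len)
        return self.proj(x)  # (batch, num_patches, d_llm) soft tokens

enc = TimeSeriesEncoder()
soft_tokens = enc(torch.randn(2, 128))  # would be prepended to the text embeddings
print(soft_tokens.shape)                # torch.Size([2, 8, 4096])
```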
Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.11609
Derek Jollie, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer
{"title":"Time-Series Forecasting, Knowledge Distillation, and Refinement within a Multimodal PDE Foundation Model","authors":"Derek Jollie, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer","doi":"arxiv-2409.11609","DOIUrl":"https://doi.org/arxiv-2409.11609","url":null,"abstract":"Symbolic encoding has been used in multi-operator learning as a way to embed\u0000additional information for distinct time-series data. For spatiotemporal\u0000systems described by time-dependent partial differential equations, the\u0000equation itself provides an additional modality to identify the system. The\u0000utilization of symbolic expressions along side time-series samples allows for\u0000the development of multimodal predictive neural networks. A key challenge with\u0000current approaches is that the symbolic information, i.e. the equations, must\u0000be manually preprocessed (simplified, rearranged, etc.) to match and relate to\u0000the existing token library, which increases costs and reduces flexibility,\u0000especially when dealing with new differential equations. We propose a new token\u0000library based on SymPy to encode differential equations as an additional\u0000modality for time-series models. The proposed approach incurs minimal cost, is\u0000automated, and maintains high prediction accuracy for forecasting tasks.\u0000Additionally, we include a Bayesian filtering module that connects the\u0000different modalities to refine the learned equation. This improves the accuracy\u0000of the learned symbolic representation and the predicted time-series.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
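As an illustration of encoding a differential equation via SymPy, the sketch below tokenizes an equation by a prefix-order walk of its expression tree. The tokenization scheme is our assumption, not the paper's actual token library.

```python
import sympy as sp

t, x = sp.symbols("t x")
u = sp.Function("u")(t, x)
# Example: the heat equation u_t = (1/2) * u_xx
equation = sp.Eq(sp.Derivative(u, t), sp.Rational(1, 2) * sp.Derivative(u, x, 2))

def tokenize(expr) -> list[str]:
    """Walk the SymPy expression tree in prefix order to get a token sequence."""
    if not expr.args:               # atoms: symbols, numbers
        return [str(expr)]
    tokens = [type(expr).__name__]  # operator head: Equality, Derivative, Mul, ...
    for arg in expr.args:
        tokens += tokenize(arg)
    return tokens

print(tokenize(equation))
# e.g. ['Equality', 'Derivative', 'u', 't', 'x', ...] -- fed to the model as a modality
```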
Fair Anomaly Detection For Imbalanced Groups
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.10951
Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He
{"title":"Fair Anomaly Detection For Imbalanced Groups","authors":"Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He","doi":"arxiv-2409.10951","DOIUrl":"https://doi.org/arxiv-2409.10951","url":null,"abstract":"Anomaly detection (AD) has been widely studied for decades in many real-world\u0000applications, including fraud detection in finance, and intrusion detection for\u0000cybersecurity, etc. Due to the imbalanced nature between protected and\u0000unprotected groups and the imbalanced distributions of normal examples and\u0000anomalies, the learning objectives of most existing anomaly detection methods\u0000tend to solely concentrate on the dominating unprotected group. Thus, it has\u0000been recognized by many researchers about the significance of ensuring model\u0000fairness in anomaly detection. However, the existing fair anomaly detection\u0000methods tend to erroneously label most normal examples from the protected group\u0000as anomalies in the imbalanced scenario where the unprotected group is more\u0000abundant than the protected group. This phenomenon is caused by the improper\u0000design of learning objectives, which statistically focus on learning the\u0000frequent patterns (i.e., the unprotected group) while overlooking the\u0000under-represented patterns (i.e., the protected group). To address these\u0000issues, we propose FairAD, a fairness-aware anomaly detection method targeting\u0000the imbalanced scenario. It consists of a fairness-aware contrastive learning\u0000module and a rebalancing autoencoder module to ensure fairness and handle the\u0000imbalanced data issue, respectively. Moreover, we provide the theoretical\u0000analysis that shows our proposed contrastive learning regularization guarantees\u0000group fairness. Empirical studies demonstrate the effectiveness and efficiency\u0000of FairAD across multiple real-world datasets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
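The rebalancing idea can be illustrated with a toy autoencoder that weights the reconstruction loss inversely to group frequency so the under-represented protected group is not ignored. This is our hedged sketch of the general technique, not the FairAD implementation, whose contrastive module is omitted.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, d: int = 32, h: int = 8):
        super().__init__()
        self.enc = nn.Linear(d, h)
        self.dec = nn.Linear(h, d)
    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

x = torch.randn(100, 32)
group = torch.cat([torch.zeros(90), torch.ones(10)]).long()  # 0: unprotected, 1: protected
weights = 1.0 / torch.bincount(group).float()[group]         # inverse-frequency weights

ae = Autoencoder()
recon_err = ((ae(x) - x) ** 2).mean(dim=1)                   # per-example reconstruction error
loss = (weights * recon_err).sum() / weights.sum()           # rebalanced training objective
# at test time, the reconstruction error itself serves as the anomaly score
```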
Relative Representations: Topological and Geometric Perspectives
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.10967
Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero
{"title":"Relative Representations: Topological and Geometric Perspectives","authors":"Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero","doi":"arxiv-2409.10967","DOIUrl":"https://doi.org/arxiv-2409.10967","url":null,"abstract":"Relative representations are an established approach to zero-shot model\u0000stitching, consisting of a non-trainable transformation of the latent space of\u0000a deep neural network. Based on insights of topological and geometric nature,\u0000we propose two improvements to relative representations. First, we introduce a\u0000normalization procedure in the relative transformation, resulting in invariance\u0000to non-isotropic rescalings and permutations. The latter coincides with the\u0000symmetries in parameter space induced by common activation functions. Second,\u0000we propose to deploy topological densification when fine-tuning relative\u0000representations, a topological regularization loss encouraging clustering\u0000within classes. We provide an empirical investigation on a natural language\u0000task, where both the proposed variations yield improved performance on\u0000zero-shot model stitching.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
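For context, a relative representation in its standard cosine form looks like the sketch below: each latent vector is re-expressed as its similarities to a fixed set of anchor latents. The anchor choice and shapes are illustrative; the paper's additional normalization against non-isotropic rescalings goes beyond this baseline form.

```python
import torch
import torch.nn.functional as F

def relative_representation(z: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
    """Represent each latent vector by its cosine similarities to a set of anchors.
    L2-normalizing both sides makes the result invariant to isotropic rescaling."""
    z = F.normalize(z, dim=-1)
    anchors = F.normalize(anchors, dim=-1)
    return z @ anchors.T  # (batch, num_anchors)

latents = torch.randn(4, 64)   # latents of some trained encoder
anchors = torch.randn(10, 64)  # latents of 10 fixed anchor samples
print(relative_representation(latents, anchors).shape)  # torch.Size([4, 10])
```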
Implicit Reasoning in Deep Time Series Forecasting
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.10840
Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska
{"title":"Implicit Reasoning in Deep Time Series Forecasting","authors":"Willa Potosnak, Cristian Challu, Mononito Goswami, Michał Wiliński, Nina Żukowska","doi":"arxiv-2409.10840","DOIUrl":"https://doi.org/arxiv-2409.10840","url":null,"abstract":"Recently, time series foundation models have shown promising zero-shot\u0000forecasting performance on time series from a wide range of domains. However,\u0000it remains unclear whether their success stems from a true understanding of\u0000temporal dynamics or simply from memorizing the training data. While implicit\u0000reasoning in language models has been studied, similar evaluations for time\u0000series models have been largely unexplored. This work takes an initial step\u0000toward assessing the reasoning abilities of deep time series forecasting\u0000models. We find that certain linear, MLP-based, and patch-based Transformer\u0000models generalize effectively in systematically orchestrated\u0000out-of-distribution scenarios, suggesting underexplored reasoning capabilities\u0000beyond simple pattern memorization.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Federated Learning with Integrated Sensing, Communication, and Computation: Frameworks and Performance Analysis
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.11240
Yipeng Liang, Qimei Chen, Hao Jiang
{"title":"Federated Learning with Integrated Sensing, Communication, and Computation: Frameworks and Performance Analysis","authors":"Yipeng Liang, Qimei Chen, Hao Jiang","doi":"arxiv-2409.11240","DOIUrl":"https://doi.org/arxiv-2409.11240","url":null,"abstract":"With the emergence of integrated sensing, communication, and computation\u0000(ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC),\u0000integrating sample collection, local training, and parameter exchange and\u0000aggregation, has garnered increasing interest for enhancing training\u0000efficiency. Currently, FL-ISCC primarily includes two algorithms: FedAVG-ISCC\u0000and FedSGD-ISCC. However, the theoretical understanding of the performance and\u0000advantages of these algorithms remains limited. To address this gap, we\u0000investigate a general FL-ISCC framework, implementing both FedAVG-ISCC and\u0000FedSGD-ISCC. We experimentally demonstrate the substantial potential of the\u0000ISCC framework in reducing latency and energy consumption in FL. Furthermore,\u0000we provide a theoretical analysis and comparison. The results reveal that:1)\u0000Both sample collection and communication errors negatively impact algorithm\u0000performance, highlighting the need for careful design to optimize FL-ISCC\u0000applications. 2) FedAVG-ISCC performs better than FedSGD-ISCC under IID data\u0000due to its advantage with multiple local updates. 3) FedSGD-ISCC is more robust\u0000than FedAVG-ISCC under non-IID data, where the multiple local updates in\u0000FedAVG-ISCC worsen performance as non-IID data increases. FedSGD-ISCC maintains\u0000performance levels similar to IID conditions. 4) FedSGD-ISCC is more resilient\u0000to communication errors than FedAVG-ISCC, which suffers from significant\u0000performance degradation as communication errors increase.Extensive simulations\u0000confirm the effectiveness of the FL-ISCC framework and validate our theoretical\u0000analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
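To make the FedAVG/FedSGD distinction concrete, here is a toy sketch of the two aggregation rules on plain tensors: FedSGD averages one gradient per client per round, while FedAvg averages model weights after multiple local updates. The sensing, communication, and computation errors modeled in the paper are omitted.

```python
import torch

def fedsgd_step(global_w, client_grads, lr=0.1, weights=None):
    """FedSGD: take one step using the average of the clients' gradients."""
    weights = weights or [1 / len(client_grads)] * len(client_grads)
    avg_grad = sum(w * g for w, g in zip(weights, client_grads))
    return global_w - lr * avg_grad

def fedavg_step(client_models, weights=None):
    """FedAvg: average the client models after their multiple local updates."""
    weights = weights or [1 / len(client_models)] * len(client_models)
    return sum(w * m for w, m in zip(weights, client_models))

w = torch.zeros(3)
grads = [torch.ones(3), 2 * torch.ones(3)]
print(fedsgd_step(w, grads))            # one averaged SGD step
print(fedavg_step([w + 1.0, w - 1.0]))  # averaged locally-trained weights
```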
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression algorithms
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.11071
Andrew Antonopoulos
{"title":"Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression algorithms","authors":"Andrew Antonopoulos","doi":"arxiv-2409.11071","DOIUrl":"https://doi.org/arxiv-2409.11071","url":null,"abstract":"This study was the 2nd part of my dissertation for my master degree and\u0000compared the power consumption using the Comma-Separated-Values (CSV) and\u0000parquet dataset format with the default floating point (32bit) and Nvidia mixed\u0000precision (16bit and 32bit) while training a regression ML model. The same\u0000custom PC as per the 1st part, which was dedicated to the classification\u0000testing and analysis, was built to perform the experiments, and different ML\u0000hyper-parameters, such as batch size, neurons, and epochs, were chosen to build\u0000Deep Neural Networks (DNN). A benchmarking test with default hyper-parameter\u0000values for the DNN was used as a reference, while the experiments used a\u0000combination of different settings. The results were recorded in Excel, and\u0000descriptive statistics were chosen to calculate the mean between the groups and\u0000compare them using graphs and tables. The outcome was positive when using mixed\u0000precision combined with specific hyper-parameters. Compared to the\u0000benchmarking, optimising the regression models reduced the power consumption\u0000between 7 and 11 Watts. The regression results show that while mixed precision\u0000can help improve power consumption, we must carefully consider the\u0000hyper-parameters. A high number of batch sizes and neurons will negatively\u0000affect power consumption. However, this research required inferential\u0000statistics, specifically ANOVA and T-test, to compare the relationship between\u0000the means. The results reported no statistical significance between the means\u0000in the regression tests and accepted H0. Therefore, choosing different ML\u0000techniques and the Parquet dataset format will not improve the computational\u0000power consumption and the overall ML carbon footprint. However, a more\u0000extensive implementation with a cluster of GPUs can increase the sample size\u0000significantly, as it is an essential factor and can change the outcome of the\u0000statistical analysis.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
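A hedged sketch of the two levers the study compares: loading data from Parquet instead of CSV, and training with Nvidia mixed precision via PyTorch's AMP. The synthetic data, model, and hyperparameters are placeholders; the author's actual DNNs and settings grid are not reproduced.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

# Synthesize a small regression set and round-trip it through Parquet
# (requires pyarrow); Parquet is columnar and compressed, unlike CSV.
df = pd.DataFrame(np.random.randn(1024, 9).astype("float32"),
                  columns=[f"f{i}" for i in range(8)] + ["target"])
df.to_parquet("dataset.parquet")
df = pd.read_parquet("dataset.parquet")  # vs. pd.read_csv(...)

X = torch.tensor(df.drop(columns="target").values)
y = torch.tensor(df["target"].values).unsqueeze(1)

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
opt = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # loss scaling; no-op on CPU

for epoch in range(10):
    opt.zero_grad()
    # fp16 autocast on an Nvidia GPU (the study's setting); bf16 fallback on CPU
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if use_cuda else torch.bfloat16):
        loss = nn.functional.mse_loss(model(X.to(device)), y.to(device))
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```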
GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.10996
Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu
{"title":"GINTRIP: Interpretable Temporal Graph Regression using Information bottleneck and Prototype-based method","authors":"Ali Royat, Seyed Mohamad Moghadas, Lesley De Cruz, Adrian Munteanu","doi":"arxiv-2409.10996","DOIUrl":"https://doi.org/arxiv-2409.10996","url":null,"abstract":"Deep neural networks (DNNs) have demonstrated remarkable performance across\u0000various domains, yet their application to temporal graph regression tasks faces\u0000significant challenges regarding interpretability. This critical issue, rooted\u0000in the inherent complexity of both DNNs and underlying spatio-temporal patterns\u0000in the graph, calls for innovative solutions. While interpretability concerns\u0000in Graph Neural Networks (GNNs) mirror those of DNNs, to the best of our\u0000knowledge, no notable work has addressed the interpretability of temporal GNNs\u0000using a combination of Information Bottleneck (IB) principles and\u0000prototype-based methods. Our research introduces a novel approach that uniquely\u0000integrates these techniques to enhance the interpretability of temporal graph\u0000regression models. The key contributions of our work are threefold: We\u0000introduce the underline{G}raph underline{IN}terpretability in\u0000underline{T}emporal underline{R}egression task using underline{I}nformation\u0000bottleneck and underline{P}rototype (GINTRIP) framework, the first combined\u0000application of IB and prototype-based methods for interpretable temporal graph\u0000tasks. We derive a novel theoretical bound on mutual information (MI),\u0000extending the applicability of IB principles to graph regression tasks. We\u0000incorporate an unsupervised auxiliary classification head, fostering multi-task\u0000learning and diverse concept representation, which enhances the model\u0000bottleneck's interpretability. Our model is evaluated on real-world traffic\u0000datasets, outperforming existing methods in both forecasting accuracy and\u0000interpretability-related metrics.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
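As a rough illustration of the prototype-based ingredient, the sketch below scores an embedding by its similarity to learned prototype vectors; the log-ratio activation is borrowed from the ProtoPNet line of work. This is our generic example, not the GINTRIP architecture, whose IB bound and auxiliary classification head are omitted.

```python
import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    """Score embeddings by closeness to learned, human-inspectable prototypes."""
    def __init__(self, num_prototypes: int = 8, d: int = 32):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, d))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        dist = torch.cdist(h, self.prototypes)        # (batch, num_prototypes)
        return torch.log((dist + 1) / (dist + 1e-4))  # large when close to a prototype

layer = PrototypeLayer()
scores = layer(torch.randn(5, 32))
print(scores.shape)  # torch.Size([5, 8]); a linear head on top yields the regression output
```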
SOAP: Improving and Stabilizing Shampoo using Adam
arXiv - CS - Machine Learning Pub Date: 2024-09-17 DOI: arxiv-2409.11321
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade
{"title":"SOAP: Improving and Stabilizing Shampoo using Adam","authors":"Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade","doi":"arxiv-2409.11321","DOIUrl":"https://doi.org/arxiv-2409.11321","url":null,"abstract":"There is growing evidence of the effectiveness of Shampoo, a higher-order\u0000preconditioning method, over Adam in deep learning optimization tasks. However,\u0000Shampoo's drawbacks include additional hyperparameters and computational\u0000overhead when compared to Adam, which only updates running averages of first-\u0000and second-moment quantities. This work establishes a formal connection between\u0000Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient\u0000approximation of Adam -- showing that Shampoo is equivalent to running\u0000Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to\u0000the design of a simpler and computationally efficient algorithm:\u0000$textbf{S}$hampo$textbf{O}$ with $textbf{A}$dam in the\u0000$textbf{P}$reconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most\u0000straightforward approach would be to simply compute Shampoo's\u0000eigendecomposition less frequently. Unfortunately, as our empirical results\u0000show, this leads to performance degradation that worsens with this frequency.\u0000SOAP mitigates this degradation by continually updating the running average of\u0000the second moment, just as Adam does, but in the current (slowly changing)\u0000coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a\u0000rotated space, it introduces only one additional hyperparameter (the\u0000preconditioning frequency) compared to Adam. We empirically evaluate SOAP on\u0000language model pre-training with 360m and 660m sized models. In the large batch\u0000regime, SOAP reduces the number of iterations by over 40% and wall clock time\u0000by over 35% compared to AdamW, with approximately 20% improvements in both\u0000metrics compared to Shampoo. An implementation of SOAP is available at\u0000https://github.com/nikhilvyas/SOAP.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
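A drastically simplified sketch of SOAP's core mechanics for a single 2-D weight matrix: maintain Shampoo-style left/right gradient covariances, refresh their eigendecomposition only every few steps (the preconditioning frequency), and run Adam-like moment updates in that eigenbasis. The coefficients and the omission of bias correction are our simplifications; see the linked repository for the real implementation.

```python
import torch

torch.manual_seed(0)
W = torch.randn(64, 32)                           # the parameter being trained
GL, GR = torch.eye(64), torch.eye(32)             # left/right gradient covariances
m, v = torch.zeros_like(W), torch.zeros_like(W)   # Adam moments, kept in the eigenbasis
QL = QR = None
beta1, beta2, shampoo_beta, lr, eps = 0.9, 0.999, 0.99, 1e-3, 1e-8

def soap_step(grad: torch.Tensor, t: int, precond_freq: int = 10) -> torch.Tensor:
    """One simplified SOAP update for a 2-D parameter (bias correction omitted)."""
    global GL, GR, m, v, QL, QR
    GL = shampoo_beta * GL + (1 - shampoo_beta) * grad @ grad.T
    GR = shampoo_beta * GR + (1 - shampoo_beta) * grad.T @ grad
    if t % precond_freq == 0:                     # infrequent eigendecompositions
        QL = torch.linalg.eigh(GL).eigenvectors
        QR = torch.linalg.eigh(GR).eigenvectors
    g = QL.T @ grad @ QR                          # rotate the gradient into the eigenbasis
    m = beta1 * m + (1 - beta1) * g               # Adam-style running averages, but in
    v = beta2 * v + (1 - beta2) * g * g           # the current (slowly changing) basis
    return QL @ (m / (v.sqrt() + eps)) @ QR.T     # rotate the update back

for t in range(100):
    grad = torch.randn(64, 32)                    # stand-in for a real training gradient
    W -= lr * soap_step(grad, t)
```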