{"title":"Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling","authors":"Arthur Müller, Lukas Vollenkemper","doi":"arxiv-2409.11933","DOIUrl":"https://doi.org/arxiv-2409.11933","url":null,"abstract":"The integration of Reinforcement Learning (RL) with heuristic methods is an\u0000emerging trend for solving optimization problems, which leverages RL's ability\u0000to learn from the data generated during the search process. One promising\u0000approach is to train an RL agent as an improvement heuristic, starting with a\u0000suboptimal solution that is iteratively improved by applying small changes. We\u0000apply this approach to a real-world multiobjective production scheduling\u0000problem. Our approach utilizes a network architecture that includes Transformer\u0000encoding to learn the relationships between jobs. Afterwards, a probability\u0000matrix is generated from which pairs of jobs are sampled and then swapped to\u0000improve the solution. We benchmarked our approach against other heuristics\u0000using real data from our industry partner, demonstrating its superior\u0000performance.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction","authors":"Md. Asif Khan Rifat, Ahmedul Kabir, Armana Sabiha Huq","doi":"arxiv-2409.11929","DOIUrl":"https://doi.org/arxiv-2409.11929","url":null,"abstract":"Road traffic accidents (RTA) pose a significant public health threat\u0000worldwide, leading to considerable loss of life and economic burdens. This is\u0000particularly acute in developing countries like Bangladesh. Building reliable\u0000models to forecast crash outcomes is crucial for implementing effective\u0000preventive measures. To aid in developing targeted safety interventions, this\u0000study presents a machine learning-based approach for classifying fatal and\u0000non-fatal road accident outcomes using data from the Dhaka metropolitan traffic\u0000crash database from 2017 to 2022. Our framework utilizes a range of machine\u0000learning classification algorithms, comprising Logistic Regression, Support\u0000Vector Machines, Naive Bayes, Random Forest, Decision Tree, Gradient Boosting,\u0000LightGBM, and Artificial Neural Network. We prioritize model interpretability\u0000by employing the SHAP (SHapley Additive exPlanations) method, which elucidates\u0000the key factors influencing accident fatality. Our results demonstrate that\u0000LightGBM outperforms other models, achieving a ROC-AUC score of 0.72. The\u0000global, local, and feature dependency analyses are conducted to acquire deeper\u0000insights into the behavior of the model. SHAP analysis reveals that casualty\u0000class, time of accident, location, vehicle type, and road type play pivotal\u0000roles in determining fatality risk. These findings offer valuable insights for\u0000policymakers and road safety practitioners in developing countries, enabling\u0000the implementation of evidence-based strategies to reduce traffic crash\u0000fatalities.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient wavelet-based physics-informed neural networks for singularly perturbed problems","authors":"Himanshu Pandey, Anshima Singh, Ratikanta Behera","doi":"arxiv-2409.11847","DOIUrl":"https://doi.org/arxiv-2409.11847","url":null,"abstract":"Physics-informed neural networks (PINNs) are a class of deep learning models\u0000that utilize physics as differential equations to address complex problems,\u0000including ones that may involve limited data availability. However, tackling\u0000solutions of differential equations with oscillations or singular perturbations\u0000and shock-like structures becomes challenging for PINNs. Considering these\u0000challenges, we designed an efficient wavelet-based PINNs (W-PINNs) model to\u0000solve singularly perturbed differential equations. Here, we represent the\u0000solution in wavelet space using a family of smooth-compactly supported\u0000wavelets. This framework represents the solution of a differential equation\u0000with significantly fewer degrees of freedom while still retaining in capturing,\u0000identifying, and analyzing the local structure of complex physical phenomena.\u0000The architecture allows the training process to search for a solution within\u0000wavelet space, making the process faster and more accurate. The proposed model\u0000does not rely on automatic differentiations for derivatives involved in\u0000differential equations and does not require any prior information regarding the\u0000behavior of the solution, such as the location of abrupt features. Thus,\u0000through a strategic fusion of wavelets with PINNs, W-PINNs excel at capturing\u0000localized nonlinear information, making them well-suited for problems showing\u0000abrupt behavior in certain regions, such as singularly perturbed problems. The\u0000efficiency and accuracy of the proposed neural network model are demonstrated\u0000in various test problems, i.e., highly singularly perturbed nonlinear\u0000differential equations, the FitzHugh-Nagumo (FHN), and Predator-prey\u0000interaction models. The proposed design model exhibits impressive comparisons\u0000with traditional PINNs and the recently developed wavelet-based PINNs, which\u0000use wavelets as an activation function for solving nonlinear differential\u0000equations.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features","authors":"Jiuqi Wang, Shangtong Zhang","doi":"arxiv-2409.12135","DOIUrl":"https://doi.org/arxiv-2409.12135","url":null,"abstract":"Temporal difference (TD) learning with linear function approximation,\u0000abbreviated as linear TD, is a classic and powerful prediction algorithm in\u0000reinforcement learning. While it is well understood that linear TD converges\u0000almost surely to a unique point, this convergence traditionally requires the\u0000assumption that the features used by the approximator are linearly independent.\u0000However, this linear independence assumption does not hold in many practical\u0000scenarios. This work is the first to establish the almost sure convergence of\u0000linear TD without requiring linearly independent features. In fact, we do not\u0000make any assumptions on the features. We prove that the approximated value\u0000function converges to a unique point and the weight iterates converge to a set.\u0000We also establish a notion of local stability of the weight iterates.\u0000Importantly, we do not need to introduce any other additional assumptions and\u0000do not need to make any modification to the linear TD algorithm. Key to our\u0000analysis is a novel characterization of bounded invariant sets of the mean ODE\u0000of linear TD.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview","authors":"Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao","doi":"arxiv-2409.11650","DOIUrl":"https://doi.org/arxiv-2409.11650","url":null,"abstract":"This paper provides a comprehensive overview of the principles, challenges,\u0000and methodologies associated with quantizing large-scale neural network models.\u0000As neural networks have evolved towards larger and more complex architectures\u0000to address increasingly sophisticated tasks, the computational and energy costs\u0000have escalated significantly. We explore the necessity and impact of model size\u0000growth, highlighting the performance benefits as well as the computational\u0000challenges and environmental considerations. The core focus is on model\u0000quantization as a fundamental approach to mitigate these challenges by reducing\u0000model size and improving efficiency without substantially compromising\u0000accuracy. We delve into various quantization techniques, including both\u0000post-training quantization (PTQ) and quantization-aware training (QAT), and\u0000analyze several state-of-the-art algorithms such as LLM-QAT, PEQA(L4Q),\u0000ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine\u0000how these methods address issues like outliers, importance weighting, and\u0000activation quantization, ultimately contributing to more sustainable and\u0000accessible deployment of large-scale models.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monomial Matrix Group Equivariant Neural Functional Networks","authors":"Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan Minh Nguyen","doi":"arxiv-2409.11697","DOIUrl":"https://doi.org/arxiv-2409.11697","url":null,"abstract":"Neural functional networks (NFNs) have recently gained significant attention\u0000due to their diverse applications, ranging from predicting network\u0000generalization and network editing to classifying implicit neural\u0000representation. Previous NFN designs often depend on permutation symmetries in\u0000neural networks' weights, which traditionally arise from the unordered\u0000arrangement of neurons in hidden layers. However, these designs do not take\u0000into account the weight scaling symmetries of $operatorname{ReLU}$ networks,\u0000and the weight sign flipping symmetries of $operatorname{sin}$ or\u0000$operatorname{tanh}$ networks. In this paper, we extend the study of the group\u0000action on the network weights from the group of permutation matrices to the\u0000group of monomial matrices by incorporating scaling/sign-flipping symmetries.\u0000Particularly, we encode these scaling/sign-flipping symmetries by designing our\u0000corresponding equivariant and invariant layers. We name our new family of NFNs\u0000the Monomial Matrix Group Equivariant Neural Functional Networks\u0000(Monomial-NFN). Because of the expansion of the symmetries, Monomial-NFN has\u0000much fewer independent trainable parameters compared to the baseline NFNs in\u0000the literature, thus enhancing the model's efficiency. Moreover, for fully\u0000connected and convolutional neural networks, we theoretically prove that all\u0000groups that leave these networks invariant while acting on their weight spaces\u0000are some subgroups of the monomial matrix group. We provide empirical evidences\u0000to demonstrate the advantages of our model over existing baselines, achieving\u0000competitive performance and efficiency.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers","authors":"Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba","doi":"arxiv-2409.11859","DOIUrl":"https://doi.org/arxiv-2409.11859","url":null,"abstract":"Controlling the spectral norm of the Jacobian matrix, which is related to the\u0000convolution operation, has been shown to improve generalization, training\u0000stability and robustness in CNNs. Existing methods for computing the norm\u0000either tend to overestimate it or their performance may deteriorate quickly\u0000with increasing the input and kernel sizes. In this paper, we demonstrate that\u0000the tensor version of the spectral norm of a four-dimensional convolution\u0000kernel, up to a constant factor, serves as an upper bound for the spectral norm\u0000of the Jacobian matrix associated with the convolution operation. This new\u0000upper bound is independent of the input image resolution, differentiable and\u0000can be efficiently calculated during training. Through experiments, we\u0000demonstrate how this new bound can be used to improve the performance of\u0000convolutional architectures.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning","authors":"Xiuhua Lu, Peng Li, Xuefeng Jiang","doi":"arxiv-2409.12105","DOIUrl":"https://doi.org/arxiv-2409.12105","url":null,"abstract":"Federated learning offers a paradigm to the challenge of preserving privacy\u0000in distributed machine learning. However, datasets distributed across each\u0000client in the real world are inevitably heterogeneous, and if the datasets can\u0000be globally aggregated, they tend to be long-tailed distributed, which greatly\u0000affects the performance of the model. The traditional approach to federated\u0000learning primarily addresses the heterogeneity of data among clients, yet it\u0000fails to address the phenomenon of class-wise bias in global long-tailed data.\u0000This results in the trained model focusing on the head classes while neglecting\u0000the equally important tail classes. Consequently, it is essential to develop a\u0000methodology that considers classes holistically. To address the above problems,\u0000we propose a new method FedLF, which introduces three modifications in the\u0000local training phase: adaptive logit adjustment, continuous class centred\u0000optimization, and feature decorrelation. We compare seven state-of-the-art\u0000methods with varying degrees of data heterogeneity and long-tailed\u0000distribution. Extensive experiments on benchmark datasets CIFAR-10-LT and\u0000CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of\u0000model performance degradation due to data heterogeneity and long-tailed\u0000distribution. our code is available at https://github.com/18sym/FedLF.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Advances in OOD Detection: Problems and Approaches","authors":"Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang","doi":"arxiv-2409.11884","DOIUrl":"https://doi.org/arxiv-2409.11884","url":null,"abstract":"Out-of-distribution (OOD) detection aims to detect test samples outside the\u0000training category space, which is an essential component in building reliable\u0000machine learning systems. Existing reviews on OOD detection primarily focus on\u0000method taxonomy, surveying the field by categorizing various approaches.\u0000However, many recent works concentrate on non-traditional OOD detection\u0000scenarios, such as test-time adaptation, multi-modal data sources and other\u0000novel contexts. In this survey, we uniquely review recent advances in OOD\u0000detection from the problem scenario perspective for the first time. According\u0000to whether the training process is completely controlled, we divide OOD\u0000detection methods into training-driven and training-agnostic. Besides,\u0000considering the rapid development of pre-trained models, large pre-trained\u0000model-based OOD detection is also regarded as an important category and\u0000discussed separately. Furthermore, we provide a discussion of the evaluation\u0000scenarios, a variety of applications, and several future research directions.\u0000We believe this survey with new taxonomy will benefit the proposal of new\u0000methods and the expansion of more practical scenarios. A curated list of\u0000related papers is provided in the Github repository:\u0000url{https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection}","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Less Memory Means smaller GPUs: Backpropagation with Compressed Activations","authors":"Daniel Barley, Holger Fröning","doi":"arxiv-2409.11902","DOIUrl":"https://doi.org/arxiv-2409.11902","url":null,"abstract":"The ever-growing scale of deep neural networks (DNNs) has lead to an equally\u0000rapid growth in computational resource requirements. Many recent architectures,\u0000most prominently Large Language Models, have to be trained using supercomputers\u0000with thousands of accelerators, such as GPUs or TPUs. Next to the vast number\u0000of floating point operations the memory footprint of DNNs is also exploding. In\u0000contrast, GPU architectures are notoriously short on memory. Even comparatively\u0000small architectures like some EfficientNet variants cannot be trained on a\u0000single consumer-grade GPU at reasonable mini-batch sizes. During training,\u0000intermediate input activations have to be stored until backpropagation for\u0000gradient calculation. These make up the vast majority of the memory footprint.\u0000In this work we therefore consider compressing activation maps for the backward\u0000pass using pooling, which can reduce both the memory footprint and amount of\u0000data movement. The forward computation remains uncompressed. We empirically\u0000show convergence and study effects on feature detection at the example of the\u0000common vision architecture ResNet. With this approach we are able to reduce the\u0000peak memory consumption by 29% at the cost of a longer training schedule, while\u0000maintaining prediction accuracy compared to an uncompressed baseline.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}