{"title":"FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning","authors":"Xiuhua Lu, Peng Li, Xuefeng Jiang","doi":"arxiv-2409.12105","DOIUrl":"https://doi.org/arxiv-2409.12105","url":null,"abstract":"Federated learning offers a paradigm to the challenge of preserving privacy\u0000in distributed machine learning. However, datasets distributed across each\u0000client in the real world are inevitably heterogeneous, and if the datasets can\u0000be globally aggregated, they tend to be long-tailed distributed, which greatly\u0000affects the performance of the model. The traditional approach to federated\u0000learning primarily addresses the heterogeneity of data among clients, yet it\u0000fails to address the phenomenon of class-wise bias in global long-tailed data.\u0000This results in the trained model focusing on the head classes while neglecting\u0000the equally important tail classes. Consequently, it is essential to develop a\u0000methodology that considers classes holistically. To address the above problems,\u0000we propose a new method FedLF, which introduces three modifications in the\u0000local training phase: adaptive logit adjustment, continuous class centred\u0000optimization, and feature decorrelation. We compare seven state-of-the-art\u0000methods with varying degrees of data heterogeneity and long-tailed\u0000distribution. Extensive experiments on benchmark datasets CIFAR-10-LT and\u0000CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of\u0000model performance degradation due to data heterogeneity and long-tailed\u0000distribution. our code is available at https://github.com/18sym/FedLF.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Advances in OOD Detection: Problems and Approaches","authors":"Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang","doi":"arxiv-2409.11884","DOIUrl":"https://doi.org/arxiv-2409.11884","url":null,"abstract":"Out-of-distribution (OOD) detection aims to detect test samples outside the\u0000training category space, which is an essential component in building reliable\u0000machine learning systems. Existing reviews on OOD detection primarily focus on\u0000method taxonomy, surveying the field by categorizing various approaches.\u0000However, many recent works concentrate on non-traditional OOD detection\u0000scenarios, such as test-time adaptation, multi-modal data sources and other\u0000novel contexts. In this survey, we uniquely review recent advances in OOD\u0000detection from the problem scenario perspective for the first time. According\u0000to whether the training process is completely controlled, we divide OOD\u0000detection methods into training-driven and training-agnostic. Besides,\u0000considering the rapid development of pre-trained models, large pre-trained\u0000model-based OOD detection is also regarded as an important category and\u0000discussed separately. Furthermore, we provide a discussion of the evaluation\u0000scenarios, a variety of applications, and several future research directions.\u0000We believe this survey with new taxonomy will benefit the proposal of new\u0000methods and the expansion of more practical scenarios. A curated list of\u0000related papers is provided in the Github repository:\u0000url{https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection}","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Less Memory Means smaller GPUs: Backpropagation with Compressed Activations","authors":"Daniel Barley, Holger Fröning","doi":"arxiv-2409.11902","DOIUrl":"https://doi.org/arxiv-2409.11902","url":null,"abstract":"The ever-growing scale of deep neural networks (DNNs) has lead to an equally\u0000rapid growth in computational resource requirements. Many recent architectures,\u0000most prominently Large Language Models, have to be trained using supercomputers\u0000with thousands of accelerators, such as GPUs or TPUs. Next to the vast number\u0000of floating point operations the memory footprint of DNNs is also exploding. In\u0000contrast, GPU architectures are notoriously short on memory. Even comparatively\u0000small architectures like some EfficientNet variants cannot be trained on a\u0000single consumer-grade GPU at reasonable mini-batch sizes. During training,\u0000intermediate input activations have to be stored until backpropagation for\u0000gradient calculation. These make up the vast majority of the memory footprint.\u0000In this work we therefore consider compressing activation maps for the backward\u0000pass using pooling, which can reduce both the memory footprint and amount of\u0000data movement. The forward computation remains uncompressed. We empirically\u0000show convergence and study effects on feature detection at the example of the\u0000common vision architecture ResNet. With this approach we are able to reduce the\u0000peak memory consumption by 29% at the cost of a longer training schedule, while\u0000maintaining prediction accuracy compared to an uncompressed baseline.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraint Guided AutoEncoders for Joint Optimization of Condition Indicator Estimation and Anomaly Detection in Machine Condition Monitoring","authors":"Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers","doi":"arxiv-2409.11807","DOIUrl":"https://doi.org/arxiv-2409.11807","url":null,"abstract":"The main goal of machine condition monitoring is, as the name implies, to\u0000monitor the condition of industrial applications. The objective of this\u0000monitoring can be mainly split into two problems. A diagnostic problem, where\u0000normal data should be distinguished from anomalous data, otherwise called\u0000Anomaly Detection (AD), or a prognostic problem, where the aim is to predict\u0000the evolution of a Condition Indicator (CI) that reflects the condition of an\u0000asset throughout its life time. When considering machine condition monitoring,\u0000it is expected that this CI shows a monotonic behavior, as the condition of a\u0000machine gradually degrades over time. This work proposes an extension to\u0000Constraint Guided AutoEncoders (CGAE), which is a robust AD method, that\u0000enables building a single model that can be used for both AD and CI estimation.\u0000For the purpose of improved CI estimation the extension incorporates a\u0000constraint that enforces the model to have monotonically increasing CI\u0000predictions over time. Experimental results indicate that the proposed\u0000algorithm performs similar, or slightly better, than CGAE, with regards to AD,\u0000while improving the monotonic behavior of the CI.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge-Based Graph Component Pooling","authors":"T. Snelleman, B. M. Renting, H. H. Hoos, J. N. van Rijn","doi":"arxiv-2409.11856","DOIUrl":"https://doi.org/arxiv-2409.11856","url":null,"abstract":"Graph-structured data naturally occurs in many research fields, such as\u0000chemistry and sociology. The relational information contained therein can be\u0000leveraged to statistically model graph properties through geometrical deep\u0000learning. Graph neural networks employ techniques, such as message-passing\u0000layers, to propagate local features through a graph. However, message-passing\u0000layers can be computationally expensive when dealing with large and sparse\u0000graphs. Graph pooling operators offer the possibility of removing or merging\u0000nodes in such graphs, thus lowering computational costs. However, pooling\u0000operators that remove nodes cause data loss, and pooling operators that merge\u0000nodes are often computationally expensive. We propose a pooling operator that\u0000merges nodes so as not to cause data loss but is also conceptually simple and\u0000computationally inexpensive. We empirically demonstrate that the proposed\u0000pooling operator performs statistically significantly better than edge pool on\u0000four popular benchmark datasets while reducing time complexity and the number\u0000of trainable parameters by 70.6% on average. Compared to another maximally\u0000powerful method named Graph Isomporhic Network, we show that we outperform them\u0000on two popular benchmark datasets while reducing the number of learnable\u0000parameters on average by 60.9%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Domain Adaptation Via Data Pruning","authors":"Andrea Napoli, Paul White","doi":"arxiv-2409.12076","DOIUrl":"https://doi.org/arxiv-2409.12076","url":null,"abstract":"The removal of carefully-selected examples from training data has recently\u0000emerged as an effective way of improving the robustness of machine learning\u0000models. However, the best way to select these examples remains an open\u0000question. In this paper, we consider the problem from the perspective of\u0000unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA\u0000whereby training examples are removed to attempt to align the training\u0000distribution to that of the target data. By adopting the maximum mean\u0000discrepancy (MMD) as the criterion for alignment, the problem can be neatly\u0000formulated and solved as an integer quadratic program. We evaluate our approach\u0000on a real-world domain shift task of bioacoustic event detection. As a method\u0000for UDA, we show that AdaPrune outperforms related techniques, and is\u0000complementary to other UDA algorithms such as CORAL. Our analysis of the\u0000relationship between the MMD and model accuracy, along with t-SNE plots,\u0000validate the proposed method as a principled and well-founded way of performing\u0000data pruning.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics","authors":"Paul Garnier, Jonathan Viquerat, Elie Hachem","doi":"arxiv-2409.11899","DOIUrl":"https://doi.org/arxiv-2409.11899","url":null,"abstract":"Advancement in finite element methods have become essential in various\u0000disciplines, and in particular for Computational Fluid Dynamics (CFD), driving\u0000research efforts for improved precision and efficiency. While Convolutional\u0000Neural Networks (CNNs) have found success in CFD by mapping meshes into images,\u0000recent attention has turned to leveraging Graph Neural Networks (GNNs) for\u0000direct mesh processing. This paper introduces a novel model merging\u0000Self-Attention with Message Passing in GNNs, achieving a 15% reduction in RMSE\u0000on the well known flow past a cylinder benchmark. Furthermore, a dynamic mesh\u0000pruning technique based on Self-Attention is proposed, that leads to a robust\u0000GNN-based multigrid approach, also reducing RMSE by 15%. Additionally, a new\u0000self-supervised training method based on BERT is presented, resulting in a 25%\u0000RMSE reduction. The paper includes an ablation study and outperforms\u0000state-of-the-art models on several challenging datasets, promising advancements\u0000similar to those recently achieved in natural language and image processing.\u0000Finally, the paper introduces a dataset with meshes larger than existing ones\u0000by at least an order of magnitude. Code and Datasets will be released at\u0000https://github.com/DonsetPG/multigrid-gnn.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques","authors":"Yubo Li, Saba Al-Sayouri, Rema Padman","doi":"arxiv-2409.12087","DOIUrl":"https://doi.org/arxiv-2409.12087","url":null,"abstract":"This study explores the potential of utilizing administrative claims data,\u0000combined with advanced machine learning and deep learning techniques, to\u0000predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal\u0000Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major\u0000health insurance organization to develop prediction models for multiple\u0000observation windows using traditional machine learning methods such as Random\u0000Forest and XGBoost as well as deep learning approaches such as Long Short-Term\u0000Memory (LSTM) networks. Our findings demonstrate that the LSTM model,\u0000particularly with a 24-month observation window, exhibits superior performance\u0000in predicting ESRD progression, outperforming existing models in the\u0000literature. We further apply SHapley Additive exPlanations (SHAP) analysis to\u0000enhance interpretability, providing insights into the impact of individual\u0000features on predictions at the individual patient level. This study underscores\u0000the value of leveraging administrative claims data for CKD management and\u0000predicting ESRD progression.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection","authors":"Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu","doi":"arxiv-2409.11653","DOIUrl":"https://doi.org/arxiv-2409.11653","url":null,"abstract":"Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep\u0000learning tasks, which reduces the need for human labor. Previous studies\u0000primarily focus on effectively utilising the labelled and unlabeled data to\u0000improve performance. However, we observe that how to select samples for\u0000labelling also significantly impacts performance, particularly under extremely\u0000low-budget settings. The sample selection task in SSL has been under-explored\u0000for a long time. To fill in this gap, we propose a Representative and Diverse\u0000Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm\u0000to minimise a novel criterion $alpha$-Maximum Mean Discrepancy ($alpha$-MMD),\u0000RDSS samples a representative and diverse subset for annotation from the\u0000unlabeled data. We demonstrate that minimizing $alpha$-MMD enhances the\u0000generalization ability of low-budget learning. Experimental results show that\u0000RDSS consistently improves the performance of several popular SSL frameworks\u0000and outperforms the state-of-the-art sample selection approaches used in Active\u0000Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained\u0000annotation budgets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location based Probabilistic Load Forecasting of EV Charging Sites: Deep Transfer Learning with Multi-Quantile Temporal Convolutional Network","authors":"Mohammad Wazed AliIntelligent Embedded Systems, Asif bin MustafaSchool of CIT, Technical University of Munich, Munich, Germany, Md. Aukerul Moin ShuvoDept. of Computer Science and Engineering, Rajshahi University of Engg. & Technology, Rajshahi, Bangladesh, Bernhard SickIntelligent Embedded Systems","doi":"arxiv-2409.11862","DOIUrl":"https://doi.org/arxiv-2409.11862","url":null,"abstract":"Electrification of vehicles is a potential way of reducing fossil fuel usage\u0000and thus lessening environmental pollution. Electric Vehicles (EVs) of various\u0000types for different transport modes (including air, water, and land) are\u0000evolving. Moreover, different EV user groups (commuters, commercial or domestic\u0000users, drivers) may use different charging infrastructures (public, private,\u0000home, and workplace) at various times. Therefore, usage patterns and energy\u0000demand are very stochastic. Characterizing and forecasting the charging demand\u0000of these diverse EV usage profiles is essential in preventing power outages.\u0000Previously developed data-driven load models are limited to specific use cases\u0000and locations. None of these models are simultaneously adaptive enough to\u0000transfer knowledge of day-ahead forecasting among EV charging sites of diverse\u0000locations, trained with limited data, and cost-effective. This article presents\u0000a location-based load forecasting of EV charging sites using a deep\u0000Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the\u0000limitations of earlier models. We conducted our experiments on data from four\u0000charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV\u0000user types like students, full-time and part-time employees, random visitors,\u0000etc. With a Prediction Interval Coverage Probability (PICP) score of 93.62%,\u0000our proposed deep MQ-TCN model exhibited a remarkable 28.93% improvement over\u0000the XGBoost model for a day-ahead load forecasting at the JPL charging site. By\u0000transferring knowledge with the inductive Transfer Learning (TL) approach, the\u0000MQ-TCN model achieved a 96.88% PICP score for the load forecasting task at the\u0000NREL site using only two weeks of data.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}