Machine Learning. Pub Date: 2024-01-16. DOI: 10.1007/s10994-023-06414-w
Title: Understanding imbalanced data: XAI & interpretable ML framework
Abstract: There is a gap between current methods for explaining deep learning models trained on imbalanced image data and the needs of the imbalanced learning community. Existing methods for explaining imbalanced data are geared toward binary classification, single-layer machine learning models, and low-dimensional data. Current eXplainable Artificial Intelligence (XAI) techniques for vision data mainly focus on mapping the predictions of specific instances to inputs, rather than examining global data properties and the complexities of entire classes. There is therefore a need for a framework that is tailored to modern deep networks, accommodates large, high-dimensional, multi-class datasets, and uncovers the data complexities commonly found in imbalanced data. We propose a set of techniques that can be used both by deep learning model users, to identify, visualize, and understand class prototypes, sub-concepts, and outlier instances, and by imbalanced learning algorithm developers, to detect the features and class exemplars that are key to model performance. The components of our framework can be applied sequentially in their entirety or individually, making it fully flexible to the user's specific needs (https://github.com/dd1github/XAI_for_Imbalanced_Learning).
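As a deliberately minimal illustration of the prototype/outlier idea in the abstract above: per-class centroids in an embedding space can stand in for class prototypes, with instances far from their centroid flagged as outlier candidates. The function name, the centroid-as-prototype choice, and the quantile cutoff are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def prototypes_and_outliers(embeddings, labels, outlier_quantile=0.95):
    """For each class, return its centroid (a simple stand-in for a class
    prototype) and the indices of instances whose distance to the centroid
    exceeds the given quantile (candidate outliers)."""
    result = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        pts = embeddings[idx]
        centroid = pts.mean(axis=0)
        dists = np.linalg.norm(pts - centroid, axis=1)
        cutoff = np.quantile(dists, outlier_quantile)
        result[c] = (centroid, idx[dists > cutoff])
    return result

# Toy data: two well-separated classes, plus one far-away point in class 0.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (20, 2)),
                 rng.normal(5, 0.1, (20, 2)),
                 [[3.0, 3.0]]])
lab = np.array([0] * 20 + [1] * 20 + [0])
out = prototypes_and_outliers(emb, lab)
print(len(out))            # 2 classes found
print(40 in out[0][1])     # True: the far-away point is flagged as an outlier
```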
Machine Learning. Pub Date: 2024-01-16. DOI: 10.1007/s10994-023-06490-y
Title: On the effects of biased quantum random numbers on the initialization of artificial neural networks
Authors: Raoul Heese, Moritz Wolter, Sascha Mücke, Lukas Franken, Nico Piatkowski
Abstract: Recent advances in practical quantum computing have led to a variety of cloud-based quantum computing platforms that allow researchers to evaluate their algorithms on noisy intermediate-scale quantum devices. A common property of quantum computers is that they can exhibit instances of true randomness, as opposed to the pseudo-randomness obtained from classical systems. Investigating the effects of such true quantum randomness in the context of machine learning is appealing, and recent results vaguely suggest that benefits can indeed be achieved from the use of quantum random numbers. To shed more light on this topic, we empirically study the effects of hardware-biased quantum random numbers on the initialization of artificial neural network weights in numerical experiments. We find no statistically significant difference compared with unbiased quantum random numbers, or with biased and unbiased random numbers from a classical pseudo-random number generator. The quantum random numbers for our experiments were obtained from real quantum hardware.
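A minimal sketch of the kind of effect being tested, under assumptions of my own (16-bit packing, a 55% bias rate, a symmetric uniform init range): a biased bit stream shifts the mean of the resulting weight initialization, which is the property whose downstream impact the study measures.

```python
import numpy as np

def bits_to_uniform(bits, n_bits=16):
    """Pack a 0/1 bit stream into floats in [0, 1): each group of n_bits
    bits becomes one number.  A biased stream (P(1) != 0.5) shifts the
    resulting distribution, which in turn shifts the weight init."""
    bits = bits[: (len(bits) // n_bits) * n_bits].reshape(-1, n_bits)
    weights = 2.0 ** -np.arange(1, n_bits + 1)
    return bits @ weights

rng = np.random.default_rng(42)
unbiased = bits_to_uniform(rng.integers(0, 2, 160000))
biased = bits_to_uniform((rng.random(160000) < 0.55).astype(int))

# Map the uniforms to a symmetric range U(-a, a), as uniform Glorot-style
# init schemes do; the bit-level bias shows up as a nonzero mean.
a = 0.1
w_unbiased = a * (2 * unbiased - 1)
w_biased = a * (2 * biased - 1)
print(round(w_unbiased.mean(), 3))           # close to 0.0
print(w_biased.mean() > w_unbiased.mean())   # True: biased bits shift the mean
```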
Machine Learning. Pub Date: 2024-01-16. DOI: 10.1007/s10994-023-06493-9
Title: OT-Net: a reusable neural optimal transport solver
Authors: Zezeng Li, Shenghao Li, Lianbao Jin, Na Lei, Zhongxuan Luo
Abstract: With the widespread application of optimal transport (OT), its computation has become essential, and various algorithms have emerged. However, existing methods either have low efficiency or cannot represent discontinuous maps. We therefore present OT-Net, a novel reusable neural OT solver that first learns Brenier's height representation via a neural network to obtain its potential, and then obtains the OT map as the gradient of the potential. The algorithm has two merits: (1) when new target samples are added, the OT map can be computed directly, which greatly improves the efficiency and reusability of the map; (2) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous support, achieve sharp boundaries, and thus eliminate mode collapse. Moreover, we conduct error analyses of the proposed algorithm and demonstrate the empirical success of our approach in image generation, color transfer, and domain adaptation.
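The fact the solver builds on — by Brenier's theorem, the OT map is the gradient of a convex potential — can be checked numerically on a toy potential. The quadratic potential below is an illustrative stand-in for the learned neural potential, chosen because its gradient has the closed form \(T(x) = Ax\).

```python
import numpy as np

# For the convex quadratic potential u(x) = 0.5 * x @ A @ x with A symmetric
# positive definite, the Brenier map is T(x) = grad u(x) = A @ x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def potential(x):
    return 0.5 * x @ A @ x

def ot_map(x, eps=1e-5):
    """OT map obtained as the numerical gradient of the potential, mirroring
    how a neural solver would differentiate a learned potential."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (potential(x + e) - potential(x - e)) / (2 * eps)
    return grad

x = np.array([1.0, -2.0])
print(np.allclose(ot_map(x), A @ x, atol=1e-6))  # True
```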
Machine Learning. Pub Date: 2024-01-12. DOI: 10.1007/s10994-023-06470-2
Title: PANACEA: a neural model ensemble for cyber-threat detection
Authors: Malik AL-Essa, Giuseppina Andresini, Annalisa Appice, Donato Malerba
Abstract: Ensemble learning is a strategy commonly used to fuse different base models by creating a model ensemble that is expected to be more accurate on unseen data than the base models. This study describes a new cyber-threat detection method, called PANACEA, that couples ensemble learning with adversarial training in deep learning in order to improve the accuracy of neural models trained on cybersecurity problems. Selecting the base models is one of the main challenges in training accurate ensembles. This study describes a model-ensemble pruning approach based on eXplainable AI (XAI) that increases ensemble diversity and improves ensemble classification accuracy. We build on the idea that identifying base models that give relevance to different input feature sub-spaces may help improve the accuracy of an ensemble trained to recognize the signatures of different cyber-attack patterns. To this end, we use a global XAI technique to measure ensemble model diversity with respect to the effect of the input features on the accuracy of the base neural models combined in the ensemble. Experiments carried out on four benchmark cybersecurity datasets (three network intrusion detection datasets and one malware detection dataset) show the beneficial effects of the proposed combination of adversarial training, ensemble learning, and XAI on the accuracy of multi-class classification of cyber-data achieved by the neural model ensemble.
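One plausible reading of XAI-based ensemble pruning, sketched under assumptions of my own (each base model summarized by a global feature-importance vector, then a greedy max-dissimilarity selection); the paper's actual pruning criterion may differ.

```python
import numpy as np

def prune_by_importance_diversity(importances, k):
    """Greedily pick k base models whose global feature-importance vectors
    are mutually dissimilar (low cosine similarity), so the retained
    ensemble covers different input feature sub-spaces."""
    imp = importances / np.linalg.norm(importances, axis=1, keepdims=True)
    chosen = [0]                       # seed with the first model
    while len(chosen) < k:
        sims = imp @ imp[chosen].T     # cosine sim to already-chosen models
        worst = sims.max(axis=1)       # closest chosen model, per candidate
        worst[chosen] = np.inf         # never re-pick a chosen model
        chosen.append(int(worst.argmin()))
    return sorted(chosen)

# Four toy base models: 0 and 1 rely on the same features, 2 and 3 on others.
imp = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.9, 1.1, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
print(prune_by_importance_diversity(imp, 3))  # [0, 2, 3]: the near-duplicate model 1 is pruned
```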
Machine Learning. Pub Date: 2024-01-12. DOI: 10.1007/s10994-023-06445-3
Title: Hierarchical U-Net with re-parameterization technique for spatio-temporal weather forecasting
Abstract: Due to the considerable computational demands of physics-based numerical weather prediction, especially when modeling fine-grained spatio-temporal atmospheric phenomena, deep learning methods offer an advantageous alternative: they leverage specialized computing devices to accelerate training and significantly reduce computational costs. The application of deep learning methods has thus presented a novel solution in the field of weather forecasting. In this context, we introduce a deep learning-based weather prediction architecture, the Hierarchical U-Net (HU-Net), with re-parameterization techniques. HU-Net comprises two essential components: a feature extraction module and a U-Net module with re-parameterization techniques. The feature extraction module consists of two branches. First, global pattern extraction employs adaptive Fourier neural operators and self-attention, well known for capturing long-term dependencies in the data. Second, local pattern extraction uses convolution operations as fundamental building blocks, which are highly proficient at modeling local correlations. A feature fusion block then dynamically combines the dual-scale information. The U-Net module adopts RepBlock, with re-parameterization techniques, as its fundamental building block, enabling efficient and rapid inference. In extensive experiments on the large-scale weather benchmark dataset WeatherBench at a resolution of 1.40625°, the results demonstrate that our proposed HU-Net outperforms other baseline models in both prediction accuracy and inference time.
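The structural re-parameterization idea behind blocks like RepBlock is, in its RepVGG-style form, that parallel convolution branches used during training can be folded into a single kernel for fast inference. A minimal sketch with one 3x3 + 1x1 branch pair (the exact RepBlock design is not reproduced here):

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2D cross-correlation."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
k3 = rng.normal(size=(3, 3))   # 3x3 branch
k1 = rng.normal(size=(1, 1))   # parallel 1x1 branch

# Training-time: two parallel branches, outputs summed.  The 1x1 branch
# runs on the central crop so both branches see the same output grid.
multi_branch = conv2d(x, k3) + conv2d(x[1:-1, 1:-1], k1)

# Inference-time: fold the 1x1 kernel into the centre of the 3x3 kernel,
# leaving a single convolution with identical output.
k_rep = k3.copy()
k_rep[1, 1] += k1[0, 0]
single_branch = conv2d(x, k_rep)

print(np.allclose(multi_branch, single_branch))  # True
```

The merged kernel is exact, not approximate: convolution is linear, so the sum of branch outputs equals convolution with the sum of (aligned) kernels.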
Machine Learning. Pub Date: 2024-01-11. DOI: 10.1007/s10994-023-06419-5
Title: Compositional scene modeling with global object-centric representations
Abstract: The appearance of the same object may vary across scene images due to occlusions between objects. Humans can quickly identify the same object even when occlusions exist, by completing the occluded parts based on the complete canonical image of the object in memory. Achieving this ability is still challenging for existing models, especially in the unsupervised learning setting. Inspired by this human ability, we propose a novel object-centric representation learning method that identifies the same, possibly occluded, object in different scenes by learning global object-centric representations of complete canonical objects without supervision. The representation of each object is divided into an extrinsic part, which characterizes scene-dependent information (i.e., position and size), and an intrinsic part, which characterizes globally invariant information (i.e., appearance and shape). The former is inferred with an improved IC-SBP module. The latter is extracted by combining rectangular and arbitrary-shaped attention, and is used to infer the identity representation via a proposed patch-matching strategy against a set of learnable global object-centric representations of complete canonical objects. In the experiments, three 2D scene datasets are used to verify the proposed method's ability to recognize the identity of the same object in different scenes, and a complex 3D scene dataset and a real-world dataset are used to evaluate scene decomposition performance. Our experimental results demonstrate that the proposed method outperforms the comparison methods in both same-object recognition and scene decomposition.
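A toy sketch of the matching step: intrinsic (appearance/shape) embeddings are assigned to global canonical prototypes, here simply by cosine similarity. The function name and data are illustrative; the paper's patch-matching strategy is more elaborate than a single nearest-prototype lookup.

```python
import numpy as np

def match_identity(intrinsic, prototypes):
    """Assign each object's intrinsic embedding to the closest global
    canonical-object prototype by cosine similarity."""
    a = intrinsic / np.linalg.norm(intrinsic, axis=1, keepdims=True)
    b = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)

# Two canonical objects; three observed instances, the last one "occluded"
# (its embedding is a noisy, attenuated copy of prototype 0).
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
observed = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.5, 0.1, 0.1]])
print(match_identity(observed, prototypes).tolist())  # [0, 1, 0]
```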
Machine Learning. Pub Date: 2024-01-11. DOI: 10.1007/s10994-023-06421-x
Title: Tracking treatment effect heterogeneity in evolving environments
Authors: Tian Qin, Long-Fei Li, Tian-Zuo Wang, Zhi-Hua Zhou
Abstract: Heterogeneous treatment effect (HTE) estimation plays a crucial role in developing personalized treatment plans across various applications. Conventional approaches assume that the observed data are independent and identically distributed (i.i.d.). In some real applications, however, this assumption does not hold: the environment may evolve, which leads to variations in HTE over time. To enable HTE estimation in evolving environments, we introduce and formulate the online HTE estimation problem. We propose an online ensemble-based HTE estimation method called ETHOS, which adapts to unknown evolving environments by ensembling the outputs of multiple base estimators that track environmental changes at different scales. Theoretical analysis reveals that ETHOS achieves an optimal expected dynamic regret of \(O(\sqrt{T(1+P_T)})\), where \(T\) denotes the number of observed examples and \(P_T\) characterizes the intensity of environment changes. This dynamic regret ensures that our method consistently approaches the optimal online estimators as long as the evolution of the environment is moderate. We conducted extensive experiments on three common benchmark datasets with various environment-evolution mechanisms. The results validate the theoretical analysis and the effectiveness of our proposed method.
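The "ensemble of estimators at different scales" idea can be sketched with moving-average base estimators combined by multiplicative weights. ETHOS's actual base estimators and regret-optimal weighting are more involved; treat this as a cartoon of tracking changes at different time scales, with all window sizes and the learning rate chosen for illustration.

```python
import numpy as np

def online_ensemble(stream, windows=(1, 8, 64), eta=2.0):
    """Ensemble of moving-average base estimators with different window
    sizes; weights are updated multiplicatively from each base estimator's
    squared error, so the ensemble follows whichever scale currently fits."""
    w = np.ones(len(windows)) / len(windows)
    history, preds = [], []
    for y in stream:
        base = np.array([np.mean(history[-k:]) if history else 0.0
                         for k in windows])
        preds.append(w @ base)
        losses = (base - y) ** 2
        w *= np.exp(-eta * losses)   # multiplicative (Hedge-style) update
        w /= w.sum()
        history.append(y)
    return np.array(preds)

# A stream whose mean jumps halfway through: the short-window estimator
# adapts fast, and the ensemble follows it after the change.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 0.1, 100), rng.normal(5, 0.1, 100)])
preds = online_ensemble(stream)
print(abs(preds[190] - 5) < 0.5)  # True: adapted to the new regime
```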
Machine Learning. Pub Date: 2024-01-10. DOI: 10.1007/s10994-023-06437-3
Title: NRAT: towards adversarial training with inherent label noise
Abstract: Adversarial training (AT) is widely recognized as the most effective defense against adversarial attacks on deep neural networks, and it is formulated as a min-max optimization. Most AT algorithms are geared toward research-oriented datasets such as MNIST and CIFAR10, where the labels are generally correct. However, noisy labels, e.g., from mislabelling, are inevitable in real-world datasets. In this paper, we investigate AT with inherent label noise, where the training dataset itself contains mislabeled samples. We first show empirically that the performance of AT typically degrades as the label noise rate increases. We then propose a Noisy-Robust Adversarial Training (NRAT) algorithm, which leverages recent advances in learning with noisy labels to enhance the performance of AT in the presence of label noise. For experimental comparison, we consider two essential metrics in AT: (i) the trade-off between natural and robust accuracy, and (ii) robust overfitting. Our experiments show that NRAT's performance is on par with, or better than, state-of-the-art AT methods on both evaluation metrics. Our code is publicly available at https://github.com/TrustAI/NRAT.
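For background, the inner maximization of the AT min-max can be illustrated with one-step FGSM on logistic regression. NRAT's noisy-label-robust loss is not reproduced here; this only shows the adversarial-example step that AT builds on, with toy weights and inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """Inner maximisation of the AT min-max, one step (FGSM): perturb x in
    the direction that increases the logistic loss.  For logistic loss
    l = -y*log(p) - (1-y)*log(1-p) with p = sigmoid(w @ x), the gradient
    with respect to x is (p - y) * w."""
    p = sigmoid(x @ w)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
x = np.array([0.5, 0.2])     # logit = x @ w = 0.8 > 0, predicted class 1
y = 1.0
x_adv = fgsm(x, y, w, eps=0.3)
print(x_adv @ w < x @ w)     # True: the logit moved toward the wrong class
```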
Machine Learning. Pub Date: 2024-01-10. DOI: 10.1007/s10994-023-06428-4
Title: Explaining neural networks without access to training data
Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Andrej Tschalzev, Heiner Stuckenschmidt
Abstract: We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, Interpretation Nets (\(\mathcal{I}\)-Nets) have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the \(\mathcal{I}\)-Net framework to standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design the corresponding \(\mathcal{I}\)-Net output layers. Furthermore, we make \(\mathcal{I}\)-Nets applicable to real-world tasks by considering more realistic distributions when generating the \(\mathcal{I}\)-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
Machine Learning. Pub Date: 2024-01-10. DOI: 10.1007/s10994-023-06411-z
Title: Principled diverse counterfactuals in multilinear models
Abstract: Machine learning (ML) applications have automated numerous real-life tasks, improving both private and public life. However, the black-box nature of many state-of-the-art models poses the challenge of model verification: how can one be sure that the algorithm bases its decisions on the proper criteria, or that it does not discriminate against certain minority groups? In this paper we propose a way to generate diverse counterfactual explanations from multilinear models, a broad class that includes Random Forests as well as Bayesian Networks.
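A brute-force baseline conveys the objective of diverse counterfactuals (flip the prediction with sparse changes, keeping the changed feature sets disjoint as a simple diversity proxy); the paper's principled multilinear algorithm is, of course, not this enumeration, and the model and names below are illustrative.

```python
import itertools

def diverse_counterfactuals(x, predict, n_cf=2):
    """Enumerate feature-flip candidates of a binary input in order of
    sparsity (fewest changed features first) and keep those that change
    the prediction while touching disjoint feature sets."""
    target = not predict(x)
    found, used = [], set()
    for r in range(1, len(x) + 1):
        for flips in itertools.combinations(range(len(x)), r):
            if used & set(flips):
                continue               # enforce diversity: disjoint changes
            cand = list(x)
            for i in flips:
                cand[i] = 1 - cand[i]
            if predict(cand) == target:
                found.append(cand)
                used |= set(flips)
                if len(found) == n_cf:
                    return found
    return found

# Toy multilinear-style scorer: predicts 1 when x0 * (x1 + x2) >= 1.
predict = lambda x: x[0] * (x[1] + x[2]) >= 1
x = [1, 1, 0]                  # predicted 1
cfs = diverse_counterfactuals(x, predict)
print(cfs)  # [[0, 1, 0], [1, 0, 0]]: two disjoint single-feature flips
```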