{"title":"Powershap: A Power-full Shapley Feature Selection Method","authors":"Jarne Verhaeghe, Jeroen Van Der Donckt, F. Ongenae, S. Hoecke","doi":"10.48550/arXiv.2206.08394","DOIUrl":"https://doi.org/10.48550/arXiv.2206.08394","url":null,"abstract":"Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter and wrapper methods. While wrapper methods commonly result in strong predictive performances, they suffer from a large computational complexity and therefore take a significant amount of time to complete, especially when dealing with high-dimensional feature sets. Alternatively, filter methods are considerably faster, but suffer from several other disadvantages, such as (i) requiring a threshold value, (ii) not taking into account intercorrelation between features, and (iii) ignoring feature interactions with the model. To this end, we present powershap, a novel wrapper feature selection method, which leverages statistical hypothesis testing and power calculations in combination with Shapley values for quick and intuitive feature selection. Powershap is built on the core assumption that an informative feature will have a larger impact on the prediction compared to a known random feature. Benchmarks and simulations show that powershap outperforms other filter methods with predictive performances on par with wrapper methods while being significantly faster, often even reaching half or a third of the execution time. As such, powershap provides a competitive and quick algorithm that can be used by various models in different domains. Furthermore, powershap is implemented as a plug-and-play and open-source sklearn component, enabling easy integration in conventional data science pipelines. User experience is even further enhanced by also providing an automatic mode that automatically tunes the hyper-parameters of the powershap algorithm, allowing to use the algorithm without any configuration needed.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"1 1","pages":"71-87"},"PeriodicalIF":0.0,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77563693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARES: Locally Adaptive Reconstruction-based Anomaly Scoring","authors":"Adam Goodge, Bryan Hooi, See-Kiong Ng, W. Ng","doi":"10.48550/arXiv.2206.07604","DOIUrl":"https://doi.org/10.48550/arXiv.2206.07604","url":null,"abstract":"How can we detect anomalies: that is, samples that significantly differ from a given set of high-dimensional data, such as images or sensor data? This is a practical problem with numerous applications and is also relevant to the goal of making learning algorithms more robust to unexpected inputs. Autoencoders are a popular approach, partly due to their simplicity and their ability to perform dimension reduction. However, the anomaly scoring function is not adaptive to the natural variation in reconstruction error across the range of normal samples, which hinders their ability to detect real anomalies. In this paper, we empirically demonstrate the importance of local adaptivity for anomaly scoring in experiments with real data. We then propose our novel Adaptive Reconstruction Error-based Scoring approach, which adapts its scoring based on the local behaviour of reconstruction error over the latent space. We show that this improves anomaly detection performance over relevant baselines in a wide variety of benchmark datasets.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"20 1","pages":"193-208"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89213332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summarizing Labeled Multi-Graphs","authors":"Dimitris Berberidis, P. Liang, L. Akoglu","doi":"10.48550/arXiv.2206.07674","DOIUrl":"https://doi.org/10.48550/arXiv.2206.07674","url":null,"abstract":"Real-world graphs can be difficult to interpret and visualize beyond a certain size. To address this issue, graph summarization aims to simplify and shrink a graph, while maintaining its high-level structure and characteristics. Most summarization methods are designed for homogeneous, undirected, simple graphs; however, many real-world graphs are ornate; with characteristics including node labels, directed edges, edge multiplicities, and self-loops. In this paper we propose LM-Gsum, a versatile yet rigorous graph summarization model that (to the best of our knowledge, for the first time) can handle graphs with all the aforementioned characteristics (and any combination thereof). Moreover, our proposed model captures basic sub-structures that are prevalent in real-world graphs, such as cliques, stars, etc. LM-Gsum compactly quantifies the information content of a complex graph using a novel encoding scheme, where it seeks to minimize the total number of bits required to encode (i) the summary graph, as well as (ii) the corrections required for reconstructing the input graph losslessly. To accelerate the summary construction, it creates super-nodes efficiently by merging nodes in groups. Experiments demonstrate that LM-Gsum facilitates the visualization of real-world complex graphs, revealing interpretable structures and high- level relationships. Furthermore, LM-Gsum achieves better trade-off between compression rate and running time, relative to existing methods (only) on comparable settings.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"36 1","pages":"53-68"},"PeriodicalIF":0.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87918419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defending Observation Attacks in Deep Reinforcement Learning via Detection and Denoising","authors":"Zikang Xiong, Joe Eappen, He Zhu, S. Jagannathan","doi":"10.48550/arXiv.2206.07188","DOIUrl":"https://doi.org/10.48550/arXiv.2206.07188","url":null,"abstract":"Neural network policies trained using Deep Reinforcement Learning (DRL) are well-known to be susceptible to adversarial attacks. In this paper, we consider attacks manifesting as perturbations in the observation space managed by the external environment. These attacks have been shown to downgrade policy performance significantly. We focus our attention on well-trained deterministic and stochastic neural network policies in the context of continuous control benchmarks subject to four well-studied observation space adversarial attacks. To defend against these attacks, we propose a novel defense strategy using a detect-and-denoise schema. Unlike previous adversarial training approaches that sample data in adversarial scenarios, our solution does not require sampling data in an environment under attack, thereby greatly reducing risk during training. Detailed experimental results show that our technique is comparable with state-of-the-art adversarial training approaches.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"9 1","pages":"235-250"},"PeriodicalIF":0.0,"publicationDate":"2022-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78883023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Generalization of Neural Combinatorial Optimization Heuristics","authors":"S. Manchanda, Sofia Michel, Darko Drakulic, J. Andreoli","doi":"10.48550/arXiv.2206.00787","DOIUrl":"https://doi.org/10.48550/arXiv.2206.00787","url":null,"abstract":"Neural Combinatorial Optimization approaches have recently leveraged the expressiveness and flexibility of deep neural networks to learn efficient heuristics for hard Combinatorial Optimization (CO) problems. However, most of the current methods lack generalization: for a given CO problem, heuristics which are trained on instances with certain characteristics underperform when tested on instances with different characteristics. While some previous works have focused on varying the training instances properties, we postulate that a one-size-fit-all model is out of reach. Instead, we formalize solving a CO problem over a given instance distribution as a separate learning task and investigate meta-learning techniques to learn a model on a variety of tasks, in order to optimize its capacity to adapt to new tasks. Through extensive experiments, on two CO problems, using both synthetic and realistic instances, we show that our proposed meta-learning approach significantly improves the generalization of two state-of-the-art models.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"16 1 1","pages":"426-442"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76555711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Factorized Structured Regression for Large-Scale Varying Coefficient Models","authors":"David Rugamer, Andreas Bender, Simon Wiegrebe, Daniel Racek, Bernd Bischl, Christian L. Muller, Clemens Stachl","doi":"10.48550/arXiv.2205.13080","DOIUrl":"https://doi.org/10.48550/arXiv.2205.13080","url":null,"abstract":"Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR's performance and interpretability on a large-scale behavioral study with smartphone user data.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"25 1","pages":"20-35"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82680514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning","authors":"Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Z. Shi, C. Kamhoua, E. Papalexakis, Fei Fang","doi":"10.48550/arXiv.2205.12449","DOIUrl":"https://doi.org/10.48550/arXiv.2205.12449","url":null,"abstract":"Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"2 1","pages":"251-266"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80667173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Prediction Instability of Graph Neural Networks","authors":"Max Klabunde, F. Lemmerich","doi":"10.48550/arXiv.2205.10070","DOIUrl":"https://doi.org/10.48550/arXiv.2205.10070","url":null,"abstract":"Instability of trained models, i.e., the dependence of individual node predictions on random factors, can affect reproducibility, reliability, and trust in machine learning systems. In this paper, we systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs). With our experiments, we establish that multiple instantiations of popular GNN models trained on the same data with the same model hyperparameters result in almost identical aggregated performance but display substantial disagreement in the predictions for individual nodes. We find that up to one third of the incorrectly classified nodes differ across algorithm runs. We identify correlations between hyperparameters, node properties, and the size of the training set with the stability of predictions. In general, maximizing model performance implicitly also reduces model instability.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"38 1","pages":"187-202"},"PeriodicalIF":0.0,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73947934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wasserstein t-SNE","authors":"Fynn Bachmann, Philipp Hennig, Dmitry Kobak","doi":"10.48550/arXiv.2205.07531","DOIUrl":"https://doi.org/10.48550/arXiv.2205.07531","url":null,"abstract":"Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means, however this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of hierarchical datasets using the Wasserstein distance metric that takes into account the shapes of within-unit distributions. We use t-SNE to construct 2D embeddings of the units, based on the matrix of pairwise Wasserstein distances between them. The distance matrix can be efficiently computed by approximating each unit with a Gaussian distribution, but we also provide a scalable method to compute exact Wasserstein distances. We use synthetic data to demonstrate the effectiveness of our Wasserstein t-SNE, and apply it to data from the 2017 German parliamentary election, considering polling stations as samples and voting districts as units. The resulting embedding uncovers meaningful structure in the data.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"134 1","pages":"104-120"},"PeriodicalIF":0.0,"publicationDate":"2022-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86326688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near out-of-distribution detection for low-resolution radar micro-Doppler signatures","authors":"Martin Bauw, S. Velasco-Forero, J. Angulo, C. Adnet, O. Airiau","doi":"10.48550/arXiv.2205.07869","DOIUrl":"https://doi.org/10.48550/arXiv.2205.07869","url":null,"abstract":"Near out-of-distribution detection (OODD) aims at discriminating semantically similar data points without the supervision required for classification. This paper puts forward an OODD use case for radar targets detection extensible to other kinds of sensors and detection scenarios. We emphasize the relevance of OODD and its specific supervision requirements for the detection of a multimodal, diverse targets class among other similar radar targets and clutter in real-life critical systems. We propose a comparison of deep and non-deep OODD methods on simulated low-resolution pulse radar micro-Doppler signatures, considering both a spectral and a covariance matrix input representation. The covariance representation aims at estimating whether dedicated second-order processing is appropriate to discriminate signatures. The potential contributions of labeled anomalies in training, self-supervised learning, contrastive learning insights and innovative training losses are discussed, and the impact of training set contamination caused by mislabelling is investigated.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"90 1","pages":"384-399"},"PeriodicalIF":0.0,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83909963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}