{"title":"Heterogeneous Sheaf Neural Networks","authors":"Luke Braithwaite, Iulia Duta, Pietro Liò","doi":"arxiv-2409.08036","DOIUrl":"https://doi.org/arxiv-2409.08036","url":null,"abstract":"Heterogeneous graphs, with nodes and edges of different types, are commonly\u0000used to model relational structures in many real-world applications. Standard\u0000Graph Neural Networks (GNNs) struggle to process heterogeneous data due to\u0000oversmoothing. Instead, current approaches have focused on accounting for the\u0000heterogeneity in the model architecture, leading to increasingly complex\u0000models. Inspired by recent work, we propose using cellular sheaves to model the\u0000heterogeneity in the graph's underlying topology. Instead of modelling the data\u0000as a graph, we represent it as cellular sheaves, which allows us to encode the\u0000different data types directly in the data structure, eliminating the need to\u0000inject them into the architecture. We introduce HetSheaf, a general framework\u0000for heterogeneous sheaf neural networks, and a series of heterogeneous sheaf\u0000predictors to better encode the data's heterogeneity into the sheaf structure.\u0000Finally, we empirically evaluate HetSheaf on several standard heterogeneous\u0000graph benchmarks, achieving competitive results whilst being more\u0000parameter-efficient.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPMT: Enhanced Semi-Supervised Model for Traffic Incident Detection","authors":"Xinying Lu, Jianli Xiao","doi":"arxiv-2409.07839","DOIUrl":"https://doi.org/arxiv-2409.07839","url":null,"abstract":"For traffic incident detection, the acquisition of data and labels is notably\u0000resource-intensive, rendering semi-supervised traffic incident detection both a\u0000formidable and consequential challenge. Thus, this paper focuses on traffic\u0000incident detection with a semi-supervised learning way. It proposes a\u0000semi-supervised learning model named FPMT within the framework of MixText. The\u0000data augmentation module introduces Generative Adversarial Networks to balance\u0000and expand the dataset. During the mix-up process in the hidden space, it\u0000employs a probabilistic pseudo-mixing mechanism to enhance regularization and\u0000elevate model precision. In terms of training strategy, it initiates with\u0000unsupervised training on all data, followed by supervised fine-tuning on a\u0000subset of labeled data, and ultimately completing the goal of semi-supervised\u0000training. Through empirical validation on four authentic datasets, our FPMT\u0000model exhibits outstanding performance across various metrics. Particularly\u0000noteworthy is its robust performance even in scenarios with low label rates.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for measuring the training efficiency of a neural architecture","authors":"Eduardo Cueto-Mendoza, John D. Kelleher","doi":"arxiv-2409.07925","DOIUrl":"https://doi.org/arxiv-2409.07925","url":null,"abstract":"Measuring Efficiency in neural network system development is an open research\u0000problem. This paper presents an experimental framework to measure the training\u0000efficiency of a neural architecture. To demonstrate our approach, we analyze\u0000the training efficiency of Convolutional Neural Networks and Bayesian\u0000equivalents on the MNIST and CIFAR-10 tasks. Our results show that training\u0000efficiency decays as training progresses and varies across different stopping\u0000criteria for a given neural model and learning task. We also find a non-linear\u0000relationship between training stopping criteria, training Efficiency, model\u0000size, and training Efficiency. Furthermore, we illustrate the potential confounding effects of overtraining\u0000on measuring the training efficiency of a neural architecture. Regarding\u0000relative training efficiency across different architectures, our results\u0000indicate that CNNs are more efficient than BCNNs on both datasets. More\u0000generally, as a learning task becomes more complex, the relative difference in\u0000training efficiency between different architectures becomes more pronounced.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Causally Invariant Reward Functions from Diverse Demonstrations","authors":"Ivan Ovinnikov, Eugene Bykovets, Joachim M. Buhmann","doi":"arxiv-2409.08012","DOIUrl":"https://doi.org/arxiv-2409.08012","url":null,"abstract":"Inverse reinforcement learning methods aim to retrieve the reward function of\u0000a Markov decision process based on a dataset of expert demonstrations. The\u0000commonplace scarcity and heterogeneous sources of such demonstrations can lead\u0000to the absorption of spurious correlations in the data by the learned reward\u0000function. Consequently, this adaptation often exhibits behavioural overfitting\u0000to the expert data set when a policy is trained on the obtained reward function\u0000under distribution shift of the environment dynamics. In this work, we explore\u0000a novel regularization approach for inverse reinforcement learning methods\u0000based on the causal invariance principle with the goal of improved reward\u0000function generalization. By applying this regularization to both exact and\u0000approximate formulations of the learning task, we demonstrate superior policy\u0000performance when trained using the recovered reward functions in a transfer\u0000setting","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Taylor-Sensus Network: Embracing Noise to Enlighten Uncertainty for Scientific Data","authors":"Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Jintao Meng, Dawei Zhang","doi":"arxiv-2409.07942","DOIUrl":"https://doi.org/arxiv-2409.07942","url":null,"abstract":"Uncertainty estimation is crucial in scientific data for machine learning.\u0000Current uncertainty estimation methods mainly focus on the model's inherent\u0000uncertainty, while neglecting the explicit modeling of noise in the data.\u0000Furthermore, noise estimation methods typically rely on temporal or spatial\u0000dependencies, which can pose a significant challenge in structured scientific\u0000data where such dependencies among samples are often absent. To address these\u0000challenges in scientific research, we propose the Taylor-Sensus Network\u0000(TSNet). TSNet innovatively uses a Taylor series expansion to model complex,\u0000heteroscedastic noise and proposes a deep Taylor block for aware noise\u0000distribution. TSNet includes a noise-aware contrastive learning module and a\u0000data density perception module for aleatoric and epistemic uncertainty.\u0000Additionally, an uncertainty combination operator is used to integrate these\u0000uncertainties, and the network is trained using a novel heteroscedastic mean\u0000square error loss. TSNet demonstrates superior performance over mainstream and\u0000state-of-the-art methods in experiments, highlighting its potential in\u0000scientific research and noise resistance. It will be open-source to facilitate\u0000the community of \"AI for Science\".","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs","authors":"Davide Buffelli, Farzin Soleymani, Bastian Rieck","doi":"arxiv-2409.08217","DOIUrl":"https://doi.org/arxiv-2409.08217","url":null,"abstract":"Graph neural networks have become the default choice by practitioners for\u0000graph learning tasks such as graph classification and node classification.\u0000Nevertheless, popular graph neural network models still struggle to capture\u0000higher-order information, i.e., information that goes emph{beyond} pairwise\u0000interactions. Recent work has shown that persistent homology, a tool from\u0000topological data analysis, can enrich graph neural networks with topological\u0000information that they otherwise could not capture. Calculating such features is\u0000efficient for dimension 0 (connected components) and dimension 1 (cycles).\u0000However, when it comes to higher-order structures, it does not scale well, with\u0000a complexity of $O(n^d)$, where $n$ is the number of nodes and $d$ is the order\u0000of the structures. In this work, we introduce a novel method that extracts\u0000information about higher-order structures in the graph while still using the\u0000efficient low-dimensional persistent homology algorithm. On standard benchmark\u0000datasets, we show that our method can lead to up to $31%$ improvements in test\u0000accuracy.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Online Grooming Detection Employing Context Determination and Message-Level Analysis","authors":"Jake Street, Isibor Ihianle, Funminiyi Olajide, Ahmad Lotfi","doi":"arxiv-2409.07958","DOIUrl":"https://doi.org/arxiv-2409.07958","url":null,"abstract":"Online Grooming (OG) is a prevalent threat facing predominately children\u0000online, with groomers using deceptive methods to prey on the vulnerability of\u0000children on social media/messaging platforms. These attacks can have severe\u0000psychological and physical impacts, including a tendency towards\u0000revictimization. Current technical measures are inadequate, especially with the\u0000advent of end-to-end encryption which hampers message monitoring. Existing\u0000solutions focus on the signature analysis of child abuse media, which does not\u0000effectively address real-time OG detection. This paper proposes that OG attacks\u0000are complex, requiring the identification of specific communication patterns\u0000between adults and children. It introduces a novel approach leveraging advanced\u0000models such as BERT and RoBERTa for Message-Level Analysis and a Context\u0000Determination approach for classifying actor interactions, including the\u0000introduction of Actor Significance Thresholds and Message Significance\u0000Thresholds. The proposed method aims to enhance accuracy and robustness in\u0000detecting OG by considering the dynamic and multi-faceted nature of these\u0000attacks. Cross-dataset experiments evaluate the robustness and versatility of\u0000our approach. This paper's contributions include improved detection\u0000methodologies and the potential for application in various scenarios,\u0000addressing gaps in current literature and practices.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Role of Deep Learning Regularizations on Actors in Offline RL","authors":"Denis Tarasov, Anja Surina, Caglar Gulcehre","doi":"arxiv-2409.07606","DOIUrl":"https://doi.org/arxiv-2409.07606","url":null,"abstract":"Deep learning regularization techniques, such as emph{dropout}, emph{layer\u0000normalization}, or emph{weight decay}, are widely adopted in the construction\u0000of modern artificial neural networks, often resulting in more robust training\u0000processes and improved generalization capabilities. However, in the domain of\u0000emph{Reinforcement Learning} (RL), the application of these techniques has\u0000been limited, usually applied to value function estimators\u0000citep{hiraoka2021dropout, smith2022walk}, and may result in detrimental\u0000effects. This issue is even more pronounced in offline RL settings, which bear\u0000greater similarity to supervised learning but have received less attention.\u0000Recent work in continuous offline RL has demonstrated that while we can build\u0000sufficiently powerful critic networks, the generalization of actor networks\u0000remains a bottleneck. In this study, we empirically show that applying standard\u0000regularization techniques to actor networks in offline RL actor-critic\u0000algorithms yields improvements of 6% on average across two algorithms and\u0000three different continuous D4RL domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What to align in multimodal contrastive learning?","authors":"Benoit Dufumier, Javiera Castillo-Navarro, Devis Tuia, Jean-Philippe Thiran","doi":"arxiv-2409.07402","DOIUrl":"https://doi.org/arxiv-2409.07402","url":null,"abstract":"Humans perceive the world through multisensory integration, blending the\u0000information of different modalities to adapt their behavior. Contrastive\u0000learning offers an appealing solution for multimodal self-supervised learning.\u0000Indeed, by considering each modality as a different view of the same entity, it\u0000learns to align features of different modalities in a shared representation\u0000space. However, this approach is intrinsically limited as it only learns shared\u0000or redundant information between modalities, while multimodal interactions can\u0000arise in other ways. In this work, we introduce CoMM, a Contrastive MultiModal\u0000learning strategy that enables the communication between modalities in a single\u0000multimodal space. Instead of imposing cross- or intra- modality constraints, we\u0000propose to align multimodal representations by maximizing the mutual\u0000information between augmented versions of these multimodal features. Our\u0000theoretical analysis shows that shared, synergistic and unique terms of\u0000information naturally emerge from this formulation, allowing us to estimate\u0000multimodal interactions beyond redundancy. We test CoMM both in a controlled\u0000and in a series of real-world settings: in the former, we demonstrate that CoMM\u0000effectively captures redundant, unique and synergistic information between\u0000modalities. In the latter, CoMM learns complex multimodal interactions and\u0000achieves state-of-the-art results on the six multimodal benchmarks.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption","authors":"Marcus Rüb, Philipp Tuchel, Axel Sikora, Daniel Mueller-Gritschneder","doi":"arxiv-2409.07114","DOIUrl":"https://doi.org/arxiv-2409.07114","url":null,"abstract":"A new algorithm for incremental learning in the context of Tiny Machine\u0000learning (TinyML) is presented, which is optimized for low-performance and\u0000energy efficient embedded devices. TinyML is an emerging field that deploys\u0000machine learning models on resource-constrained devices such as\u0000microcontrollers, enabling intelligent applications like voice recognition,\u0000anomaly detection, predictive maintenance, and sensor data processing in\u0000environments where traditional machine learning models are not feasible. The\u0000algorithm solve the challenge of catastrophic forgetting through the use of\u0000knowledge distillation to create a small, distilled dataset. The novelty of the\u0000method is that the size of the model can be adjusted dynamically, so that the\u0000complexity of the model can be adapted to the requirements of the task. This\u0000offers a solution for incremental learning in resource-constrained\u0000environments, where both model size and computational efficiency are critical\u0000factors. Results show that the proposed algorithm offers a promising approach\u0000for TinyML incremental learning on embedded devices. The algorithm was tested\u0000on five datasets including: CIFAR10, MNIST, CORE50, HAR, Speech Commands. The\u0000findings indicated that, despite using only 43% of Floating Point Operations\u0000(FLOPs) compared to a larger fixed model, the algorithm experienced a\u0000negligible accuracy loss of just 1%. In addition, the presented method is\u0000memory efficient. While state-of-the-art incremental learning is usually very\u0000memory intensive, the method requires only 1% of the original data set.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"112 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}