{"title":"Recurrent Aggregators in Neural Algorithmic Reasoning","authors":"Kaijia Xu, Petar Veličković","doi":"arxiv-2409.07154","DOIUrl":"https://doi.org/arxiv-2409.07154","url":null,"abstract":"Neural algorithmic reasoning (NAR) is an emerging field that seeks to design\u0000neural networks that mimic classical algorithmic computations. Today, graph\u0000neural networks (GNNs) are widely used in neural algorithmic reasoners due to\u0000their message passing framework and permutation equivariance. In this extended\u0000abstract, we challenge this design choice, and replace the equivariant\u0000aggregation function with a recurrent neural network. While seemingly\u0000counter-intuitive, this approach has appropriate grounding when nodes have a\u0000natural ordering -- and this is the case frequently in established reasoning\u0000benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very\u0000strongly on such tasks, while handling many others gracefully. A notable\u0000achievement of RNAR is its decisive state-of-the-art result on the Heapsort and\u0000Quickselect tasks, both deemed as a significant challenge for contemporary\u0000neural algorithmic reasoners -- especially the latter, where RNAR achieves a\u0000mean micro-F1 score of 87%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Methods for Sequence Classification with Hidden Markov Models","authors":"Maxime Kawawa-Beaudan, Srijan Sood, Soham Palande, Ganapathy Mani, Tucker Balch, Manuela Veloso","doi":"arxiv-2409.07619","DOIUrl":"https://doi.org/arxiv-2409.07619","url":null,"abstract":"We present a lightweight approach to sequence classification using Ensemble\u0000Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in\u0000scenarios with imbalanced or smaller datasets due to their simplicity,\u0000interpretability, and efficiency. These models are particularly effective in\u0000domains such as finance and biology, where traditional methods struggle with\u0000high feature dimensionality and varied sequence lengths. Our ensemble-based\u0000scoring method enables the comparison of sequences of any length and improves\u0000performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in\u0000scenarios with data imbalance, where the negative class is the majority (e.g.,\u0000normal data) and the positive class is the minority (e.g., anomalous data),\u0000often with extreme distribution skews. We propose a novel training approach for\u0000HMM Ensembles that generalizes to multi-class problems and supports\u0000classification and anomaly detection. Our method fits class-specific groups of\u0000diverse models using random data subsets, and compares likelihoods across\u0000classes to produce composite scores, achieving high average precisions and\u0000AUCs. In addition, we compare our approach with neural network-based methods such\u0000as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks\u0000(LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce\u0000environments. Motivated by real-world use cases, our method demonstrates robust\u0000performance across various benchmarks, offering a flexible framework for\u0000diverse applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation","authors":"Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder","doi":"arxiv-2409.07109","DOIUrl":"https://doi.org/arxiv-2409.07109","url":null,"abstract":"This study introduces TinyPropv2, an innovative algorithm optimized for\u0000on-device learning in deep neural networks, specifically designed for low-power\u0000microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically\u0000adjusting the level of sparsity, including the ability to selectively skip\u0000training steps. This feature significantly lowers computational effort without\u0000substantially compromising accuracy. Our comprehensive evaluation across\u0000diverse datasets CIFAR 10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR,\u0000and DCASE2020 reveals that TinyPropv2 achieves near-parity with full training\u0000methods, with an average accuracy drop of only around 1 percent in most cases.\u0000For instance, against full training, TinyPropv2's accuracy drop is minimal, for\u0000example, only 0.82 percent on CIFAR 10 and 1.07 percent on CIFAR100. In terms\u0000of computational effort, TinyPropv2 shows a marked reduction, requiring as\u0000little as 10 percent of the computational effort needed for full training in\u0000some scenarios, and consistently outperforms other sparse training\u0000methodologies. These findings underscore TinyPropv2's capacity to efficiently\u0000manage computational resources while maintaining high accuracy, positioning it\u0000as an advantageous solution for advanced embedded device applications in the\u0000IoT ecosystem.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges","authors":"Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart","doi":"arxiv-2409.07569","DOIUrl":"https://doi.org/arxiv-2409.07569","url":null,"abstract":"Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring\u0000the implicit constraints followed by expert agents from their demonstration\u0000data. As an emerging research topic, ICRL has received considerable attention\u0000in recent years. This article presents a categorical survey of the latest\u0000advances in ICRL. It serves as a comprehensive reference for machine learning\u0000researchers and practitioners, as well as starters seeking to comprehend the\u0000definitions, advancements, and important challenges in ICRL. We begin by\u0000formally defining the problem and outlining the algorithmic framework that\u0000facilitates constraint inference across various scenarios. These include\u0000deterministic or stochastic environments, environments with limited\u0000demonstrations, and multiple agents. For each context, we illustrate the\u0000critical challenges and introduce a series of fundamental methods to tackle\u0000these issues. This survey encompasses discrete, virtual, and realistic\u0000environments for evaluating ICRL agents. We also delve into the most pertinent\u0000applications of ICRL, such as autonomous driving, robot control, and sports\u0000analytics. To stimulate continuing research, we conclude the survey with a\u0000discussion of key unresolved questions in ICRL that can effectively foster a\u0000bridge between theoretical understanding and practical industrial applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning","authors":"Daniel Weitekamp, Kenneth Koedinger","doi":"arxiv-2409.07653","DOIUrl":"https://doi.org/arxiv-2409.07653","url":null,"abstract":"STAND is a data-efficient and computationally efficient machine learning\u0000approach that produces better classification accuracy than popular approaches\u0000like XGBoost on small-data tabular classification problems like learning rule\u0000preconditions from interactive training. STAND accounts for a complete set of\u0000good candidate generalizations instead of selecting a single generalization by\u0000breaking ties randomly. STAND can use any greedy concept construction strategy,\u0000like decision tree learning or sequential covering, and build a structure that\u0000approximates a version space over disjunctive normal logical statements. Unlike\u0000candidate elimination approaches to version-space learning, STAND does not\u0000suffer from issues of version-space collapse from noisy data nor is it\u0000restricted to learning strictly conjunctive concepts. More importantly, STAND\u0000can produce a measure called instance certainty that can predict increases in\u0000holdout set performance and has high utility as an active-learning heuristic.\u0000Instance certainty enables STAND to be self-aware of its own learning: it knows\u0000when it learns and what example will help it learn the most. We illustrate that\u0000instance certainty has desirable properties that can help users select next\u0000training problems, and estimate when training is complete in applications where\u0000users interactively teach an AI a complex program.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Algorithmic Reasoning with Multiple Correct Solutions","authors":"Zeno Kujawa, John Poole, Dobrik Georgiev, Danilo Numeroso, Pietro Liò","doi":"arxiv-2409.06953","DOIUrl":"https://doi.org/arxiv-2409.06953","url":null,"abstract":"Neural Algorithmic Reasoning (NAR) aims to optimize classical algorithms.\u0000However, canonical implementations of NAR train neural networks to return only\u0000a single solution, even when there are multiple correct solutions to a problem,\u0000such as single-source shortest paths. For some applications, it is desirable to\u0000recover more than one correct solution. To that end, we give the first method\u0000for NAR with multiple solutions. We demonstrate our method on two classical\u0000algorithms: Bellman-Ford (BF) and Depth-First Search (DFS), favouring deeper\u0000insight into two algorithms over a broader survey of algorithms. This method\u0000involves generating appropriate training data as well as sampling and\u0000validating solutions from model output. Each step of our method, which can\u0000serve as a framework for neural algorithmic reasoning beyond the tasks\u0000presented in this paper, might be of independent interest to the field and our\u0000results represent the first attempt at this task in the NAR literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is the Right Notion of Distance between Predict-then-Optimize Tasks?","authors":"Paula Rodriguez-Diaz, Lingkai Kong, Kai Wang, David Alvarez-Melis, Milind Tambe","doi":"arxiv-2409.06997","DOIUrl":"https://doi.org/arxiv-2409.06997","url":null,"abstract":"Comparing datasets is a fundamental task in machine learning, essential for\u0000various learning paradigms; from evaluating train and test datasets for model\u0000generalization to using dataset similarity for detecting data drift. While\u0000traditional notions of dataset distances offer principled measures of\u0000similarity, their utility has largely been assessed through prediction error\u0000minimization. However, in Predict-then-Optimize (PtO) frameworks, where\u0000predictions serve as inputs for downstream optimization tasks, model\u0000performance is measured through decision regret minimization rather than\u0000prediction error minimization. In this work, we (i) show that traditional\u0000dataset distances, which rely solely on feature and label dimensions, lack\u0000informativeness in the PtO context, and (ii) propose a new dataset distance\u0000that incorporates the impacts of downstream decisions. Our results show that\u0000this decision-aware dataset distance effectively captures adaptation success in\u0000PtO contexts, providing a PtO adaptation bound in terms of dataset distance.\u0000Empirically, we show that our proposed distance measure accurately predicts\u0000transferability across three different PtO tasks from the literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Unified Contrastive Loss for Self-Training","authors":"Aurelien Gauffre, Julien Horvat, Massih-Reza Amini","doi":"arxiv-2409.07292","DOIUrl":"https://doi.org/arxiv-2409.07292","url":null,"abstract":"Self-training methods have proven to be effective in exploiting abundant\u0000unlabeled data in semi-supervised learning, particularly when labeled data is\u0000scarce. While many of these approaches rely on a cross-entropy loss function\u0000(CE), recent advances have shown that the supervised contrastive loss function\u0000(SupCon) can be more effective. Additionally, unsupervised contrastive learning\u0000approaches have also been shown to capture high quality data representations in\u0000the unsupervised setting. To benefit from these advantages in a semi-supervised\u0000setting, we propose a general framework to enhance self-training methods, which\u0000replaces all instances of CE losses with a unique contrastive loss. By using\u0000class prototypes, which are a set of class-wise trainable parameters, we\u0000recover the probability distributions of the CE setting and show a theoretical\u0000equivalence with it. Our framework, when applied to popular self-training\u0000methods, results in significant performance improvements across three different\u0000datasets with a limited number of labeled data. Additionally, we demonstrate\u0000further improvements in convergence speed, transfer ability, and hyperparameter\u0000stability. The code is available at\u0000url{https://github.com/AurelienGauffre/semisupcon/}.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Invasive Glucose Prediction System Enhanced by Mixed Linear Models and Meta-Forests for Domain Generalization","authors":"Yuyang Sun, Panagiotis Kosmas","doi":"arxiv-2409.07308","DOIUrl":"https://doi.org/arxiv-2409.07308","url":null,"abstract":"In this study, we present a non-invasive glucose prediction system that\u0000integrates Near-Infrared (NIR) spectroscopy and millimeter-wave (mm-wave)\u0000sensing. We employ a Mixed Linear Model (MixedLM) to analyze the association\u0000between mm-wave frequency S_21 parameters and blood glucose levels within a\u0000heterogeneous dataset. The MixedLM method considers inter-subject variability\u0000and integrates multiple predictors, offering a more comprehensive analysis than\u0000traditional correlation analysis. Additionally, we incorporate a Domain\u0000Generalization (DG) model, Meta-forests, to effectively handle domain variance\u0000in the dataset, enhancing the model's adaptability to individual differences.\u0000Our results demonstrate promising accuracy in glucose prediction for unseen\u0000subjects, with a mean absolute error (MAE) of 17.47 mg/dL, a root mean square\u0000error (RMSE) of 31.83 mg/dL, and a mean absolute percentage error (MAPE) of\u000010.88%, highlighting its potential for clinical application. This study marks a\u0000significant step towards developing accurate, personalized, and non-invasive\u0000glucose monitoring systems, contributing to improved diabetes management.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrialSynth: Generation of Synthetic Sequential Clinical Trial Data","authors":"Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun","doi":"arxiv-2409.07089","DOIUrl":"https://doi.org/arxiv-2409.07089","url":null,"abstract":"Analyzing data from past clinical trials is part of the ongoing effort to\u0000optimize the design, implementation, and execution of new clinical trials and\u0000more efficiently bring life-saving interventions to market. While there have\u0000been recent advances in the generation of static context synthetic clinical\u0000trial data, due to both limited patient availability and constraints imposed by\u0000patient privacy needs, the generation of fine-grained synthetic time-sequential\u0000clinical trial data has been challenging. Given that patient trajectories over\u0000an entire clinical trial are of high importance for optimizing trial design and\u0000efforts to prevent harmful adverse events, there is a significant need for the\u0000generation of high-fidelity time-sequence clinical trial data. Here we\u0000introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the\u0000specific challenges of generating synthetic time-sequence clinical trial data.\u0000Distinct from related clinical data VAE methods, the core of our method\u0000leverages Hawkes Processes (HP), which are particularly well-suited for\u0000modeling event-type and time gap prediction needed to capture the structure of\u0000sequential clinical trial data. Our experiments demonstrate that TrialSynth\u0000surpasses the performance of other comparable methods that can generate\u0000sequential clinical trial data, in terms of both fidelity and in enabling the\u0000generation of highly accurate event sequences across multiple real-world\u0000sequential event datasets with small patient source populations when using\u0000minimal external information. Notably, our empirical findings highlight that\u0000TrialSynth not only outperforms existing clinical sequence-generating methods\u0000but also produces data with superior utility while empirically preserving\u0000patient privacy.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}