Machine Learning Pub Date: 2024-03-26 DOI: 10.1007/s10994-024-06531-0
{"title":"Structure discovery in PAC-learning by random projections","authors":"","doi":"10.1007/s10994-024-06531-0","DOIUrl":"https://doi.org/10.1007/s10994-024-06531-0","url":null,"abstract":"<h3>Abstract</h3> <p>High dimensional learning is data-hungry in general; however, many natural data sources and real-world learning problems possess some hidden low-complexity structure that permits effective learning from relatively small sample sizes. We are interested in the general question of how to discover and exploit such hidden benign traits when problem-specific prior knowledge is insufficient. In this work, we address this question through random projection’s ability to expose structure. We study both compressive learning and high dimensional learning from this angle by introducing the notions of compressive distortion and compressive complexity. We give user-friendly PAC bounds in the agnostic setting that are formulated in terms of these quantities, and we show that our bounds can be tight when these quantities are small. We then instantiate these quantities in several examples of particular learning problems, demonstrating their ability to discover interpretable structural characteristics that make high dimensional instances of these problems solvable to good approximation in a random linear subspace. In the examples considered, these turn out to resemble some familiar benign traits such as the margin, the margin distribution, the intrinsic dimension, the spectral decay of the data covariance, or the norms of parameters, while our general notions of compressive distortion and compressive complexity serve to unify these, and may be used to discover benign structural traits for other PAC-learnable problems.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"45 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140311133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-26 DOI: 10.1007/s10994-023-06499-3
Feliu Serra-Burriel, Pedro Delicado, Fernando M. Cucchietti, Eduardo Graells-Garrido, Alex Gil, Imanol Eguskiza
{"title":"When are they coming? Understanding and forecasting the timeline of arrivals at the FC Barcelona stadium on match days","authors":"Feliu Serra-Burriel, Pedro Delicado, Fernando M. Cucchietti, Eduardo Graells-Garrido, Alex Gil, Imanol Eguskiza","doi":"10.1007/s10994-023-06499-3","DOIUrl":"https://doi.org/10.1007/s10994-023-06499-3","url":null,"abstract":"<p>Futbol Club Barcelona operates the largest stadium in Europe (with a seating capacity of almost one hundred thousand people) and manages recurring sports events. These are influenced by multiple conditions (time and day of the week, weather, adversary) and affect city dynamics, e.g., peak demand for related services like public transport and stores. We study fine-grained audience entrances at the stadium segregated by visitor type and gate to gain insights and predict the arrival behavior at future games, with a direct impact on the organizational performance and productivity of the business. We can forecast the timeline of arrivals at gate level 72 h prior to kickoff, facilitating operational and organizational decision-making by anticipating potential agglomerations and audience behavior. Furthermore, we can identify patterns for different types of visitors and understand how relevant factors affect them. These findings directly impact commercial and business interests and can alter operational logistics, venue management, and safety.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"72 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140310938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-26 DOI: 10.1007/s10994-024-06533-y
Taeyoung Kim, Myungjoo Kang
{"title":"Bounding the Rademacher complexity of Fourier neural operators","authors":"Taeyoung Kim, Myungjoo Kang","doi":"10.1007/s10994-024-06533-y","DOIUrl":"https://doi.org/10.1007/s10994-024-06533-y","url":null,"abstract":"<p>Recently, several types of neural operators have been developed, including deep operator networks, graph neural operators, and multiwavelet-based operators. Compared with these models, the Fourier neural operator (FNO), a physics-inspired machine learning method, is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. This study bounds the Rademacher complexity of the FNO based on specific group norms. Using capacity based on these norms, we bound the generalization error of the model. In addition, we investigate the correlation between the empirical generalization error and the proposed capacity of FNO. We infer that the type of group norm determines the information about the weights and architecture of the FNO model stored in capacity. The experimental results offer insight into the impact of the number of modes used in the FNO model on the generalization error. The results confirm that our capacity is an effective index for estimating generalization errors.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"42 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140316731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-26 DOI: 10.1007/s10994-024-06530-1
Alejandro Kuratomi, Ioanna Miliou, Zed Lee, Tony Lindgren, Panagiotis Papapetrou
{"title":"Ijuice: integer JUstIfied counterfactual explanations","authors":"Alejandro Kuratomi, Ioanna Miliou, Zed Lee, Tony Lindgren, Panagiotis Papapetrou","doi":"10.1007/s10994-024-06530-1","DOIUrl":"https://doi.org/10.1007/s10994-024-06530-1","url":null,"abstract":"<p>Counterfactual explanations modify the feature values of an instance in order to alter its prediction from an undesired to a desired label. As such, they are highly useful for providing trustworthy interpretations of decision-making in domains where complex and opaque machine learning algorithms are utilized. To guarantee their quality and promote user trust, they need to satisfy the <i>faithfulness</i> desideratum, when supported by the data distribution. We hereby propose a counterfactual generation algorithm for mixed-feature spaces that prioritizes faithfulness through <i>k-justification</i>, a novel counterfactual property introduced in this paper. The proposed algorithm employs a graph representation of the search space and provides counterfactuals by solving an integer program. In addition, the algorithm is classifier-agnostic and is not dependent on the order in which the feature space is explored. In our empirical evaluation, we demonstrate that it guarantees k-justification while showing comparable performance to state-of-the-art methods in <i>feasibility</i>, <i>sparsity</i>, and <i>proximity</i>.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"47 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140311335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-22 DOI: 10.1007/s10994-024-06517-y
Nuwan Gunasekara, Bernhard Pfahringer, Heitor Gomes, Albert Bifet
{"title":"Gradient boosted trees for evolving data streams","authors":"Nuwan Gunasekara, Bernhard Pfahringer, Heitor Gomes, Albert Bifet","doi":"10.1007/s10994-024-06517-y","DOIUrl":"https://doi.org/10.1007/s10994-024-06517-y","url":null,"abstract":"<p>Gradient Boosting is a widely used machine learning technique that has proven highly effective in batch learning. However, its effectiveness in stream learning contexts lags behind bagging-based ensemble methods, which currently dominate the field. One reason for this discrepancy is the challenge of adapting the booster to a new concept following a concept drift. Resetting the entire booster can lead to significant performance degradation as it struggles to learn the new concept. Resetting only some parts of the booster can be more effective, but identifying which parts to reset is difficult, given that each boosting step builds on the previous prediction. To overcome these difficulties, we propose Streaming Gradient Boosted Trees (<span>Sgbt</span>), which is trained using weighted squared loss elicited in <span>XGBoost</span>. <span>Sgbt</span> exploits trees with a replacement strategy to detect and recover from drifts, thus enabling the ensemble to adapt without sacrificing the predictive performance. Our empirical evaluation of <span>Sgbt</span> on a range of streaming datasets with challenging drift scenarios demonstrates that it outperforms current state-of-the-art methods for evolving data streams.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"25 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140205735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-22 DOI: 10.1007/s10994-024-06532-z
{"title":"Optimal clustering from noisy binary feedback","authors":"","doi":"10.1007/s10994-024-06532-z","DOIUrl":"https://doi.org/10.1007/s10994-024-06532-z","url":null,"abstract":"<h3>Abstract</h3> <p>We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent reCAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images. In our inference problem, items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a noisy answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the <em>hardness</em> of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon the K-means algorithm and whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare the performance of our algorithms with or without the adaptive selection strategy numerically and illustrate the gain achieved by being adaptive.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"24 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140204495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TOCOL: improving contextual representation of pre-trained language models via token-level contrastive learning","authors":"Keheng Wang, Chuantao Yin, Rumei Li, Sirui Wang, Yunsen Xian, Wenge Rong, Zhang Xiong","doi":"10.1007/s10994-023-06512-9","DOIUrl":"https://doi.org/10.1007/s10994-023-06512-9","url":null,"abstract":"<p>Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens, as they are very close to each other in representation space and thus have higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a <b>TO</b>ken-Level <b>CO</b>ntrastive <b>L</b>earning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach for low-resource scenarios.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"9 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140168796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stress detection with encoding physiological signals and convolutional neural network","authors":"Michela Quadrini, Antonino Capuccio, Denise Falcone, Sebastian Daberdaku, Alessandro Blanda, Luca Bellanova, Gianluca Gerard","doi":"10.1007/s10994-023-06509-4","DOIUrl":"https://doi.org/10.1007/s10994-023-06509-4","url":null,"abstract":"<p>Stress is a significant and growing phenomenon in the modern world that leads to numerous health problems. Developing robust, non-invasive methods for early and accurate stress detection is crucial for enhancing people’s quality of life. Previous research shows that machine learning approaches applied to physiological signals can reliably predict stress, achieving significant results. However, this requires determining features by hand. Such a selection is a challenge in this context since stress determines nonspecific human responses. This work overcomes such limitations by considering STREDWES, an approach for Stress Detection from Wearable Sensors Data. STREDWES encodes signal fragments of physiological signals into images and classifies them by a Convolutional Neural Network (CNN). This study examines several encoding methods, including the Gramian Angular Summation/Difference Field method and Markov Transition Field, to evaluate the best way to encode signals into images in this domain. Such a study is performed on the NEURO dataset. Moreover, we investigate the usefulness of STREDWES in real scenarios by considering the SWELL dataset and a personalized approach. Finally, we compare the proposed approach with its competitors on the WESAD dataset, where it outperforms the others.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"8 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Glacier: guided locally constrained counterfactual explanations for time series classification","authors":"Zhendong Wang, Isak Samsten, Ioanna Miliou, Rami Mochaourab, Panagiotis Papapetrou","doi":"10.1007/s10994-023-06502-x","DOIUrl":"https://doi.org/10.1007/s10994-023-06502-x","url":null,"abstract":"<p>In machine learning applications, there is a need to obtain predictive models of high performance and, most importantly, to allow end-users and practitioners to understand and act on their predictions. One way to obtain such understanding is via counterfactuals, which provide sample-based explanations in the form of recommendations on which features need to be modified from a test example so that the classification outcome of a given classifier changes from an undesired outcome to a desired one. This paper focuses on the domain of time series classification, more specifically, on defining counterfactual explanations for univariate time series. We propose <span>Glacier</span>, a model-agnostic method for generating locally-constrained counterfactual explanations for time series classification using gradient search either on the original space or on a latent space that is learned through an auto-encoder. An additional flexibility of our method is the inclusion of constraints on the counterfactual generation process that favour applying changes to particular time series points or segments while discouraging changing others. The main purpose of these constraints is to ensure more reliable counterfactuals, while increasing the efficiency of the counterfactual generation process. Two particular types of constraints are considered, i.e., example-specific constraints and global constraints. We conduct extensive experiments on 40 datasets from the UCR archive, comparing different instantiations of <span>Glacier</span> against three competitors. Our findings suggest that <span>Glacier</span> outperforms the three competitors in terms of two common metrics for counterfactuals, i.e., proximity and compactness. Moreover, <span>Glacier</span> obtains counterfactual validity comparable to the best of the three competitors. Finally, when comparing the unconstrained variant of <span>Glacier</span> to the constraint-based variants, we conclude that the inclusion of example-specific and global constraints yields a good performance while demonstrating the trade-off between the different metrics.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"24 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140125822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Pub Date: 2024-03-05 DOI: 10.1007/s10994-024-06516-z
{"title":"Neural network relief: a pruning algorithm based on neural activity","authors":"","doi":"10.1007/s10994-024-06516-z","DOIUrl":"https://doi.org/10.1007/s10994-024-06516-z","url":null,"abstract":"<h3>Abstract</h3> <p>Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy introducing a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e. a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered—Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably when compared to the state of the art.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"26 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140044384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}