{"title":"HMM-Boost: Improved Time Series State Prediction Via Supervised Hidden Markov Models: Case Studies in Epileptic Seizure and Complex Care Management","authors":"Georgios Mavroudeas, M. Magdon-Ismail, Xiao Shou, Kristin P. Bennett","doi":"10.1109/ICDMW58026.2022.00050","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00050","url":null,"abstract":"We give a method for time series state prediction with a lazy teacher who only partially labels states, in particular only those states of an extreme nature. Hence, the labeling is not only lazy, but biased. Our method has two stages: (i) Impute new state labels for unlabeled states using a relabeling Hidden Markov Model, and in so doing treat the labeling bias. (ii) Use a supervised framework with the relabeled data. Our method is general, agnostic to the application and the supervised framework being used. We show compelling results in synthetic data and two real applications: epilepsy and complex care management. Our HMM-relabeling approach allows us to tackle time series with extremely sparse labels.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132177771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmenting Graph Convolution with Distance Preserving Embedding for Improved Learning","authors":"Guojing Cong, Seung-Hwan Lim, Steven Young","doi":"10.1109/ICDMW58026.2022.00012","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00012","url":null,"abstract":"Graph convolution incorporates topological information of a graph into learning. Message passing corresponds to traversal of a local neighborhood in classical graph algorithms. We show that incorporating additional global structures, such as shortest paths, through distance preserving embedding can improve performance. Our approach, Gavotte, significantly improves the performance of a range of popular graph neu-ral networks such as GCN, GA T,Graph SAGE, and GCNII for transductive learning. Gavotte also improves the performance of graph neural networks for full-supervised tasks, albeit to a smaller degree. As high-quality embeddings are generated by Gavotte as a by-product, we leverage clustering algorithms on these embed dings to augment the training set and introduce Gavotte+. Our results of Gavotte+ on datasets with very few labels demonstrate the advantage of augmenting graph convolution with distance preserving embedding.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131834463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach","authors":"Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu","doi":"10.1109/ICDMW58026.2022.00084","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00084","url":null,"abstract":"The rapid pace of the development of artificial intelligence (AI) solutions is enabled by leveraging foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, the security vulnerabilities present in foundation repositories might cause irreparable damage due to the AI solutions built using these libraries being deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as the AI repositories that utilize foundation repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we performed clustering as our downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigation of vulnerabilities present in groups of repositories based on foundational AI repositories. 
We also discuss the implications of identifying such clusters of vulnerable repositories.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114357632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SV-Learn: Learning Matrix Singular Values with Neural Networks","authors":"Derek Xu, William Shiao, Jia Chen, E. Papalexakis","doi":"10.1109/ICDMW58026.2022.00039","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00039","url":null,"abstract":"The singular value decomposition (SVD) factors a matrix into three separate matrices: two (semi-)unitary matrices whose columns are left/right singular vectors and one diagonal matrix whose diagonal entries are singular values. Typically, performing SVD on big matrices is taxing due to its compu-tational complexity in the cubic order of its dimensions. With the advances and rapid growth of deep learning techniques in a broad spectrum of applications, a fundamental question arises: can deep neural networks learn the singular values of a matrix? To answer this question, we propose a novel algorithm, namely SV-Iearn, to predict the singular values of a given input matrix by leveraging the advances of neural networks. Numerical results demonstrate that our proposed method outperforms the competing alternatives in terms of achieving lower normalized mean square error on singular value prediction when using real-world datasets. Further, the predicted singular values combined with singular vectors of an input data allow us to reconstruct the input matrices with promising performance.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125052442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"cSmartML-Glassbox: Increasing Transparency and Controllability in Automated Clustering","authors":"Radwa El Shawi, S. Sakr","doi":"10.1109/ICDMW58026.2022.00015","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00015","url":null,"abstract":"Machine learning algorithms have been widely employed in various applications and fields. Novel technologies in automated machine learning (AutoML) ease algorithm selection and hyperparameter optimization complexity. AutoML frame-works have achieved notable success in hyperparameter tuning and surpassed the performance of human experts. However, depending on such frameworks as black-box can leave machine learning practitioners without insights into the inner working of the AutoML process and hence influence their trust in the models produced. In addition, excluding humans from the loop creates several limitations. For example, most of the current AutoML frameworks ignore the user preferences on defining or controlling the search space, which consequently can impact the performance of the models produced and the acceptance of these models by the end-users. The research in the area of transparency and controllability of AutoML has attracted much interest lately, both in academia and industry. However, existing tools are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains a largely unexplored problem. Motivated by these shortcomings, we design and implement cSmartML-GlassBox, an interactive visualization tool that en-ables users to refine the search space of AutoML and analyze the results. cSmartML-GlassBox is equipped with a recommendation engine to recommend a time budget that is likely adequate for a new dataset to obtain well-performing pipeline. 
In addition, the tool supports multi-granularity visualization to enable machine learning practitioners to monitor the AutoML process, analyze the explored configurations and refine/control the search space. Furthermore, cSmartML-GlassBox is equipped with a logging mechanism such that repeated runs on the same dataset can be more effective by avoiding evaluating the same previously considered configurations. We demonstrate the effectiveness and usability of the cSmartML-GlassBox through a user evaluation study with 23 participants and an expert-based usability study based on four experts. We find that the proposed tool increases users' understanding and trust in the AutoML frameworks.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129571501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case Study on Periodic Spatio- Temporal Hotspot Detection in Azure Traffic Data","authors":"Venkata M. V. Gunturi, Rakesh Rajeev, Vipul Bondre, Aaditya Barnwal, Samir Jain, Ashank Anshuman, Manish Gupta","doi":"10.1109/ICDMW58026.2022.00135","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00135","url":null,"abstract":"Given a spatio-temporal event framework E and a collection of time-stamped events A (over E), the goal of the periodic spatio-temporal hotspot detection (PST-Hotspot) problem is to determine spatial regions which show high “intensity” of events at certain periodic intervals. The output of the PST-Hotspot detection problem consists of the following: (a) a col-lection of spatial regions (which show high intensity of events) and, (b) their respective time intervals of high activity and periodicity values (e.g., daily, weekday-only, etc). PST-Hotspot detection poses significant challenge for designing a suitable interest measure. The aim over here is to design a mathematical representation of a PST-Hotspot such that it can differentiate interesting periodic patterns from trivial persistent patterns in the dataset. The current state of the art in the area of spatial and spatio-temporal hotspot detection focus on non-periodic patterns. In contrast, our proposed approach is able to determine periodic hotspots. 
We experimentally evaluated our proposed algorithm using real Azure traffic dataset from the Indian region.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127129500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Low-rank and Orthogonal Deep Multi-view Subspace Clustering based on Local Fusion","authors":"Guixiang Wang, Hongwei Yin, Wenjun Hu, Y. Liu, Ruiqin Wang","doi":"10.1109/ICDMW58026.2022.00017","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00017","url":null,"abstract":"In recent years, a number of multi-view clustering methods have been proposed through a global fusion paradigm. These methods take the entire sample space as the fusion object, where the global complementarity between views is explored and exploited to improve the clustering performance. However, local structures with strong or weak clustering capacity could coexist in each view. The traditional global fusion paradigm ignores the differences in clustering capacity of local structures, which makes it impossible to explore and exploit local complementarity between views. In this paper, a novel deep multi view subspace clustering method based on local fusion is proposed to solve this problem. First, a low rank self-expression layer is inserted into the deep autoencoder to eliminate the influence of noises when obtaining local cluster structure. Then, the fusion object is refined from the entire sample space to the local cluster structure, where a self-weighted strategy is designed to assign contribution weight according to the clustering capacity of the local cluster structure. Meanwhile, we joint orthogonal constraint to enhance the discriminative of local cluster structure that is more suitable for downstream clustering task. 
Experiments on several real-world datasets show that the proposed method achieves better clustering performance than most traditional multi-view clustering methods based on global fusion.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127398369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Learning in Time-series Data using Reinforcement Learning","authors":"Mustafa Shuqair, J. Jimenez-shahed, B. Ghoraani","doi":"10.1109/ICDMW58026.2022.00115","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00115","url":null,"abstract":"System monitoring has become an area of interest with the increasing growth in wearable sensors and continuous monitoring tools. However, the generalizability of the classification models to unseen incoming data remains challenging. This paper proposes a novel architecture based on reinforcement learning (RL) to incre-mentally learn patterns of time-series data and detect changes in the system state. Our rationale is that RL's ability to learn from past experiences can help increase the performance and generalizability of classification models in time-series monitoring applications. Our novel definition of the environment consists of a set of one-class anomaly detectors to define environment states based on the dynamics of the incoming data and a reward function to reward the RL agent according to its actions. A deep RL agent incrementally learns to perform continuous, binary classification predictions according to the environment states and the received reward. We applied the proposed model for detecting response to medication (ON or OFF) in patients with Parkinson's disease (PD). The PD dataset consisted of 170 minutes of time-series movement signals collected from 12 patients using two wearable sensors. Our proposed model, with a testing accuracy of 77.95%, outperformed Adaptive Boosting, Multi-layer Perceptron, and Support Vector Machines with 53.10%, 44.92%, and 52.70% testing accuracy, respectively. The proposed model had a slight decline in the F-score, decreasing from 88.15% validation score to 78.42% in testing, a significantly slight decline compared to the other three models. 
These evidence the potential of the proposed RL-based classifier in time-series monitoring applications as a highly generalizable model for unseen incoming data.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127319013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Image Processing Techniques to Identify and Quantify Spatiotemporal Carbon Cycle Extremes","authors":"Bharat Sharma, J. Kumar, A. Ganguly, F. Hoffman","doi":"10.1109/ICDMW58026.2022.00148","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00148","url":null,"abstract":"Rising atmospheric carbon dioxide due to human activities through fossil fuel emissions and land use changes have increased climate extremes such as heat waves and droughts that have led to and are expected to increase the occurrence of carbon cycle extremes. Carbon cycle extremes represent large anomalies in the carbon cycle that are associated with gains or losses in carbon uptake. Carbon cycle extremes could be continuous in space and time and cross political boundaries. Here, we present a methodology to identify large spatiotemporal extremes (STEs) in the terrestrial carbon cycle using image processing tools for feature detection. We characterized the STE events based on neighborhood structures that are three-dimensional adjacency matrices for the detection of spatiotemporal manifolds of carbon cycle extremes. We found that the area affected and carbon loss during negative carbon cycle extremes were consistent with continuous neighborhood structures. In the gross primary production data we used, 100 carbon cycle STEs accounted for more than 75% of all the negative carbon cycle extremes. 
This paper presents a comparative analysis of the magnitude of carbon cycle STEs and attribution of those STEs to climate drivers as a function of neighborhood structures for two observational datasets and an Earth system model simulation.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126411172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Unknown Labels for Multi-Label Image Classification","authors":"Jun Huang, Yu Yan, Xiao Zheng, Xiwen Qu, Xudong Hong","doi":"10.1109/ICDMW58026.2022.00108","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00108","url":null,"abstract":"A multi-label learning (MLL) method can simul-taneously process the instances with multiple labels, and many well-known methods have been proposed to solve various MLL-related problems. The existing MLL methods are mainly applied under the assumption of a fixed label set, i.e., the class labels are all observed for the training data. However, in many real-world applications, there may be some unknown labels outside of this set, especially for large-scale and complex datasets. In this paper, a multi-label classification model based on deep learning is proposed to discover the unknown labels for multi-label image classification. It can simultaneously predict known and unknown labels for unseen images. Besides, an attention mechanism is introduced into the model, where the attention maps of unknown labels can be used to observe the corresponding objects of an image and to get the semantic information of these unknown labels.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126573765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}