2021 IEEE International Conference on Data Mining (ICDM)最新文献_第6页

A new multiple instance algorithm using structural information 一种新的基于结构信息的多实例算法

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00204

Xiaoyan Zhu, Ting Wang, Jiayin Wang, Ying Xu, Yuqian Liu

引用次数: 0

A Lookahead Algorithm for Robust Subspace Recovery 一种鲁棒子空间恢复的前瞻算法

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00175

Guihong Wan, H. Schweitzer

引用次数: 1

Group-Level Cognitive Diagnosis: A Multi-Task Learning Perspective 群体水平认知诊断:多任务学习视角

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00031

Jie Huang, Qi Liu, Fei Wang, Zhenya Huang, Songtao Fang, Runze Wu, Enhong Chen, Yu Su, Shijin Wang

{"title":"Group-Level Cognitive Diagnosis: A Multi-Task Learning Perspective","authors":"Jie Huang, Qi Liu, Fei Wang, Zhenya Huang, Songtao Fang, Runze Wu, Enhong Chen, Yu Su, Shijin Wang","doi":"10.1109/ICDM51629.2021.00031","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00031","url":null,"abstract":"Most cognitive diagnosis research in education has been concentrated on individual assessment, aiming at discovering the latent characteristics of students. However, in many real-world scenarios, group-level assessment is an important and meaningful task, e.g., class assessment in different regions can discover the difference of teaching level in different contexts. In this work, we consider assessing cognitive ability for a group of students, which aims to mine groups’ proficiency on specific knowledge concepts. The significant challenge in this task is the sparsity of group-exercise response data, which seriously affects the assessment performance. Existing works either do not make effective use of additional student-exercise response data or fail to reasonably model the relationship between group ability and individual ability in different learning contexts, resulting in sub-optimal diagnosis results. To this end, we propose a general Multi-Task based Group-Level Cognitive Diagnosis (MGCD) framework, which is featured with three special designs: 1) We jointly model student-exercise responses and group-exercise responses in a multi-task manner to alleviate the sparsity of group-exercise responses; 2) We design a context-aware attention network to model the relationship between student knowledge state and group knowledge state in different contexts; 3) We model an interpretable cognitive layer to obtain student ability, group ability and exercise factors (e.g., difficulty), and then we leverage neural networks to learn complex interaction functions among them. Extensive experiments on real-world datasets demonstrate the generality of MGCD and the effectiveness of our attention design and multi-task learning.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114401566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

DAC-ML: Domain Adaptable Continuous Meta-Learning for Urban Dynamics Prediction 面向城市动态预测的领域自适应连续元学习

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00102

Xin Zhang, Yanhua Li, Xun Zhou, Oren Mangoubi, Ziming Zhang, Vincent Filardi, Jun Luo

{"title":"DAC-ML: Domain Adaptable Continuous Meta-Learning for Urban Dynamics Prediction","authors":"Xin Zhang, Yanhua Li, Xun Zhou, Oren Mangoubi, Ziming Zhang, Vincent Filardi, Jun Luo","doi":"10.1109/ICDM51629.2021.00102","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00102","url":null,"abstract":"Given the underlying road network of an urban area, the problem of urban dynamics prediction aims to capture the patterns of urban dynamics and to forecast short-term urban traffic status continuously from the historical observations. This problem is of fundamental importance to urban traffic management, planning, and various business services. However, predicting urban dynamics is challenging due to the highly dynamic (i.e., varying across geographical locations and evolving over time) and uncertain (i.e., affected by unexpected factors) nature of urban traffic systems. Recent works adopt meta-learning approaches to capture irregular and rare patterns but make unrealistic assumptions such as single-domain uncertainties and explicit temporal task segmentation. In this paper, we solve the urban dynamics prediction problem from the Bayesian meta-learning perspective and propose a novel domain adaptable continuous meta-learning approach (DAC-ML) that does not require task segmentation. Trained on a sequence of spatial-temporal urban dynamics data, DAC-ML aims to detect and infer unobserved latent variations (from task and domain levels) and generalize well in a sequential prediction setting, where the underlying data generating process varies over time. Experimental results on three real-world datasets demonstrate that DAC-ML can outperform baselines in urban dynamics prediction, especially when obvious urban dynamics and temporal uncertainties are present.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"532 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116494069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Adversarial Learning of Balanced Triangles for Accurate Community Detection on Signed Networks 基于平衡三角形对抗学习的签名网络社区精确检测

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00137

Yoonsuk Kang, Woncheol Lee, Yeon-Chang Lee, Kyungsik Han, Sang-Wook Kim

{"title":"Adversarial Learning of Balanced Triangles for Accurate Community Detection on Signed Networks","authors":"Yoonsuk Kang, Woncheol Lee, Yeon-Chang Lee, Kyungsik Han, Sang-Wook Kim","doi":"10.1109/ICDM51629.2021.00137","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00137","url":null,"abstract":"In this paper, we propose a framework for embedding-based community detection on signed networks. It first represents all the nodes of a signed network as vectors in low-dimensional embedding space and conducts a clustering algorithm (e.g., k-means) on vectors, thereby detecting a community structure in the network. When performing the embedding process, our framework learns only the edges belonging to balanced triangles whose edge signs follow the balance theory, significantly excluding noise edges in learning. To address the sparsity of balanced triangles in a signed network, our framework learns not only the edges in balanced real-triangles but those in balanced virtual-triangles that are produced by our generator. Finally, our framework employs adversarial learning to generate more-realistic balanced virtual-triangles with less noise edges. Through extensive experiments using seven real-world networks, we validate the effectiveness of (1) learning edges belonging to balanced real/virtual-triangles and (2) employing adversarial learning for signed network embedding. We show that our framework consistently and significantly outperforms the state-of-the-art community detection methods in all datasets.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128500168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Towards Stochastic Neural Network via Feature Distribution Calibration 基于特征分布标定的随机神经网络

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00186

Han Yang, Min Wang, Yun Zhou, Yongxin Yang

{"title":"Towards Stochastic Neural Network via Feature Distribution Calibration","authors":"Han Yang, Min Wang, Yun Zhou, Yongxin Yang","doi":"10.1109/ICDM51629.2021.00186","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00186","url":null,"abstract":"Stochastic neural network (SNN) has attracted increasing attention in recent years, which benefits several important tasks by modeling samples uncertainly, such as adversarial defense, label noise robustness, and model calibration. The current implementations of existing stochastic neural networks are mainly Gaussian noise injection, e.g., deep Variational Information Bottleneck (VIB) uses fixed Gaussian prior to derive noise injection, simple and effective stochastic neural network (SE-SNN) uses a non-informative Gaussian prior to implement it. However, Gaussian distribution assumption is insufficient to model more complex distributions of data in practical, such as the skewed distribution or multi-modal distribution. In this paper, we relax the strict Gaussian prior assumption, and propose a novel distribution calibrated stochastic neural network (DCSNN) which integrates two successive steps. These two steps are as follows: 1) The trained feature vector is preprocessed to make its feature distribution closer to the Gaussian-like distribution. 2) Gaussian distribution’s mean and variance are used to model the sample’s activation indeterminacy. The experimental results show that, compared with the existing methods, our proposed method can achieve state-of-the-art results in a variety of datasets, backbone architectures and multiple applications.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134166263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Compressibility of Distributed Document Representations 分布式文档表示的可压缩性

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00166

Blaž Škrlj, Matej Petković

{"title":"Compressibility of Distributed Document Representations","authors":"Blaž Škrlj, Matej Petković","doi":"10.1109/ICDM51629.2021.00166","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00166","url":null,"abstract":"Contemporary natural language processing (NLP) revolves around learning from latent document representations, generated either implicitly by neural language models or explicitly by methods such as doc2vec or similar. One of the key properties of the obtained representations is their dimension. Whilst the commonly adopted dimensions of 256 and 768 offer sufficient performance on many tasks, it is many times unclear whether the default dimension is the most suitable choice for the subsequent downstream learning tasks. Furthermore, representation dimensions are seldom subject to hyperparameter tunning due to computational constraints. The purpose of this paper is to demonstrate that a surprisingly simple and efficient recursive compression procedure can be sufficient to both significantly compress the initial representation, but also potentially improve its performance when considering the task of text classification. Having smaller and less noisy representations is the desired property during deployment, as orders of magnitude smaller models can significantly reduce the computational overload and with it the deployment costs. We propose CORE, a straightforward, compression-agnostic framework suitable for representation compression. The CORE’S performance is showcased and studied on a collection of 17 real-life corpora from biomedical, news, social media, and literary domains. We explored CORE’S behavior when considering contextual and non-contextual document representations, different compression levels, and 9 different compression algorithms. Current results based on more than 100,000 compression experiments indicate that recursive Singular Value Decomposition offers a very good trade-off between the compression efficiency and performance, making CORE useful in many existing, representation-dependent NLP pipelines.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116135601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Isolation Kernel Density Estimation 隔离核密度估计

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00073

K. Ting, Takashi Washio, Jonathan R. Wells, Hang Zhang

引用次数: 2

Federated Principal Component Analysis for Genome-Wide Association Studies 全基因组关联研究的联合主成分分析

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1109/ICDM51629.2021.00127

Anne Hartebrodt, Reza Nasirigerdeh, David B. Blumenthal, Richard Röttger

{"title":"Federated Principal Component Analysis for Genome-Wide Association Studies","authors":"Anne Hartebrodt, Reza Nasirigerdeh, David B. Blumenthal, Richard Röttger","doi":"10.1109/ICDM51629.2021.00127","DOIUrl":"https://doi.org/10.1109/ICDM51629.2021.00127","url":null,"abstract":"Federated learning (FL) has emerged as a privacy-aware alternative to centralized data analysis, especially for biomedical analyses such as genome-wide association studies (GWAS). The data remains with the owner, which enables studies previously impossible due to privacy protection regulations. Principal component analysis (PCA) is a frequent preprocessing step in GWAS, where the eigenvectors of the sample-by-sample covariance matrix are used as covariates in the statistical tests. Therefore, a federated version of PCA suitable for vertical data partitioning is required for federated GWAS. Existing federated PCA algorithms exchange the complete sample eigenvectors, a potential privacy breach. In this paper, we present a federated PCA algorithm for vertically partitioned data which does not exchange the sample eigenvectors and is hence suitable for federated GWAS. We show that it outperforms existing federated solutions in terms of convergence behavior and scalability. Additionally, we provide a user-friendly privacy-aware web tool to promote acceptance of federated PCA among GWAS researchers.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125880933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Fast computation of distance-generalized cores using sampling 基于采样的距离广义核的快速计算

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI: 10.1007/s10115-023-01830-9

Nikolaj Tatti

引用次数: 0