2010 IEEE International Conference on Data Mining: Latest Papers

Category Mining by Heterogeneous Data Fusion Using PdLSI Model in a Retail Service
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.83
Tsukasa Ishigaki, T. Takenaka, Y. Motomura
Abstract: This paper describes a category discovery method, designated "category mining", that simultaneously identifies customer lifestyle categories and item categories for the sustainable management of retail services. Category mining is realized using large-scale ID-POS data and customers' questionnaire responses about their lifestyles. For the heterogeneous data fusion, we propose a probabilistic double-latent semantic indexing (PdLSI) model, an extension of the PLSI model. In the PdLSI model, customers and items are classified probabilistically into latent lifestyle categories and latent item categories. The relation between the latent categories and various purchase situations is then modeled using a Bayesian network. This method provides useful knowledge based on large-scale data for efficient customer relationship management and category management, and is applicable to other service industries.
Citations: 16
Homotopy Regularization for Boosting
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.14
Zheng Wang, Yangqiu Song, Changshui Zhang
Abstract: In this paper, we present a homotopy regularization algorithm for boosting. We introduce a regularization term with adaptive weight into the boosting framework and compose a homotopy objective function. Optimization of this objective approximately composes a solution path for the regularized boosting. Following this path, we can find a suitable solution efficiently using early stopping. Experiments show that this adaptive regularization method gives a more efficient parameter selection strategy than regularized boosting and semi-supervised boosting algorithms, and significantly improves the performance of traditional AdaBoost and related methods.
Citations: 1
Patterns on the Connected Components of Terabyte-Scale Graphs
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.121
U. Kang, Mary McGlohon, L. Akoglu, C. Faloutsos
Abstract: How do connected components evolve? What are the regularities that govern the dynamic growth process and the static snapshot of the connected components? In this work, we study patterns in connected components of large, real-world graphs. First, we study one of the largest static Web graphs with billions of nodes and edges and analyze the regularities among the connected components using GFD (Graph Fractal Dimension) as our main tool. Second, we study several time evolving graphs and find dynamic patterns and rules that govern the dynamics of connected components. We analyze the growth rates of top connected components and study their relation over time. We also study the probability that a newcomer absorbs to disconnected components as a function of the current portion of the disconnected components and the degree of the newcomer. Finally, we propose a generative model that explains both the dynamic growth process and the static regularities of connected components.
Citations: 41
Adaptive Distances on Sets of Vectors
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.45
Adam Woznica, Alexandros Kalousis
Abstract: Recently, there has been growing interest in learning distances directly from training data. While previous work focused mainly on adapting distance measures over vectorial data, it is well known that much real-world data cannot easily be represented as fixed-length tuples of constants. In this paper we address this limitation and propose a novel class of distance learning techniques for learning problems in which instances are sets of vectors; examples of such problems include, among others, automatic image annotation and graph classification. We investigate the behavior of the adaptive set distances on a number of artificial and real-world problems and demonstrate that they improve over the standard set distances.
Citations: 5
Mining Public Transport Usage for Personalised Intelligent Transport Systems
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.46
N. Lathia, Jon E. Froehlich, L. Capra
Abstract: Traveller information, route planning, and service updates have become essential components of public transport systems: they help people navigate built environments by providing access to information regarding delays and service disruptions. However, one aspect that these systems lack is a way of tailoring the information they offer in order to provide personalised trip time estimates and relevant notifications to each traveller. Mining each user’s travel history, collected by automated ticketing systems, has the potential to address this gap. In this work, we analyse one such dataset of travel history on the London Underground. We then propose and evaluate methods to (a) predict personalised trip times for the system users and (b) rank stations based on future mobility patterns, in order to identify the subset of stations that are of greatest interest to the user and thus provide useful travel updates.
Citations: 58
Anonymizing Temporal Data
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.96
Ke Wang, Yabo Xu, R. C. Wong, A. Fu
Abstract: Temporal data are time-critical in that the snapshot at each timestamp must be made available to researchers in a timely fashion. However, due to the limited data, each snapshot likely has a skewed distribution on sensitive values, which renders classical anonymization methods inapplicable. In this work, we propose the “reposition model” to allow a record to be published within close proximity of its original timestamp. We show that reposition over a small proximity of timestamp is sufficient for reducing the skewness of a snapshot, thereby minimizing the impact on window queries. We formalize the optimal reposition problem and present a linear-time solution. The contribution of this work is that it enables the use of classical methods on temporal data.
Citations: 17
Spatiotemporal Event Detection in Mobility Network
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.29
S. T. Au, Rong Duan, Heeyoung Kim, Guangqin Ma
Abstract: Learning and identifying events in network traffic is crucial for service providers to improve their mobility network performance. Large special events attract cell phone users to relatively small areas, which causes sudden surges in network traffic. To handle such increased load, it is necessary to measure the increased network traffic and quantify the impact of the events, so that relevant resources can be optimized to enhance the network capability. However, this problem is challenging due to several issues: (1) multiple periodic temporal traffic patterns (i.e., a nonhomogeneous process) even for normal traffic, (2) irregularly distributed spatial neighbor information, (3) different temporal patterns driven by different events even for spatial neighborhoods, and (4) large-scale data sets. This paper proposes a systematic event detection method that deals with the above problems. Using the additivity property of the Poisson process, we propose an algorithm to integrate spatial information by aggregating the behavior of temporal data under various areas. A Markov Modulated Nonhomogeneous Poisson Process (MMNHPP) is employed to estimate the probability that an event happens, determine when and where the events take place, and assess the spatial and temporal impacts of the events. Localized events are then ranked globally to prioritize the more significant events. Synthetic data are generated to illustrate our procedure and validate the performance. An industrial example from a telecommunication company is also presented to show the effectiveness of the proposed method.
Citations: 6
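The additivity property that the abstract above relies on is a standard fact about Poisson processes, and it can be stated compactly. The notation below (areas indexed by i, rate functions λ_i, counts N_i) is assumed here for illustration and is not taken from the paper; the statement also assumes the per-area counts are independent.

```latex
N_i(t) \sim \mathrm{Poisson}\bigl(\Lambda_i(t)\bigr), \qquad
\Lambda_i(t) = \int_0^t \lambda_i(s)\,ds, \qquad i = 1,\dots,m \ \text{(independent)}
\;\Longrightarrow\;
\sum_{i=1}^{m} N_i(t) \sim \mathrm{Poisson}\Bigl(\sum_{i=1}^{m} \Lambda_i(t)\Bigr)
```

In other words, summing traffic counts over neighbouring areas gives another nonhomogeneous Poisson count whose rate is the sum of the individual rates, which is presumably why spatial aggregation keeps the model within the same family.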
Discovering Overlapping Groups in Social Media
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.48
Xufei Wang, Lei Tang, Huiji Gao, Huan Liu
Abstract: The increasing popularity of social media is shortening the distance between people. Social activities, e.g., tagging in Flickr, bookmarking in Delicious, twittering in Twitter, etc., are reshaping people’s social life and redefining their social roles. People with shared interests tend to form their groups in social media, and users within the same community likely exhibit similar social behavior (e.g., going for the same movies, having similar political viewpoints), which in turn reinforces the community structure. The multiple interactions in social activities entail that the community structures are often overlapping, i.e., one person is involved in several communities. We propose a novel co-clustering framework, which takes advantage of networking information between users and tags in social media, to discover these overlapping communities. In our method, users are connected via tags and tags are connected to users. This explicit representation of users and tags is useful for understanding group evolution by looking at who is interested in what. The efficacy of our method is supported by empirical evaluation on both synthetic and online social networking data.
Citations: 162
Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.77
Zhen Hu, R. Bhatnagar
Abstract: The concept of triclusters has been investigated recently in the context of two relational datasets that share labels along one of the dimensions. By simultaneously processing two datasets to unveil triclusters, new useful knowledge and insights can be obtained. However, some recently reported methods are either closely linked to specific problems or constrain datasets to have some specific distributions. Algorithms for generating triclusters whose cell-values demonstrate simple, well-known statistical properties, such as upper bounds on standard deviations, are needed for many applications. In this paper we present a 3-Clustering algorithm that searches for meaningful combinations of biclusters in two related datasets. The algorithm can handle situations in which: (i) a few data objects are present in only one dataset and not in both, (ii) the two datasets have different numbers of objects and/or attributes, and (iii) the cell-value distributions in the two datasets are different. In our formulation the cell-values of each selected tricluster, formed by two independent biclusters, are such that the standard deviation in each bicluster obeys an upper bound and the sets of objects in the two biclusters overlap to the maximum possible extent. We validate our algorithm by presenting the properties of the 3-Clusters discovered from a synthetic dataset and from a real-world cross-species genomic dataset. The results of our algorithm unveil interesting insights for the cross-species genomic domain.
Citations: 25
A Pairwise-Systematic Microaggregation for Statistical Disclosure Control
2010 IEEE International Conference on Data Mining Pub Date : 2010-12-13 DOI: 10.1109/ICDM.2010.111
M. E. Kabir, Hua Wang, Yanchun Zhang
Abstract: Microdata protection in statistical databases has recently become a major societal concern and has been intensively studied in recent years. Statistical Disclosure Control (SDC) is often applied to statistical databases before they are released for public use. Microaggregation for SDC is a family of methods to protect microdata from individual identification. SDC seeks to protect microdata in such a way that they can be published and mined without providing any private information that can be linked to specific individuals. Microaggregation works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the centroid of the group. An optimal microaggregation method must minimize the information loss resulting from this replacement process. The challenge is how to minimize the information loss during the microaggregation process. This paper presents a pairwise-systematic (P-S) microaggregation method to minimize the information loss. The proposed technique forms two distant groups at a time, placing similar records together in a systematic way, and then anonymizes each group individually with its centroid. The structure of the P-S problem is defined and investigated, and an algorithm for the proposed problem is developed. The performance of the P-S algorithm is compared against the most recent microaggregation methods. Experimental results show that the P-S algorithm incurs less than half the information loss of the latest microaggregation methods in all of the test situations.
Citations: 8
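The abstract above describes the general microaggregation step: partition the records into groups of at least k, then replace each record with its group centroid. A minimal Python sketch of that generic step follows. It is not the paper's P-S algorithm; it uses a simple sort-and-chunk grouping on one-dimensional data, and the function names `microaggregate` and `information_loss` are hypothetical. Measuring information loss as SSE/SST (within-group over total sum of squares) is a common convention assumed here, since the abstract does not define the measure.

```python
from statistics import mean


def microaggregate(values, k):
    """Univariate sort-and-chunk microaggregation (illustrative, not P-S).

    Returns the masked values (each replaced by its group mean) and the
    groups as lists of indices into `values`.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Consecutive runs of k records form the groups.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A short final run is merged into the previous group so that
    # every group keeps at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    masked = [0.0] * len(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
    return masked, groups


def information_loss(original, masked):
    """SSE / SST: 0 means no loss, 1 means all variation was removed."""
    overall = mean(original)
    sse = sum((x - y) ** 2 for x, y in zip(original, masked))
    sst = sum((x - overall) ** 2 for x in original)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 13]
    masked, groups = microaggregate(data, k=3)
    print(masked)
    print(information_loss(data, masked))
```

Because every published value is shared by at least k records, no single record can be singled out by that value; better grouping strategies aim to achieve this with a lower SSE/SST than naive chunking.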