Seventh IEEE International Conference on Data Mining (ICDM 2007)最新文献

筛选
英文 中文
Exploration of Link Structure and Community-Based Node Roles in Network Analysis 网络分析中链路结构与社区节点角色的探索
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.37
J. Scripps, P. Tan, A. Esfahanian
{"title":"Exploration of Link Structure and Community-Based Node Roles in Network Analysis","authors":"J. Scripps, P. Tan, A. Esfahanian","doi":"10.1109/ICDM.2007.37","DOIUrl":"https://doi.org/10.1109/ICDM.2007.37","url":null,"abstract":"Communities are nodes in a network that are grouped together based on a common set of properties. While the communities and link structures are often thought to be in alignment, it may not be the case when the communities are defined using other external criterion. In this paper we provide a new way to measure the alignment. We also provide a new metric that can be used to estimate the number of communities to which a node is attached. This metric, along with degree, is used to assign a community-based role to nodes. We demonstrate the usefulness of the community-based node roles by applying them to the influence maximization problem.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130980730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
Local Word Bag Model for Text Categorization 文本分类的局部词袋模型
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.69
Wen Pu, Ning Liu, Shuicheng Yan, Jun Yan, Kunqing Xie, Zheng Chen
{"title":"Local Word Bag Model for Text Categorization","authors":"Wen Pu, Ning Liu, Shuicheng Yan, Jun Yan, Kunqing Xie, Zheng Chen","doi":"10.1109/ICDM.2007.69","DOIUrl":"https://doi.org/10.1109/ICDM.2007.69","url":null,"abstract":"Many text processing applications adopted the bag of words (BOW) model representation of documents, in which each document is represented as a vector of weighted terms or n-grams, and then the cosine distance between two vectors is used as the similarity measurement. Although the great success in information retrieval and text categorization, the conventional BOW model ignores the detailed local text information, i.e. the co-occurrence pattern of words at sentence or paragraph level. In this paper, we propose a novel approach to represent a document as a set of local tf-idf vectors, or what we called local word bags (LWB). By encapsulating local information distributed around a document into multiple LWBs, we can measure the similarity of two documents via the partial match of their corresponding local bags. To perform the matching efficiently, we introduce the local word bag kernel (LWB kernel), a variant of VG-Pyramid match kernel. The new kernel enables the discriminative machine learning methods like SVM to compute the partial matching between two sets of LWBs in linear time after an one time hierarchical clustering procedure over all local bags at the initialization stage. Experiments on real world datasets demonstrate the effectiveness of our new approach.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123141427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
Structure-Based Statistical Features and Multivariate Time Series Clustering 基于结构的统计特征与多元时间序列聚类
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.103
Xiaozhe Wang, Anthony Wirth, Liang Wang
{"title":"Structure-Based Statistical Features and Multivariate Time Series Clustering","authors":"Xiaozhe Wang, Anthony Wirth, Liang Wang","doi":"10.1109/ICDM.2007.103","DOIUrl":"https://doi.org/10.1109/ICDM.2007.103","url":null,"abstract":"We propose a new method for clustering multivariate time series. A univariate time series can be represented by a fixed-length vector whose components are statistical features of the time series, capturing the global structure. These descriptive vectors, one for each component of the multivariate time series, are concatenated, before being clustered using a standard fast clustering algorithm such as k-means or hierarchical clustering. Such statistical feature extraction also serves as a dimension-reduction procedure for multivariate time series. We demonstrate the effectiveness and simplicity of our proposed method by clustering human motion sequences: dynamic and high-dimensional multivariate time series. The proposed method based on univariate time series structure and statistical metrics provides a novel, yet simple and flexible way to cluster multivariate time series data efficiently with promising accuracy. The success of our method on the case study suggests that clustering may be a valuable addition to the tools available for human motion pattern recognition research.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122850343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 76
Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks 趋势母题:动态复杂网络分析的图挖掘方法
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.92
R. Jin, Scott McCallen, E. Almaas
{"title":"Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks","authors":"R. Jin, Scott McCallen, E. Almaas","doi":"10.1109/ICDM.2007.92","DOIUrl":"https://doi.org/10.1109/ICDM.2007.92","url":null,"abstract":"Complex networks have been used successfully in scientific disciplines ranging from sociology to microbiology to describe systems of interacting units. Until recently, studies of complex networks have mainly focused on their network topology. However, in many real world applications, the edges and vertices have associated attributes that are frequently represented as vertex or edge weights. Furthermore, these weights are often not static, instead changing with time and forming a time series. Hence, to fully understand the dynamics of the complex network, we have to consider both network topology and related time series data. In this work, we propose a motif mining approach to identify trend motifs for such purposes. Simply stated, a trend motif describes a recurring subgraph where each of its vertices or edges displays similar dynamics over a user- defined period. Given this, each trend motif occurrence can help reveal significant events in a complex system; frequent trend motifs may aid in uncovering dynamic rules of change for the system, and the distribution of trend motifs may characterize the global dynamics of the system. Here, we have developed efficient mining algorithms to extract trend motifs. Our experimental validation using three disparate empirical datasets, ranging from the stock market, world trade, to a protein interaction network, has demonstrated the efficiency and effectiveness of our approach.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125871494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Active Learning from Data Streams 从数据流主动学习
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.101
Xingquan Zhu, Peng Zhang, Xiaodong Lin, Yong Shi
{"title":"Active Learning from Data Streams","authors":"Xingquan Zhu, Peng Zhang, Xiaodong Lin, Yong Shi","doi":"10.1109/ICDM.2007.101","DOIUrl":"https://doi.org/10.1109/ICDM.2007.101","url":null,"abstract":"In this paper, we address a new research problem on active learning from data streams where data volumes grow continuously and labeling all data is considered expensive and impractical. The objective is to label a small portion of stream data from which a model is derived to predict newly arrived instances as accurate as possible. In order to tackle the challenges raised by data streams' dynamic nature, we propose a classifier ensembling based active learning framework which selectively labels instances from data streams to build an accurate classifier. A minimal variance principle is introduced to guide instance labeling from data streams. In addition, a weight updating rule is derived to ensure that our instance labeling process can adaptively adjust to dynamic drifting concepts in the data. Experimental results on synthetic and real-world data demonstrate the performances of the proposed efforts in comparison with other simple approaches.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132647474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 116
Efficient Algorithms for Mining Significant Substructures in Graphs with Quality Guarantees 具有质量保证的图中重要子结构的高效挖掘算法
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.11
Huahai He, Ambuj K. Singh
{"title":"Efficient Algorithms for Mining Significant Substructures in Graphs with Quality Guarantees","authors":"Huahai He, Ambuj K. Singh","doi":"10.1109/ICDM.2007.11","DOIUrl":"https://doi.org/10.1109/ICDM.2007.11","url":null,"abstract":"Graphs have become popular for modeling scientific data in recent years. As a result, techniques for mining graphs are extremely important for understanding inherent data and domain characteristics. One such exploratory mining paradigm is the k-MST (minimum spanning tree over k vertices) problem that can be used to discover significant local substructures. In this paper, we present an efficient approximation algorithm for the k-MST problem in large graphs. The algorithm has an O(radic/k) approximation ratio and O(n log n + in log m log k + nk2 log k) running time, where n and m are the number of vertices and edges respectively. Experimental results on synthetic graphs and protein interaction networks show that the algorithm is scalable to large graphs and useful for discovering biological pathways. The highlight of the algorithm is that it offers both analytical guarantees and empirical evidence of good running time and quality.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130627255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
Transitional Patterns and Their Significant Milestones 过渡模式及其重要里程碑
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.87
Qian Wan, Aijun An
{"title":"Transitional Patterns and Their Significant Milestones","authors":"Qian Wan, Aijun An","doi":"10.1109/ICDM.2007.87","DOIUrl":"https://doi.org/10.1109/ICDM.2007.87","url":null,"abstract":"Mining frequent patterns in transaction databases has been studied extensively in data mining research. However, most of the existing frequent pattern mining algorithms do not consider the time stamps associated with the transactions. In this paper, we extend the existing frequent pattern mining framework to take into account the time stamp of each transaction and discover patterns whose frequency dramatically changes over time. We define a new type of patterns, called transitional patterns, to capture the dynamic behavior of frequent patterns in a transaction database. Transitional patterns include both positive and negative transitional patterns. Their frequencies increase/decrease dramatically at some time points of a transaction database. We introduce the concept of significant milestones for a transitional pattern, which are time points at which the frequency of the pattern changes most significantly. Moreover, we develop an algorithm to mine from a transaction database the set of transitional patterns along with their significant milestones. Our experimental studies on real-world databases illustrate that mining positive and negative transitional patterns is highly promising as a practical and useful approach to discovering novel and interesting knowledge from large databases.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132495890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
Finding Predictive Runs with LAPS 利用LAPS找到预测运行
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.84
Suhrid Balakrishnan, D. Madigan
{"title":"Finding Predictive Runs with LAPS","authors":"Suhrid Balakrishnan, D. Madigan","doi":"10.1109/ICDM.2007.84","DOIUrl":"https://doi.org/10.1109/ICDM.2007.84","url":null,"abstract":"We present an extension to the Lasso [6] for binary classification problems with ordered attributes. Inspired by the Fused Lasso [5] and the Group Lasso [7, 3] models, we aim to both discover and model runs (contiguous subgroups of the variables) that are highly predictive. We call the extended model LAPS (the Lasso with Attribute Partition Search). Such problems commonly arise in financial and medical domains, where predictors are time series variables, for example. This paper outlines the formulation of the problem, an algorithm to obtain the model coefficients and experiments showing applicability to practical problems of this type.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"265 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132814461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Mechanism Design for Clustering Aggregation by Selfish Systems 自私系统聚类聚集的机制设计
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.80
Pinata Winoto, Yiu-ming Cheung, Jiming Liu
{"title":"Mechanism Design for Clustering Aggregation by Selfish Systems","authors":"Pinata Winoto, Yiu-ming Cheung, Jiming Liu","doi":"10.1109/ICDM.2007.80","DOIUrl":"https://doi.org/10.1109/ICDM.2007.80","url":null,"abstract":"We propose a market mechanism that can be implemented on clustering aggregation problem among selfish systems, which tend to lie about their correct clustering during aggregation process. Our study is the preliminary step toward the development of robust distributed data mining among selfish systems.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"20 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128449849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Analyzing and Detecting Review Spam 分析和检测评论垃圾邮件
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.68
Nitin Jindal, B. Liu
{"title":"Analyzing and Detecting Review Spam","authors":"Nitin Jindal, B. Liu","doi":"10.1109/ICDM.2007.68","DOIUrl":"https://doi.org/10.1109/ICDM.2007.68","url":null,"abstract":"Mining of opinions from product reviews, forum posts and blogs is an important research topic with many applications. However, existing research has been focused on extraction, classification and summarization of opinions from these sources. An important issue that has not been studied so far is the opinion spam or the trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews. To our knowledge, there is still no published study on this topic, although Web page spam and email spam have been investigated extensively. We will see that review spam is quite different from Web page spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that review spam is widespread. In this paper, we first present a categorization of spam reviews and then propose several techniques to detect them.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134608965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 232
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信