Seventh IEEE International Conference on Data Mining (ICDM 2007): Latest Publications

A Cascaded Approach to Biomedical Named Entity Recognition Using a Unified Model
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.20
Shing-Kit Chan, Wai Lam, Xiaofeng Yu
Abstract: We propose a cascaded approach for extracting biomedical named entities from text documents using a unified model. Previous works often ignore the high computational cost incurred by a single-phase approach. We alleviate this problem by dividing the named entity extraction task into a segmentation task and a classification task, reducing the computational cost by an order of magnitude. A unified model, which we term "maximum-entropy margin-based" (MEMB), is used in both tasks. The MEMB model considers the error between a correct and an incorrect output during training and helps improve the performance of extracting sparse entity types that occur in biomedical literature. We report experimental evaluations on the GENIA corpus available from the BioNLP/NLPBA (2004) shared task, which demonstrate the state-of-the-art performance achieved by the proposed approach.
Citations: 11
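One way to see why splitting extraction into segmentation and classification can shrink the search space is to count labels. The sketch below is illustrative only and rests on assumptions that the abstract does not state: a BIO tagging scheme, a first-order sequence decoder that scores every label pair at each token, and the five entity classes of the BioNLP/NLPBA 2004 task. The helper names are hypothetical and this is not the MEMB model.

```python
# Assumed entity classes (BioNLP/NLPBA 2004 shared task).
ENTITY_TYPES = ["protein", "DNA", "RNA", "cell_line", "cell_type"]

def bio_labels(types):
    """Return a B- and I- label for every type, plus a single O label."""
    return [f"{prefix}-{t}" for t in types for prefix in ("B", "I")] + ["O"]

def transition_count(labels):
    """Label pairs an assumed first-order decoder scores at each token."""
    return len(labels) ** 2

# Single phase: boundaries and types are decided jointly (11 labels).
single_phase = transition_count(bio_labels(ENTITY_TYPES))

# Cascade, phase 1: boundaries only (B, I, O), types deferred to phase 2.
segmentation = transition_count(bio_labels(["ENTITY"]))

print(single_phase, segmentation)
```

Under these assumptions the joint tagger scores 121 label pairs per token and the segmenter scores 9; the second phase then assigns one of five types to each detected segment instead of to every token.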
Lightweight Distributed Trust Propagation
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.64
D. Quercia, S. Hailes, L. Capra
Abstract: Using mobile devices, such as smart phones, people may create and distribute different types of digital content (e.g., photos, videos). One of the problems is that digital content, being easy to create and replicate, is likely to swamp users rather than inform them. To avoid that, users may organize content producers that they know and trust in a web of trust. Users may then reason about this web of trust to form opinions about content producers with whom they have never interacted before. These opinions will then determine whether content is accepted. The process of forming opinions is called trust propagation. We design a mechanism for mobile devices that effectively propagates trust and that is lightweight and distributed (as opposed to previous work that focuses on centralized propagation). This mechanism uses a graph-based learning technique. We evaluate the effectiveness (predictive accuracy) of this mechanism against a large real-world data set. We also evaluate the computational cost of a J2ME implementation on a mobile phone.
Citations: 67
Zonal Co-location Pattern Discovery with Dynamic Parameters
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.102
Mete Celik, James M. Kang, S. Shekhar
Abstract: Zonal co-location patterns represent subsets of feature types that are frequently located in a subset of space (i.e., zone). Discovering zonal spatial co-location patterns is an important problem with many applications in areas such as ecology, public health, and homeland defense. However, discovering these patterns with dynamic parameters (i.e., repeated specification of zone and interest measure values according to user preferences) is computationally complex due to the repetitive mining process. Also, the set of candidate patterns is exponential in the number of feature types, and spatial datasets are huge. Previous studies have focused on discovering global spatial co-location patterns with a fixed interest measure threshold. In this paper, we propose an indexing structure for co-location patterns and propose algorithms (Zoloc-Miner) to discover zonal co-location patterns efficiently for dynamic parameters. Extensive experimental evaluation shows our proposed approaches are scalable, efficient, and outperform naive alternatives.
Citations: 76
Change-Point Detection in Time-Series Data Based on Subspace Identification
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.78
Y. Kawahara, T. Yairi, K. Machida
Abstract: In this paper, we propose a series of algorithms for detecting change points in time-series data based on subspace identification, meaning a geometric approach for estimating linear state-space models behind time-series data. Our algorithms are derived from the principle that the subspace spanned by the columns of an observability matrix and the one spanned by the subsequences of time-series data are approximately equivalent. In this paper, we derive a batch-type algorithm applicable to ordinary time-series data, i.e., data consisting of only output series, and then introduce the online version of the algorithm and an extension that handles input-output time-series data. We illustrate the effectiveness of our algorithms with comparative experiments using some artificial and real datasets.
Citations: 119
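The general idea of comparing subspaces spanned by subsequences can be illustrated with a much simpler stand-in than the paper's algorithms. The sketch below is an assumption-laden simplification, not the authors' method: it estimates only a one-dimensional subspace (the dominant direction of recent subsequences, found by power iteration) and scores a time point by the fraction of energy in the following subsequences that falls outside that direction. All function names and the window sizes are hypothetical. Because it looks only at direction, it would not react to a pure change in level.

```python
import math

def windows(series, start, end, width):
    """All length-`width` subsequences that lie inside series[start:end]."""
    return [series[i:i + width] for i in range(start, end - width + 1)]

def leading_direction(vectors, iterations=100):
    """Dominant direction of the vectors via power iteration on sum(v v^T)."""
    width = len(vectors[0])
    u = [float(j + 1) for j in range(width)]  # arbitrary non-uniform start
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    for _ in range(iterations):
        nxt = [0.0] * width
        for v in vectors:
            dot = sum(a * b for a, b in zip(u, v))
            for j in range(width):
                nxt[j] += dot * v[j]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u

def change_score(series, t, past=20, future=10, width=4):
    """Share of future-window energy outside the past window's direction."""
    u = leading_direction(windows(series, t - past, t, width))
    residual = total = 0.0
    for v in windows(series, t, t + future, width):
        dot = sum(a * b for a, b in zip(u, v))
        energy = sum(a * a for a in v)
        total += energy
        residual += energy - dot * dot
    return residual / total if total else 0.0

# Toy series: constant for 40 steps, then alternating between +1 and -1.
series = [1.0] * 40 + [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
print(change_score(series, 20), change_score(series, 40))
```

On this toy series the score stays near 0 inside the constant regime and rises toward 1 at the point where the pattern switches. A fuller treatment would estimate a higher-dimensional subspace, for example with a singular value decomposition.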
Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.63
Florian Verhein, S. Chawla
Abstract: The application of association rule mining to classification has led to a new family of classifiers which are often referred to as "associative classifiers (ACs)". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where "imbalanced data sets" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced datasets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.
Citations: 50
Latent Dirichlet Conditional Naive-Bayes Models
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.55
A. Banerjee, Hanhuai Shan
Abstract: In spite of the popularity of probabilistic mixture models for latent structure discovery from data, mixture models do not have a natural mechanism for handling sparsity, where each data point only has a few non-zero observations. In this paper, we introduce conditional naive-Bayes (CNB) models, which generalize naive-Bayes mixture models to naturally handle sparsity by conditioning the model on observed features. Further, we present latent Dirichlet conditional naive-Bayes (LD-CNB) models, which constitute a family of powerful hierarchical Bayesian models for latent structure discovery from sparse data. The proposed family of models is quite general and can work with arbitrary regular exponential family conditional distributions. We present a variational inference based EM algorithm for learning along with special case analyses for Gaussian and discrete distributions. The efficacy of the proposed models is demonstrated by extensive experiments on a wide variety of different datasets.
Citations: 14
Weighted Additive Criterion for Linear Dimension Reduction
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.81
Jing Peng, S. Robila
Abstract: Linear discriminant analysis (LDA) for dimension reduction has been applied to a wide variety of face recognition tasks. However, it has two major problems. First, it suffers from the small sample size problem when dimensionality is greater than the sample size. Second, it creates subspaces that favor well separated classes over those that are not. In this paper, we propose a simple weighted criterion for linear dimension reduction that addresses the above two problems associated with LDA. In addition, there are well established numerical procedures such as semi-definite programming for efficiently computing the proposed criterion. We demonstrate the efficacy of our proposal and compare it against other competing techniques using a number of examples.
Citations: 4
Supervised Learning by Training on Aggregate Outputs
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.50
D. Musicant, J. Christensen, Jamie F. Olson
Abstract: Supervised learning is a classic data mining problem where one wishes to be able to predict an output value associated with a particular input vector. We present a new twist on this classic problem where, instead of having the training set contain an individual output value for each input vector, the output values in the training set are only given in aggregate over a number of input vectors. This new problem arose from a particular need in learning on mass spectrometry data, but could easily apply to situations when data has been aggregated in order to maintain privacy. We provide a formal description of this new problem for both classification and regression. We then examine how k-nearest neighbor, neural networks, and support vector machines can be adapted for this problem.
Citations: 91
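The problem setting in this abstract, where outputs are known only in aggregate over groups of inputs, can be made concrete with a small regression sketch. It is a minimal illustration under stated assumptions and not one of the adaptations studied in the paper: the model is a plain linear regressor, the aggregate is assumed to be a sum, and the function name, learning rate, and toy data are all hypothetical.

```python
def fit_linear_on_aggregates(bags, totals, lr=0.01, epochs=5000):
    """Fit y = w.x + b when only the per-bag SUM of y is observed."""
    dim = len(bags[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for bag, total in zip(bags, totals):
            # Predicted aggregate: sum of the per-item predictions in the bag.
            pred = sum(sum(wj * xj for wj, xj in zip(w, x)) + b for x in bag)
            err = pred - total
            # Gradient of 0.5 * err**2 with respect to w[j] is
            # err * (sum of feature j over the bag); for b it is err * len(bag).
            for j in range(dim):
                w[j] -= lr * err * sum(x[j] for x in bag)
            b -= lr * err * len(bag)
    return w, b

# Toy data generated from y = 2x + 1; individual outputs are never shown.
bags = [
    [[0.0], [1.0]],          # hidden outputs 1, 3     -> total 4
    [[2.0], [3.0], [4.0]],   # hidden outputs 5, 7, 9  -> total 21
    [[5.0]],                 # hidden output 11        -> total 11
]
totals = [4.0, 21.0, 11.0]

w, b = fit_linear_on_aggregates(bags, totals)
print(w, b)
```

Because the model is linear, the summed prediction for a bag depends only on the bag's summed features and its size, so the slope and intercept used to generate the toy data can be recovered from the totals alone. Nonlinear learners do not collapse this way.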
Understanding Discrete Classifiers with a Case Study in Gene Prediction
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.40
M. Subianto, A. Siebes
Abstract: The requirement that the models resulting from data mining should be understandable is an uncontroversial requirement. In the data mining literature, however, it plays hardly any role, if at all. In practice, though, understandability is often even more important than, e.g., accuracy. Understandability does not mean that models should be simple. It means that one should be able to understand the predictions of models. In this paper we introduce tools to understand arbitrary classifiers defined on discrete data. More specifically, we introduce Explanations that provide insight at a local level. They explain why a classifier classifies a data point as it does. For global insight, we introduce attribute weights. The higher the weight of an attribute, the more often it is decisive in the classification of a data point. To illustrate our tools, we describe a case study in the prediction of small genes. This is a notoriously hard problem in bioinformatics.
Citations: 8
Confident Identification of Relevant Objects Based on Nonlinear Rescaling Method and Transductive Inference
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.24
S. Ho, R. Polyak
Abstract: We present a novel machine learning algorithm to identify relevant objects from a large amount of data. This approach is driven by linear discrimination based on the nonlinear rescaling (NR) method and transductive inference. The NR algorithm for linear discrimination (NRLD) computes both the primal and the dual approximation at each step. The dual variables associated with the given labeled data-set provide important information about the objects in the data-set and play the key role in ordering these objects. A confidence score based on a transductive inference procedure using NRLD is used to rank and identify the relevant objects from a pool of unlabeled data. Experimental results on an unbalanced protein data-set for the drug target prioritization and identification problem are used to illustrate the feasibility of the proposed identification algorithm.
Citations: 1