{"title":"GRAPHITE: A Visual Query System for Large Graphs","authors":"Duen Horng Chau, C. Faloutsos, Hanghang Tong, Jason I. Hong, Brian Gallagher, Tina Eliassi-Rad","doi":"10.1109/ICDMW.2008.99","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.99","url":null,"abstract":"We present Graphite, a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches. For example, in a social network where a person's occupation is an attribute, the user can draw a 'star' query for \"finding a CEO who has interacted with a Secretary, a Manager, and an Accountant, or a structure very similar to this\". Graphite uses the G-Ray algorithm to run the query against a user-chosen data graph, gaining all of its benefits, namely its high speed, scalability, and its ability to find both exact and near matches. Therefore, for the example above, Graphite tolerates indirect paths between, say, the CEO and the Accountant, when no direct path exists. Graphite uses fast algorithms to estimate node proximities when finding matches, enabling it to scale well with the graph database size. We demonstrate Graphite's usage and benefits using the DBLP author-publication graph, which consists of 356 K nodes and 1.9 M edges. A demo video of Graphite can be downloaded at http://www.cs.cmu.edu/~dchau/graphite/graphite.mov.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125279680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Method for Multi-view Face Clustering in Video Sequence","authors":"Panpan Huang, Yunhong Wang, Ming Shao","doi":"10.1109/ICDMW.2008.63","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.63","url":null,"abstract":"In the problem of face clustering with multi-views, the similarity between faces of different persons with similar pose is usually greater than the similarity between multi-view faces of the same person. This may exert a tremendous impact on the clustering result that is sent back to the user. To solve this problem, we should do pose clustering first and then, within each 'pose group', cluster images of different individuals. Gabor filters are used to detect the eyes in the face image. The coordinates of the eyes are extracted as an input feature for the 'pose clustering'. After this step, images of similar pose will be in the same cluster. PCA/LBP and k-means algorithms are then used in each pose cluster to cluster different individuals. The precision of face classification with clustering is enhanced. The proposed clustering algorithms can be applied to face indexing or face recognition systems.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125399243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology-Based Protein-Protein Interactions Extraction from Literature Using the Hidden Vector State Model","authors":"Yulan He, K. Nakata, Deyu Zhou","doi":"10.1109/ICDMW.2008.11","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.11","url":null,"abstract":"This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the hidden vector state (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125415976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text Knowledge Mining: An Alternative to Text Data Mining","authors":"D. Sánchez, M. Martín-Bautista, Ignacio J. Blanco, Consuelo Justicia de la Torre","doi":"10.1109/ICDMW.2008.57","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.57","url":null,"abstract":"In this paper we introduce an alternative view of text mining and review several alternative views proposed by different authors. We propose a classification of text mining techniques into two main groups: techniques based on inductive inference, that we call text data mining (TDM, comprising most of the existing proposals in the literature), and techniques based on deductive or abductive inference, that we call text knowledge mining (TKM). To our knowledge, the TKM view of text mining is new though, as we shall show, several existing techniques could be considered in this group. We discuss the possibilities and challenges of TKM techniques. We also discuss the application of existing theories in possible future research in this field.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122018561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Independence and Contingency Matrix","authors":"S. Tsumoto, S. Hirano","doi":"10.1109/ICDMW.2008.94","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.94","url":null,"abstract":"This paper shows the meaning of Pearson residuals as an indicator of statistical independence. While information granules of statistical independence of two variables can be viewed as determinants of 2×2 submatrices, those of three variables consist of several combinations of linear equations which will become residuals for odds ratio (outer products) when they are equal to 0. Interestingly, the residuals can be an expansion series of the product of marginal distributions and the residuals for odds ratio (outer products).","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124864043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Vector-Geometry Based Spatial kNN-Algorithm for Traffic Frequency Predictions","authors":"M. May, D. Hecker, Christine Kopp, S. Scheider, Daniel Schulz","doi":"10.1109/ICDMW.2008.35","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.35","url":null,"abstract":"We introduce s-kNN, a nearest neighbor based spatial data mining algorithm. It belongs to the class of vector-geometry based algorithms that reason on complex spatial objects instead of point measurements. In contrast to most methods in this class, it performs on-the-fly spatial computations that cannot be replaced by a pre-processing step without sacrificing efficiency. The key is a partial evaluation scheme for efficient computations. The algorithm is fully integrated into an object-relational spatial database. It is the basis for traffic frequency predictions (vehicles and pedestrians) for all German cities larger than 50,000 inhabitants and is the basis for pricing of posters in Germany.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124497156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Behavioral and Social Network Data for Online Advertising","authors":"A. Bagherjeiran, R. Parekh","doi":"10.1109/ICDMW.2008.70","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.70","url":null,"abstract":"There are two main requirements for effective advertising in social networks. The first is that links in the social network are relevant to the targeted ads. The second is that social information can be easily incorporated with existing targeting methods to predict response rates. Our purpose in this paper is to investigate these requirements. We measure the relevance of a social network, the Yahoo! Instant Messenger graph, to classes of ads. We investigate the degree to which social network information complements existing user-profile information for targeting. We find that there is significant evidence in our social network of homophily, that links in the network indicate similar ad-relevant interests. We propose an ensemble classifier to combine existing user-only models with social network features to improve response predictions.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122815101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Region Classification with Decision Trees","authors":"J. V. Prehn, E. Smirnov","doi":"10.1109/ICDMW.2008.19","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.19","url":null,"abstract":"The region-classification task is to construct class regions containing the correct classes of the objects being classified with a given probability. To turn a point classifier into a region classifier, the conformal framework is used. However, applying the framework requires a non-conformity function. This function estimates the instances' non-conformity for the point classifier used. This paper studies how to turn decision trees into region classifiers. It considers two non-conformity functions. The first one is a general non-conformity function applicable to any point classifier. The second function is a specific non-conformity function for decision trees. Our main contribution is twofold. First we show, contrary to , that the general function outperforms the specific one for decision-tree region classifiers in terms of validity and efficiency of the class regions. Second, we show how the decision-tree complexity influences the quality of the class regions based on these two functions.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121490285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on the Reliability of Case-Based Reasoning Systems","authors":"Ke Wang, J. Liu, Weimin Ma","doi":"10.1109/ICDMW.2008.33","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.33","url":null,"abstract":"Case-based reasoning (CBR) is a methodology for problem solving, which suggests a solution to a new problem based on the previously-solved problems and their associated solutions. A key issue in this methodology is whether we can always trust the solutions suggested by a case-based reasoning system. This paper studies the reliability of CBR systems at an overall level first. Factors affecting the reliability of a CBR system are discussed in this section, especially whether its case library is compatible with the foundational assumption that \"similar problems have similar solutions.\" After that, the reliability of an individual suggested solution is studied. Some existing approaches which can be employed to estimate the reliability of a single solution are compared in this section. To illustrate these ideas, some experiments and their results are also discussed in this paper. It is shown that if a case library attains a high compatibility, then a satisfactory result can be expected, and the reliability of a CBR system at an overall level can be improved by identifying the reliable solutions.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122320423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Correlated Pairs of Patterns in Multidimensional Structured Databases","authors":"Tomonobu Ozaki, T. Ohkawa","doi":"10.1109/ICDMW.2008.25","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.25","url":null,"abstract":"Structured data is becoming increasingly abundant in many application domains. In this paper, as a form of correlation mining, we propose new data mining problems of finding frequent and correlated pairs of patterns in structured databases. First, we consider the problem of finding all frequent and correlated pattern pairs in two dimensional structured databases. Then, two kinds of top-k mining problems are studied. To solve these problems efficiently, we develop a series of algorithms having powerful pruning capabilities. We also discuss the applicability of the proposed algorithms to the discovery of pattern pairs in single and multidimensional structured databases. The effectiveness of the proposed algorithms is assessed through experiments with synthetic and real world datasets.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"16 26","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120844780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}