Jiyang Chen, Justin Fagnan, R. Goebel, Reihaneh Rabbany, Farzad Sangi, M. Takaffoli, Eric Verbeek, Osmar R Zaiane
{"title":"Meerkat: Community Mining with Dynamic Social Networks","authors":"Jiyang Chen, Justin Fagnan, R. Goebel, Reihaneh Rabbany, Farzad Sangi, M. Takaffoli, Eric Verbeek, Osmar R Zaiane","doi":"10.1109/ICDMW.2010.40","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.40","url":null,"abstract":"Meerkat is a tool for visualization and community mining of social networks. It is being developed to offer novel algorithms and functionality that other tools do not possess. Meerkat’s features include navigation through graphical representations of networks, network querying and filtering, a multitude of graphical layout algorithms, community mining using recently developed algorithms, and dynamic network event analysis using recently published algorithms. These features will allow more insightful exploratory analysis and more robust inferences about communities and the significance of entity relationships. Meerkat is under active development, and future features will include additional options for community mining and visualization, focusing on algorithms and user interface designs not existing in other social network analysis tools.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130215595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequent Closed Itemset Mining with Privacy Preserving for Distributed Databases","authors":"Shin-ya Kuno, K. Doi, Akihiro Yamamoto","doi":"10.1109/ICDMW.2010.135","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.135","url":null,"abstract":"In the present paper we introduce closed item sets into frequent item set mining from horizontally-partitioned transaction databases with preserving privacy. Closed item sets were originally from the research area of Formal Concept Analysis, and it is shown that even if results of frequent item set mining are restricted to closed item sets, all frequent item sets can be recovered from the results. This property suggests that using closed item sets would contribute to decreasing the cost of communication among distributed sites where a piece of horizontally-partitioned database is stored. We present a mining procedure revising and amalgamating two previous works: one is for mining closed item sets from horizontally-partitioned databases, and the other is for privacy preserving mining of item sets from such databases. We analyze the procedure on both of the viewpoint of communication cost and that of security. We also show results of some experimental practice of applying the procedure to a well-known dataset.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134565816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Girish Keshav Palshikar, H. Vin, Mohammed Mudassar, M. Natu
{"title":"Domain-Driven Data Mining for IT Infrastructure Support","authors":"Girish Keshav Palshikar, H. Vin, Mohammed Mudassar, M. Natu","doi":"10.1109/ICDMW.2010.132","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.132","url":null,"abstract":"Support analytics (i.e., statistical analysis, modeling and mining of customer/operations support tickets data) is important in service industries. In this paper, we adopt a domain-driven data mining approach to support analytics with a focus on IT infrastructure Support (ITIS) services. We identify specific business questions and then propose algorithms for answering them. The questions are: (1) How to reduce the overall workload? (2) How to improve efforts spent in ticket processing? (3) How to improve compliance to service level agreements? We propose novel formalizations of these notions and propose rigorous statistics-based algorithms for these questions. The approach is domain-driven in the sense that the results produced are directly usable by and easy to understand for end-users having no expertise in data-mining, do not require any experimentation and often discover novel and non-obvious answers. All this helps in better acceptance among end-users and more active use of the results produced. The algorithms have been implemented and have produced satisfactory results on more than 25 real-life ITIS datasets, one of which we use for illustration.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131764889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Similar Neighborhood Structures in Private Social Networks","authors":"L. Singh, Clare Schramm","doi":"10.1109/ICDMW.2010.165","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.165","url":null,"abstract":"Many social networks being analyzed today are generated from sources with privacy concerns. A number of network centrality measures have been introduced to better quantify various social dynamics of interest to social scientists. In this paper, we propose an approximation of a social network that allows for certain centrality measures to be calculated while hiding information about the full network. Our approximation is not a perturbed graph, but rather a generalized trie structure containing a network hop expansion set for each node in the graph. We show that a network with certain topological structures naturally hides nodes and increases the number of candidate nodes in each equivalence class. The storage of our graph approximation naturally clusters nodes of the network with similar graph expansion structure and can therefore also be used as the basis for identifying ’like’ nodes in terms of similar structural position in the network. For branches of the trie that are not private enough, we introduce heuristics that locally merge segments of the trie to enforce k-node anonymity.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132701896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XML Documents Clustering Using Tensor Space Model -- A Preliminary Study","authors":"Sangeetha Kutty, R. Nayak, Yuefeng Li","doi":"10.1109/ICDMW.2010.106","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.106","url":null,"abstract":"A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset as well as the factorized matrices produced from the proposed method helps to improve the quality of clusters due to the enriched document representation with both the structure and the content information.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133196152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Collaborative Writing Processes Using Hidden Markov Models and Semantic Heuristics","authors":"Vilaythong Southavilay, K. Yacef, R. Calvo","doi":"10.1109/ICDMW.2010.118","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.118","url":null,"abstract":"In this paper we are interested in discovering collaborative writing patterns in student data collected from a system we designed to support student collaborative writing, and which has been used by over 1,000 students in the past year. A particular functionality that we are investigating is the extraction and display to learners and teachers of the process followed during the course of the writing. We used a heuristic to derive semantic interpretation of specific sequences of raw data and Markov models (MM) to derive the processes. We propose two models, a Heuristic MM and a Hidden MM for analysing student’s writing behavior. We also refined the semantic preprocessing by adding the notion of pauses between activities. We illustrate our approach and compare these models using real data from two groups of high and low performance level and highlight the different information they each provide.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"361 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115919475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Research Topics Evolving Over Time Using a Diachronic Multi-source Approach","authors":"Jean-Charles Lamirel, Ghada Safi, Navesh Priyankar, Pascal Cuxac","doi":"10.1109/ICDMW.2010.198","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.198","url":null,"abstract":"The acquisition of new scientific knowledge and the evolution of the needs of the society regularly call into question the orientations of research. Means to recall and visualize these evolutions are thus necessary. The existing tools for research survey give only one fixed vision of the research activity, which does not allow performing tasks of dynamic topic mining. The objective of this paper is thus to propose a new incremental approach in order to follow the evolution of research themes and research groups for a scientific discipline given in terms of emergence or decline. These behaviors are detectable by various methods of filtering. However, our choice is made on the exploitation of neural clustering methods in a multi-view context. This new approach makes it possible to take into account the incremental and chronological aspect of information by opening the way to the detection of convergences and divergences of research themes and groups.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114891379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian Processes for Dispatching Rule Selection in Production Scheduling: Comparison of Learning Techniques","authors":"B. Scholz-Reiter, Jens Heger, T. Hildebrandt","doi":"10.1109/ICDMW.2010.19","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.19","url":null,"abstract":"Decentralized scheduling with dispatching rules is applied in many fields of logistics and production, especially in semiconductor manufacturing, which is characterized by high complexity and dynamics. Many dispatching rules have been found, which perform well on different scenarios, however no rule has been found, which outperforms other rules across various objectives. To tackle this drawback, approaches, which select dispatching rules depending on the current system conditions, have been proposed. Most of these use learning techniques to switch between rules regarding the current system status. Since the study of Rasmussen [1] has shown that Gaussian processes as a machine learning technique have outperformed other techniques like neural networks under certain conditions, we propose to use them for the selection of dispatching rules in dynamic scenarios. Our analysis has shown that Gaussian processes perform very well in this field of application. Additionally, we showed that the prediction quality Gaussian processes provide could be used successfully.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"55 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116423845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Classification on Peers with Variable Data Spaces and Distributions","authors":"Quach Vinh Thanh, V. Gopalkrishnan, Hock Hee Ang","doi":"10.1109/ICDMW.2010.125","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.125","url":null,"abstract":"The promise of distributed classification is to improve the classification accuracy of peers on their respective local data, using the knowledge of other peers in the distributed network. Though in reality, data across peers may be drastically different from each other (in the distribution of observations and/or the labels), current explorations implicitly assume that all learning agents receive data from the same distribution. We remove this simplifying assumption by allowing peers to draw from arbitrary data distributions and be based on arbitrary spaces, thus formalizing the general problem of distributed classification. We find that this problem is difficult because it does not admit state-of-the-art solutions in distributed classification. We also discuss the relation between the general problem and transfer learning, and show that transfer learning approaches cannot be trivially fitted to solve the problem. Finally, we present a list of open research problems in this challenging field.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124813549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Shi, Yanan Cai, Philip S. Yu, Zhenyu Yan, Bin Wu
{"title":"A Comparison of Objective Functions in Network Community Detection","authors":"C. Shi, Yanan Cai, Philip S. Yu, Zhenyu Yan, Bin Wu","doi":"10.1109/ICDMW.2010.107","DOIUrl":"https://doi.org/10.1109/ICDMW.2010.107","url":null,"abstract":"Community detection, as an important unsupervised learning problem in social network analysis, has attracted great interest in various research areas. Many objective functions for community detection that can capture the intuition of communities have been introduced from different research fields. Based on the classical single objective optimization framework, this paper compares a variety of these objective functions and explores the characteristics of communities they can identify. Experiments show that most objective functions have the resolution limit and that their community structures have many different characteristics.","PeriodicalId":170201,"journal":{"name":"2010 IEEE International Conference on Data Mining Workshops","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128529161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}