{"title":"Temporal Evolution of the UK Web","authors":"Ilaria Bordino, P. Boldi, D. Donato, Massimo Santini, S. Vigna","doi":"10.1109/ICDMW.2008.88","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.88","url":null,"abstract":"Recently, a new temporal dataset has been made public: it is made of a series of twelve 100M-page snapshots of the .uk domain. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provides constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information contained in the graph is reliable (i.e. whether it depends essentially on appearance and disappearance of pages and links, or on the crawler behaviour). We perform a number of tests that show that the graph is actually reliable, and provide the first public data on the evolution of the Web that use a large scale and a significant diversity in the sites considered.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126297103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rules Extraction from Multiple Decisions Ordered Information Tables","authors":"Bin Shen, Min Yao, Zhaohui Wu","doi":"10.1109/ICDMW.2008.75","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.75","url":null,"abstract":"Ordered information table is one of the most important research areas of granular computing. In this thesis, we introduce multiple decisions ordered information tables based on the concept of ordered information tables. Multiple decisions ordered information tables are used to describe the actual multiple decision attributes situation of reality. We study the process of rule extraction from multiple decisions ordered information tables thoroughly and several concepts about this process are proposed and discussed. At last, an example of multiple decisions ordered information tables is used to illustrate the basic ideas. These ideas and methods are quite useful for KDD, DM and GC.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121628319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernels for the Investigation of Localized Spatiotemporal Transitions of Drought with Support Vector Machines","authors":"Matthew W. Collier, A. McGovern","doi":"10.1109/ICDMW.2008.71","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.71","url":null,"abstract":"We present and discuss several spatiotemporal kernels designed to mine real-life and simulated data in support of drought prediction. We implement and empirically validate these kernels for support vector machines. Issues related to the nature of geographic data such as autocorrelation and directionality are investigated.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127569961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Sequential Pattern Mining Algorithm Based on the 2-Sequence Matrix","authors":"C. Hsieh, Don-Lin Yang, Jungpin Wu","doi":"10.1109/ICDMW.2008.82","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.82","url":null,"abstract":"Sequential pattern mining has become more and more popular in recent years due to its wide applications and the fact that it can find more information than association rules. Two famous algorithms in sequential pattern mining are AprioriAll and PrefixSpan. These two algorithms not only need to scan a database or projected-databases many times, but also require setting a minimal support threshold to prune infrequent data to obtain useful sequential patterns efficiently. In addition, they must rescan the database if new items or sequences are added. In this paper, we propose a novel algorithm called efficient sequential pattern enumeration (ESPE) to solve the above problems. In addition, our method can be applied in many applications, such as for the itemsets appearing at the same time in a sequence. In our experiments, we show that the performance of ESPE is better than the other two methods using various datasets.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134466231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing Network Motifs to Identify Spam Comments","authors":"E. Kamaliha, Fatemeh Riahi, Vahed Qazvinian, Jafar Adibi","doi":"10.1109/ICDMW.2008.72","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.72","url":null,"abstract":"Personal blogs are one of the most interconnected and socially networked types of social media. The capability of placing \"comments\" on blog posts makes the blogosphere a rather complex environment. In this paper, we study the behavior of bloggers who place comments on others' posts and examine if it is possible to detect spam comments. We look at the functionality of different network motif profiles in the comment network, and identify certain subgraphs that associate with spam comments. We illustrate that some of these patterns and their statistical features could be exploited to classify comments and bloggers into spammers and non-spammers. Our preliminary results are encouraging and show reasonable results on rich and dense blog networks.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130036764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Sense Discovery for Web Information Retrieval","authors":"Tomasz Nykiel, H. Rybinski","doi":"10.1109/ICDMW.2008.10","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.10","url":null,"abstract":"Word meaning disambiguation has always been an important problem in many computer science tasks, such as information retrieval and extraction. One of the problems faced in automatic word sense discovery is the number of different senses a word can have. Often, senses are dominated by some other, more frequent ones. Discovering such dominated meanings can significantly improve the quality of many text-related algorithms. In particular, Web search quality can be leveraged. In the paper, we present a novel approach for discovering word senses. The method is based on concise representations of frequent patterns. The method attempts to discover not only word senses that are dominating, but also senses that are dominated and underrepresented in the repository.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133002549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Search Algorithm for Content-Based Image Retrieval with User Feedback","authors":"A. Leung, P. Auer","doi":"10.1109/ICDMW.2008.90","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.90","url":null,"abstract":"We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session, the algorithm maintains weights on the images in the database which reflect the assumed relevance of the images. Relevance feedback is used to modify these weights. As a second ingredient, the algorithm uses a minimax principle to select images for presentation to the user: any response of the user will provide significant information about his query, such that relatively few feedback rounds are sufficient to find a satisfactory image. We have implemented this algorithm and have conducted experiments on both simulated data and real data which show promising results.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130476358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case Study on Classification Reliability","authors":"H. Dai","doi":"10.1109/ICDMW.2008.97","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.97","url":null,"abstract":"The reliability of an induced classifier can be affected by several factors including the data oriented factors and the algorithm oriented factors. In some cases, the reliability could also be affected by knowledge oriented factors. In this paper, we analyze three special cases to examine the reliability of the discovered knowledge. Our case study results show that (1) in the cases of mining from low quality data, rough classification approach is more reliable than exact approach which in general tolerate to low quality data; (2) without a sufficiently large data size, the reliability of the discovered knowledge will decrease accordingly; (3) the reliability of the point learning approach could easily be misled by noisy data. It will in most cases generate an unreliable interval and thus affect the reliability of the discovered knowledge. It also reveals that the inexact field is a good learning strategy that could model the potentials and improve the discovery reliability.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115656501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Innovation Game as Workplace for Sensing Values in Design and Market","authors":"Y. Ohsawa, Y. Maeno, Akihiro Takaichi, Yoko Nishihara","doi":"10.1109/ICDMW.2008.46","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.46","url":null,"abstract":"The \"value\" in this paper can be dealt with as a new variable which business workers create from their interaction with the dynamic environment, on which they redesign products and the market sustainably. Here we first show how data mining and data visualization can provide useful tools for aiding marketers'/designers' sensitivity to emerging values of consumers/users. By visualizing the data, humans can find the relations between existing entities, and create new combinations of products via the found relations. Then Innovation Game is introduced as an environment for the communication to elevate users' ability to combine existing values of products to create newly valuable products. The players called innovators present combinatorial ideas from prepared basic ideas, and sell the ideas to each other and their stocks to players called investors. As a result, latent opportunities of business are revealed for the market of ideas and designs.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121283242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-Processing of Discovered Association Rules Using Ontologies","authors":"Claudia Marinica, F. Guillet, H. Briand","doi":"10.1109/ICDMW.2008.87","DOIUrl":"https://doi.org/10.1109/ICDMW.2008.87","url":null,"abstract":"In Data Mining, the usefulness of association rules is strongly limited by the huge amount of delivered rules. In this paper we propose a new approach to prune and filter discovered rules. Using Domain Ontologies, we strengthen the integration of user knowledge in the post-processing task. Furthermore, an interactive and iterative framework is designed to assist the user along the analyzing task. On the one hand, we represent user domain knowledge using a Domain Ontology over database. On the other hand, a novel technique is suggested to prune and to filter discovered rules. The proposed framework was applied successfully over the client database provided by Nantes Habitat.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131773820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}