{"title":"High Quality True-Positive Prediction for Fiscal Fraud Detection","authors":"S. Basta, Fabio Fassetti, M. Guarascio, G. Manco, F. Giannotti, D. Pedreschi, L. Spinsanti, Gianfilippo Papi, S. Pisani","doi":"10.1109/ICDMW.2009.59","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.59","url":null,"abstract":"In this paper we describe an experience resulting from the collaboration among Data Mining researchers, domain experts of the Italian Revenue Agency, and IT professionals, aimed at detecting fraudulent VAT credit claims. The outcome is an auditing methodology based on a rule-based system, which is capable of trading among conflicting issues, such as maximizing audit benefits, minimizing false positive audit predictions, or deterring probable upcoming frauds. We describe the methodology in detail, and illustrate its practical effectiveness compared to classical predictive systems from the literature.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132403573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate Discovery of Valid Convoys from Moving Object Trajectories","authors":"Hyunjin Yoon, C. Shahabi","doi":"10.1109/ICDMW.2009.71","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.71","url":null,"abstract":"Given a set of moving object trajectories, it is of interest to find a group of objects, called a convoy, that are spatially density-connected for a certain duration of time. However, existing convoy discovery algorithms have a critical problem of accuracy; they tend to both miss larger convoys and retrieve invalid ones where the density-connectivity among the objects is not completely satisfied. We propose a new valid convoy discovery algorithm, called VCoDA, for the accurate discovery of valid convoys from moving object trajectories. Specifically, VCoDA first retrieves all partially connected convoys while guaranteeing no false dismissal of any valid convoys and then validates their density-connectivity to eventually obtain a complete set of valid convoys. Our extensive experiments on three real-world datasets demonstrate the effectiveness of our technique; VCoDA improves the precision by a factor of 3 on average and the recall by up to 2 orders of magnitude as compared to an existing method.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"45 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132803977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the DBSCAN and Agglomerative Clustering Algorithms to Solve Network Planning Problem","authors":"L. F. Ibrahim, Weam M. Minshawi, Isra Yosef Ekkab, Nehal Mahmoud Al-Jurf, Afnan Salem Babrahim, Samar Faisl Al-Halees","doi":"10.1109/ICDMW.2009.98","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.98","url":null,"abstract":"With existing telephone networks nearing saturation and demand for wire and wireless services continuing to grow, telecommunication engineers are looking at technologies that will deliver sites and can satisfy the required demand and grade of service constraints while achieving minimum possible costs. The city data is given as a map of streets, intersection nodes coordinates, distribution of the subscribers’ loads within the city and the location of base station in mobile network in this city. The available cable sizes, the cost per unit for each size and the maximum distance of wire that satisfied the allowed grade of service. NetPlan (Network Planning package) is developed in the spirit of DBSCAN and Agglomerative clustering algorithms. In this paper we studied the problem of congestion in Multi Service Access Node (MSAN) due to the increasing the number of subscribers which cause degradation in grade of service and in some time impossible to add new subscribers. The NetPlan algorithm is introduced to solve this problem. This algorithm is Density-based clustering algorithm using physical shortest paths available routes and the subscriber loads. In other hand decreasing the cost also is our deal in this paper so in the second phase in clustering process we modify the agglomerative algorithm that merge the neighboring cluster which satisfying certain condition. Experimental results and analysis indicate that the combination to algorithms was effective, leads to minimum costs for network construction and make the best grade of service.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133286202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toolkit-Based High-Performance Data Mining of Large Data on MapReduce Clusters","authors":"D. Wegener, M. Mock, Deyaa Adranale, S. Wrobel","doi":"10.1109/ICDMW.2009.34","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.34","url":null,"abstract":"The enormous growth of data in a variety of applications has increased the need for high performance data mining based on distributed environments. However, standard data mining toolkits per se do not allow the usage of computing clusters. The success of MapReduce for analyzing large data has raised a general interest in applying this model to other, data intensive applications. Unfortunately current research has not led to an integration of GUI based data mining toolkits with distributed file system based MapReduce systems. This paper defines novel principles for modeling and design of the user interface, the storage model and the computational model necessary for the integration of such systems. Additionally, it introduces a novel system architecture for interactive GUI based data mining of large data on clusters based on MapReduce that overcomes the limitations of data mining toolkits. As an empirical demonstration we show an implementation based on Weka and Hadoop.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128478150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Discovery of Closed Hyperclique Patterns in Multidimensional Structured Databases","authors":"Tomonobu Ozaki, T. Ohkawa","doi":"10.1109/ICDMW.2009.10","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.10","url":null,"abstract":"Structured data is becoming increasingly abundant in many application domains recently. Furthermore, more complex but valuable databases will be obtained by combining plural structured databases. In this paper, we focus on \"Multidimensional Structured Databases\" as one of the typical examples of such complex databases, and propose a new data mining problem of finding closed hyperclique patterns, i.e., closed sets of correlated patterns, in them. To solve this problem efficiently, an algorithm named CHPMS is proposed which effectively utilizes the generality ordering and the properties of correlation and closedness. The effectiveness of the proposed algorithm is confirmed through the experiments with real world datasets.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115472504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining the Training Window for Small Sample Size Classification with Concept Drift","authors":"I. Žliobaitė, L. Kuncheva","doi":"10.1109/ICDMW.2009.20","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.20","url":null,"abstract":"We consider classification of sequential data in the presence of frequent and abrupt concept changes. The current practice is to use the data after the change to train a new classifier. However, if the window with the new data is too small, the classifier will be undertrained and hence less accurate than the \"old\" classifier. Here we propose a method (called WR*) for resizing the training window after detecting a concept change. Experiments with synthetic and real data demonstrate the advantages of WR* over other window resizing methods.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125354806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs","authors":"Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, S. Machiraju, C. Faloutsos","doi":"10.1007/978-3-642-13672-6_42","DOIUrl":"https://doi.org/10.1007/978-3-642-13672-6_42","url":null,"abstract":"","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125748120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-granularity Visualization of Trajectory Clusters Using Sub-trajectory Clustering","authors":"Cheng Chang, Baoyao Zhou","doi":"10.1109/ICDMW.2009.24","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.24","url":null,"abstract":"With the surging of the requirements of location-based services, mining various interesting patterns from the spatial data becomes more and more important. In this paper, we propose an approach for visualizing the trajectory clustering results based on sub-trajectory clusters discovered from large-scale trajectory data. At first, we segment each trajectory into a set of sub-trajectories by detecting its corner points. And then, we choose Fréchet distance to compute the similarity between sub-trajectories, and use a density-based clustering method to cluster sub-trajectories and get an augmented order of the sub-trajectories. The visualization method can support multi-granularity views of the generated sub-trajectory clusters. Experiments have demonstrated the applicability and benefits of the proposed approach.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125179877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal Relational Random Forests","authors":"Timothy A. Supinie, A. McGovern, John K. Williams, Jennifer Abernethy","doi":"10.1109/ICDMW.2009.89","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.89","url":null,"abstract":"We introduce and validate Spatiotemporal Relational Random Forests, which are random forests created with spatiotemporal relational probability trees. We build on the documented success of random forests by bringing spatiotemporal capabilities to the trees, enabling them to identify critical spatial, temporal, and spatiotemporal features in the data. We validate our results on simulated data and real-world convectively-induced turbulence data from a commercial airline flying in the continental United States.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124468621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Set-Based Boosting for Instance-Level Transfer","authors":"Eric Eaton, Marie desJardins","doi":"10.1109/ICDMW.2009.97","DOIUrl":"https://doi.org/10.1109/ICDMW.2009.97","url":null,"abstract":"The success of transfer to improve learning on a target task is highly dependent on the selected source data. Instance-based transfer methods reuse data from the source tasks to augment the training data for the target task. If poorly chosen, this source data may inhibit learning, resulting in negative transfer. The current best performing algorithm for instance-based transfer, TrAdaBoost, performs poorly when given irrelevant source data. We present a novel set-based boosting technique for instance-based transfer. The proposed algorithm, TransferBoost, boosts both individual instances and collective sets of instances from each source task. In effect, TransferBoost boosts each source task, assigning higher weight to those source tasks which show positive transferability to the target task, and then adjusts the weights of the instances within each source task via AdaBoost. The results demonstrate that TransferBoost significantly improves transfer performance over existing instance-based algorithms when given a mix of relevant and irrelevant source data.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124487789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}