Lianhua Chi, Saket K. Sathe, Bong-Koo Han, Yun Wang
{"title":"A Novel Method for Assessing Event Impacts on Event-Driven Time Series","authors":"Lianhua Chi, Saket K. Sathe, Bong-Koo Han, Yun Wang","doi":"10.1109/ICDMW.2016.0080","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0080","url":null,"abstract":"Many real-world applications, such as service execution, data centre monitoring, remote sensing, traffic control, and customer behaviour, must deal with time series whose values occur at random time points driven by events. Such time series are sometimes referred to as event-driven time series. Although estimating the correlation between two time series has been well studied, the correlation between events and time series has been understudied. This paper introduces a novel method for assessing event impacts on event-driven time series. We estimate the actual event impact time on a time series using a novel and generic algorithm, SPEAK. Furthermore, we propose a novel metric, Ascore, to measure the event impact qualitatively and quantitatively. Our experiments on real-world datasets suggest that the combination of Ascore and SPEAK achieves much more accurate results than the benchmarks.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129826745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
André Petermann, Martin Junghanns, Stephan Kemper, Kevin Gómez, Niklas Teichmann, E. Rahm
{"title":"Graph Mining for Complex Data Analytics","authors":"André Petermann, Martin Junghanns, Stephan Kemper, Kevin Gómez, Niklas Teichmann, E. Rahm","doi":"10.1109/ICDMW.2016.0193","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0193","url":null,"abstract":"Complex data analytics that involve data mining often comprise not only a single algorithm but also further data processing steps, for example, to restrict the search space or to filter the result. We demonstrate graph mining with Gradoop, the first scalable system supporting declarative analytical programs composed from multiple graph operations. We use a business intelligence example including frequent subgraph mining to highlight the analytical capabilities enabled by such programs. The results can be visualized and, to show its ease of use, the program can be modified at visitors' request. Gradoop is built on top of state-of-the-art big data technology and is horizontally scalable out of the box. Its source code is publicly available and designed for easy extensibility. We invite the graph mining community to apply Gradoop in large-scale use cases and to contribute further algorithms.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128560122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Schelter, F. Biessmann, Malisa Zobel, Nedelina Teneva
{"title":"Structural Patterns in the Rise of Germany’s New Right on Facebook","authors":"Sebastian Schelter, F. Biessmann, Malisa Zobel, Nedelina Teneva","doi":"10.1109/ICDMW.2016.0069","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0069","url":null,"abstract":"In the last years a new right-wing, populist and eurosceptic party emerged in Germany, the 'Alternative für Deutschland'. Topics that were used by the party to draw attention to their program included the Euro-crisis and the so-called 'refugee crisis'. We investigate some aspects of social media use of the AfD. Our goal is to relate the rise of this party to some quantitative measures of their social media usage. A particular focus will be placed on users that interact with AfD content as well as with content of other parties. Our analysis is based on more than 11 million interactions of more than one million users with the public Facebook pages of the major German political parties during the years 2014 and 2015. Investigating the time courses of social media activity and user interaction, we find that the rise of the AfD can be associated with an amount of social media coverage and user engagement that is unprecedented in the German political landscape. One main effect of this campaign is a substantial increase of user interactions with other parties from the right spectrum, suggesting a migration of voters from established right wing parties. In order to better interpret the dynamics of social media activity, we relate the analysed metrics to the textual content of the posts and classical survey data. These results suggest that the intense use of social media platforms poses a major success factor of this newly emerging right-wing party.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124556409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Data-Driven Football Player Assessment","authors":"R. Stanojevic, L. Gyarmati","doi":"10.1109/ICDMW.2016.0031","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0031","url":null,"abstract":"Understanding the value of a football player is a challenging problem. Player valuation is not only critical for scouting, bidding and negotiation processes but also attracts a large media and fan interest. Due to the complexities which arise from the fact that the player pool is distributed over hundreds of different leagues and many different playing positions, many clubs hire domain experts (often retired professional players) in order to evaluate the value of potential players. We argue that such human-based scouting has several drawbacks including high cost, inability to scale to thousands of active players and inevitable subjective biases. In this paper we present a methodology for data-driven player market value estimation which tackles these drawbacks. We examine the quality of the proposed methodology and demonstrate that our data-driven valuation outperforms widely used transfermarkt.com market value estimates in predicting the team performance.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124583976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Exchange Platform to Fight Insurance Fraud on Blockchain","authors":"Indrani Nath","doi":"10.1109/ICDMW.2016.0121","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0121","url":null,"abstract":"The paper introduces the concept of blockchain and its application to sharing fraud intelligence data in the insurance marketplace.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"365 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124588113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caroline Fenlon, L. O’Grady, M. Doherty, S. Butler, L. Shalloo, J. Dunnion
{"title":"Regression Techniques for Modelling Conception in Seasonally Calving Dairy Cows","authors":"Caroline Fenlon, L. O’Grady, M. Doherty, S. Butler, L. Shalloo, J. Dunnion","doi":"10.1109/ICDMW.2016.0172","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0172","url":null,"abstract":"Reproductive performance is important for the economic efficiency of pasture-based dairy farms. In these seasonal calving systems, a concise period of breeding is essential to ensure the alignment of peak grass availability with peak lactating cow energy demands. Trials and statistical analysis have identified the factors affecting overall reproductive performance, but few studies have analysed performance at the individual service level. In this paper, four binary models of service outcome are described, incorporating age, stage of lactation, calving events, and measures of energy balance and milk production. Random effects at the cow, sire and herd level were included. Logistic regression and generalised additive models were created, both as stand-alone predictors and using ensemble learning in the form of bagging. The four models were evaluated in terms of calibration and discrimination using an external dataset of nine dairy herds representing the typical Irish pasture-based system. Logistic regression (with and without bagging) and generalised additive modelling with bagging all performed satisfactorily and would be useful as stand-alone models or in whole-farm simulation. Logistic regression is suggested as the most useful model for farmers and their advisers due to ease of interpretation. This model will be used as part of a PhD project to create simulation software for seasonally calving dairy animals.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121246241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SemStim: Exploiting Knowledge Graphs for Cross-Domain Recommendation","authors":"B. Heitmann, Conor Hayes","doi":"10.1109/ICDMW.2016.0145","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0145","url":null,"abstract":"In this paper we introduce SemStim, an unsupervised graph-based algorithm that addresses the cross-domain recommendation task. In this task, preferences from one conceptual domain (e.g. movies) are used to recommend items belonging to another domain (e.g. music). SemStim exploits the semantic links found in a knowledge graph (e.g. DBpedia), to connect domains and thus generate recommendations. As a key benefit, our algorithm does not require (1) ratings in the target domain, thus mitigating the cold-start problem and (2) overlap between users or items from the source and target domains. In contrast, current state-of-the-art personalisation approaches either have an inherent limitation to one domain or require rating data in the source and target domains. We evaluate SemStim by comparing its accuracy to state-of-the-art algorithms for the top-k recommendation task, for both single-domain and cross-domain recommendations. We show that SemStim enables cross-domain recommendation, and that in addition, it has a significantly better accuracy than the baseline algorithms.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116334344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Warning Behaviors of Violent Lone Offenders in Written Communication","authors":"Lisa Kaati, A. Shrestha, Tony Sardella","doi":"10.1109/ICDMW.2016.0152","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0152","url":null,"abstract":"Violent lone offenders such as school shooters and lone actor terrorists pose a threat to modern society, but since they act alone or with minimal help from others, they are very difficult to detect. Previous research has shown that violent lone offenders show signs of certain psychological warning behaviors that can be viewed as indicators of an increasing or accelerating risk of committing targeted violence. In this work, we use a machine learning approach to identify potential violent lone offenders based on their written communication. The aim of this work is to capture psychological warning behaviors in written text and identify texts written by violent lone offenders. We use a set of features that are psychologically meaningful based on the different categories in the text analysis tool Linguistic Inquiry and Word Count (LIWC). Our study only contains a small number of known perpetrators and their written communication but the results are promising and there are many interesting directions for future work in this area.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122459738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification Rule Mining Supported by Ontology for Discrimination Discovery","authors":"B. Luong, S. Ruggieri, F. Turini","doi":"10.1109/ICDMW.2016.0128","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0128","url":null,"abstract":"Discrimination discovery from data consists of designing data mining methods for the actual discovery of discriminatory situations and practices hidden in a large amount of historical decision records. Approaches based on classification rule mining consider items at a flat concept level, with no exploitation of background knowledge on the hierarchical and inter-relational structure of domains. On the other hand, ontologies are a widespread and ever increasing means for expressing such knowledge. In this paper, we propose a framework for discrimination discovery from ontologies, where contexts of prima-facie evidence of discrimination are summarized in the form of generalized classification rules at different levels of abstraction. Throughout the paper, we adopt a motivating and intriguing case study based on discriminatory tariffs applied by the U.S. Harmonized Tariff Schedules on imported goods.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126291503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selecting the Top-k Discriminative Features Using Principal Component Analysis","authors":"Aminata Kane, Nematollaah Shiri","doi":"10.1109/ICDMW.2016.0096","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0096","url":null,"abstract":"Feature selection is important for dimensionality reduction, analysis, and pattern discovery applications. We consider multivariate time series data and propose an unsupervised learning technique to identify the top-k discriminative features. The proposed technique uses statistics drawn from the Principal Component Analysis (PCA) of the input data to leverage the relative importance of the principal components along with the coefficients within the principal directions of the data to uncover the ranking of the features. We conduct numerous experiments using various benchmark datasets to study the performance of the proposed technique in terms of the discriminant power of the selected features and its ability to minimize the original data reconstruction error. Compared to major existing techniques, our results indicate increased accuracy and efficiency. We also show that our technique yields improved classification accuracy.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126634325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}