{"title":"A Bootstrap Method for Automatic Rule Acquisition on Emotion Cause Extraction","authors":"Shuntaro Yada, K. Ikeda, K. Hoashi, K. Kageura","doi":"10.1109/ICDMW.2017.60","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.60","url":null,"abstract":"Emotion cause extraction is one of the promising research topics in sentiment analysis, but has not been well-investigated so far. This task enables us to obtain useful information for sentiment classification and possibly to gain further insights about human emotion as well. This paper proposes a bootstrapping technique to automatically acquire conjunctive phrases as textual cue patterns for emotion cause extraction. The proposed method first gathers emotion causes via manually given cue phrases. It then acquires new conjunctive phrases from emotion phrases that contain similar emotion causes to previously gathered ones. In existing studies, the cost for creating comprehensive cue phrase rules for building emotion cause corpora was high because of their dependencies both on languages and on textual natures. The contribution of our method is its ability to automatically create the corpora from just a few cue phrases as seeds. Our method can expand cue phrases at low cost and acquire a large number of emotion causes of the promising quality compared to human annotations.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"250 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122517636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Analyzing Job Hop Behavior and Talent Flow Networks","authors":"R. J. Oentaryo, Xavier Jayaraj Siddarth Ashok, Ee-Peng Lim, Philips Kokoh Prasetyo","doi":"10.1109/ICDMW.2017.172","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.172","url":null,"abstract":"Analyzing job hopping behavior is important for the understanding of job preference and career progression of working individuals. When analyzed at the workforce population level, job hop analysis helps to gain insights of talent flow and organization competition. Traditionally, surveys are conducted on job seekers and employers to study job behavior. While surveys are good at getting direct user input to specially designed questions, they are often not scalable and timely enough to cope with fast-changing job landscape. In this paper, we present a data science approach to analyze job hops performed by about 490,000 working professionals located in a city using their publicly shared profiles. We develop several metrics to measure how much work experience is needed to take up a job and how recent/established the job is, and then examine how these metrics correlate with the propensity of hopping. We also study how job hop behavior is related to job promotion/demotion. Finally, we perform network analyses at the job and organization levels in order to derive insights on talent flow as well as job and organizational competitiveness.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122891398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Detection of Anomalous Heterogeneous Graphs with Streaming Edges","authors":"L. Akoglu","doi":"10.1109/ICDMW.2017.133","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.133","url":null,"abstract":"Given a stream of heterogeneous edges, comprising different types of nodes and edges, which arrive in an interleaved fashion to multiple different graphs evolving simultaneously, how can we spot the anomalous graphs in real-time using only constant memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. In this talk, I will introduce STREAMSPOT, a clustering based anomaly detection approach for streaming heterogeneous graphs that addresses challenges in two key fronts: (1) heterogeneity, and (2) streaming nature. Specifically, we introduce a new similarity function for heterogeneous graphs that compares two graphs based on their relative frequency of local substructures, represented as short strings. This function lends itself to a vector representation of each graph, which is (a) fast to compute, and (b) amenable to a sketched version with bounded size that preserves the aforementioned similarity. STREAMSPOT exhibits desirable properties that a streaming application requires–it is (i) fully-streaming; processing the stream one edge at a time as it arrives, (ii) memory-efficient; requiring constant space for the sketches and the clustering, (iii) fast; taking constant time to update the graph sketches and the cluster summaries that can process over 100K edges per second, and (iv) online; scoring and flagging anomalies in real time. Experiments on datasets containing simulated system-call flow graphs from normal browser activity and various attack scenarios (ground truth) show that STREAMSPOT is high-performance; achieving above 95% detection accuracy with small delay, and competitive response time and memory usage.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121902706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPS Data Reflect Players’ Internal Load in Soccer","authors":"A. Rossi, E. Perri, A. Trecroci, Marco Savino, G. Alberti, M. Iaia","doi":"10.1109/ICDMW.2017.122","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.122","url":null,"abstract":"The use of RPE as a measure of internal load has become a common methodology in team sports owing to its low cost. The aim of this study was to build a machine learning process able to describe the players' RPE by the external load extracted from the GPS. In this paper, we propose a multidimensional approach to assess the RPE in professional soccer which is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We show that our ordinal predictor is both accurate and precise for medium RPE values (i.e., between 4 and 7) but is not consistent for extreme values (i.e., below 4 and above 7). Our approach is a preliminary study that suggests that it is possible to predict players' RPE from GPS training and match data. However, this is not the only information needed to understand the players' perceived effort after training sessions or matches.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122068535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Familiarity and Strangeness of Objects: A MoDAT Requirement for Shikake Design","authors":"N. Matsumura","doi":"10.1109/ICDMW.2017.84","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.84","url":null,"abstract":"In this paper, we will first explain the FS (familiarity and strangeness) model as a requirement for attracting people's attention and bringing about analogical thinking. After introducing the idea of shikake (triggers for behavior change) and its requirements, we propose the inclusion of the FS model as an attribute of MoDAT in order to encourage MoDAT participants to come up with new shikake ideas.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128287444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Modeling Framework for Bivariate Data Streams with Applications to Change Detection in Cyber-Physical Systems","authors":"Joshua Plasse, J. Noble, Kary L. Myers","doi":"10.1109/ICDMW.2017.151","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.151","url":null,"abstract":"Cyber-physical systems - systems that incorporate physical devices with cyber components - are appearing in diverse applications, and due to advances in data acquisition, are accompanied with large amounts of data. The interplay between the cyber and the physical components leaves such systems vulnerable to faults and intrusions, motivating the development of a general model that can efficiently and continuously monitor a cyber-physical system. To be of practical value, the model should be adaptive and equipped with the ability to detect changes in the system. This paper makes three contributions: (1) a new adaptive modeling framework for monitoring an arbitrary cyber-physical system in real-time using a flexible statistical distribution called the normal-gamma; (2) a novel streaming validation procedure, demonstrated on data streams from a cyber-physical system at Los Alamos National Laboratory, to justify the use of the normal-gamma and our new adaptive modeling approach; and (3) a new online change detection algorithm demonstrated on synthetic normal-gamma data streams.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128616435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment Extraction from Consumer-Generated Noisy Short Texts","authors":"Hardik Meisheri, Kunal Ranjan, Lipika Dey","doi":"10.1109/ICDMW.2017.58","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.58","url":null,"abstract":"Sentiment analysis or recognizing emotions from short and noisy text from social networks such as Twitter has been a challenging task. Most of the existing models use word level embeddings for the final classification of the sentiments. This paper proposes a novel representation of short text derived from a combination of word embeddings and character embeddings using Bidirectional LSTM (BiLSTM). Along with this, we use an attention mechanism that learns to focus on sentiment specific words. Robust representation of short text can be applied for sentiment classification as well as predicting intensity of sentiments. This paper presents an evaluation of the proposed model on classification as well as regression datasets. Results show significant improvement over the baselines of the respective datasets.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124663424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Search-by-Examples for Scientific Topic Corpus Expansion in Digital Libraries","authors":"Hussein T. Al-Natsheh, Lucie Martinet, Fabrice Muhlenbach, Fabien Rico, D. Zighed","doi":"10.1109/ICDMW.2017.103","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.103","url":null,"abstract":"In this article we address the problem of expanding the set of papers that researchers encounter when conducting bibliographic research on their scientific work. Using classical search engines or recommender systems in digital libraries, some interesting and relevant articles could be missed if they do not contain the same search key-phrases that the researcher is aware of. We propose a novel model that is based on a supervised active learning over a semantic features transformation of all articles of a given digital library. Our model, named Semantic Search-by-Examples (SSbE), shows better evaluation results over a similar purpose existing method, More-Like-This query, based on the feedback annotation of two domain experts in our experimented use-case. We also introduce a new semantic relatedness evaluation measure to avoid the need of human feedback annotation after the active learning process. The results also show higher diversity and overlapping with related scientific topics which we think can better foster transdisciplinary research.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126301704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Suspicious Activities in Financial Transactions and Distributed Ledgers","authors":"R. Camino, R. State, Leandro Montero, Petko Valtchev","doi":"10.1109/ICDMW.2017.109","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.109","url":null,"abstract":"Banks and financial institutions around the world must comply with several policies for the prevention of money laundering and in order to combat the financing of terrorism. Nowadays, there is a rise in the popularity of novel financial technologies such as digital currencies, social trading platforms and distributed ledger payments, but there is a lack of approaches to enforce the aforementioned regulations accordingly. Software tools are developed to detect suspicious transactions usually based on knowledge from experts in the domain, but as new criminal tactics emerge, detection mechanisms must be updated. Suspicious activity examples are scarce or nonexistent, hindering the use of supervised machine learning methods. In this paper, we describe a methodology for analyzing financial information without the use of ground truth. A user suspicion ranking is generated in order to facilitate human expert validation using an ensemble of anomaly detection algorithms. We apply our procedure over two case studies: one related to bank fund movements from a private company and the other concerning Ripple network transactions. We illustrate how both examples share interesting similarities and that the resulting user ranking leads to suspicious findings, showing that anomaly detection is a must in both traditional and modern payment systems.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133857406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel l0-Constrained Gaussian Graphical Model for Anomaly Localization","authors":"D. Phan, T. Idé, J. Kalagnanam, M. Menickelly, K. Scheinberg","doi":"10.1109/ICDMW.2017.115","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.115","url":null,"abstract":"We consider the problem of anomaly localization in a sensor network for multivariate time-series data by computing anomaly scores for each variable separately. To estimate the sparse Gaussian graphical models (GGMs) learned from different sliding windows of the dataset, we propose a new model wherein we constrain sparsity directly through L0 constraint and apply an additional L2 regularization in the objective. We then introduce a proximal gradient algorithm to efficiently solve this difficult nonconvex problem. Numerical evidence is provided to show the benefits of using our model and method over the usual convex relaxations for learning sparse GGMs using a real dataset.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114428820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}