{"title":"On Utility of Temporal Embeddings in Skill Matching","authors":"Manisha Verma, Nathan Francis","doi":"10.1109/ICDMW.2017.37","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.37","url":null,"abstract":"Candidates routinely use a set of key phrases or keywords to succinctly describe their expertise or skillset. This is useful for both matching candidate profiles to jobs and for comparing different candidates. Constant development of businesses and labour market has dynamic impact on importance of such skills, where importance of each skill may evolve with time. At any given time, some skills may be more important than others due to seasonality in job markets. While, existing approaches consider lexical or semantic match between candidate profile and each skill, they do not consider the time biased importance of the skill for ranking. Word embeddings have emerged as an effective tool to represent vocabulary in lower dimensional space. In this work, we exploit word embedding models that also encodes time information or seasonality of key phrases. In this work, we explore utility of these time biased skill embeddings in ranking both skills and candidates. Our experiments indicate that incorporation of skill trends improves candidate-skill matching performance.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116888188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Users’ Search Behavior Using Stochastic Multi-mode Network Models","authors":"Shohei Umehara, K. Eguchi","doi":"10.1109/ICDMW.2017.25","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.25","url":null,"abstract":"Multidimensional relationships can be represented as a multi-mode network or graph, where each vertex or node corresponds to an object, and each edge or link is attributed to one of the multiple types of relationships between a pair of objects. Web search log includes users' search behavior and can also be represented as such a multi-mode network, where each vertex corresponds to a query and each attributed edge corresponds to a relationship between a pair of queries. The relational attributes can be derived from multiple assumptions, for instance, two queries are considered to be related to each other when two different users input those queries and click through from respective search result lists to the same Web pages. In order to analyze such complex data, this paper proposes a new multi-mode block model based on latent variable modeling. We evaluate the effectiveness of our multi-mode block model through experiments on the task of predicting queries related to each given query using real search query log.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116818347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Self-Tuning Sparse Subspace Clustering","authors":"Guangtao Wang, Jiayu Zhou, Jingjie Ni, Tingjin Luo, Wei Long, Hai Zhen, G. Cong, Jieping Ye","doi":"10.1109/ICDMW.2017.117","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.117","url":null,"abstract":"Sparse subspace clustering (SSC) is an effective approach to cluster high-dimensional data. However, how to adaptively select the number of clusters/eigenvectors for different data sets, especially when the data are corrupted by noise, is a big challenge in SSC and also an open problem in field of data mining. In this paper, considering the fact that the eigenvectors are robust to noise, we develop a self-adaptive search method to select cluster number for SSC by exploiting the cluster-separation information from eigenvectors. Our method solves the problem by identifying the cluster centers over eigenvectors. We first design a new density based metric, called centrality coefficient gap, to measure such separation information, and estimate the cluster centers by maximizing the gap. After getting the cluster centers, it is straightforward to group the remaining points into respective clusters which contain their nearest neighbors with higher density. This leads to a new clustering algorithm in which the final randomly initialized k-means stage in traditional SSC is eliminated. We theoretically verify the correctness of the proposed method on noise-free data. Extensive experiments on synthetic and real-world data corrupted by noise demonstrate the robustness and effectiveness of the proposed method comparing to the well-established competitors.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124604054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Warnings of Cyber Threats in Online Discussions","authors":"Anna Sapienza, Alessandro Bessi, Saranya Damodaran, P. Shakarian, Kristina Lerman, Emilio Ferrara","doi":"10.1109/ICDMW.2017.94","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.94","url":null,"abstract":"We introduce a system for automatically generating warnings of imminent or current cyber-threats. Our system leverages the communication of malicious actors on the darkweb, as well as activity of cyber security experts on social media platforms like Twitter. In a time period between September, 2016 and January, 2017, our method generated 661 alerts of which about 84% were relevant to current or imminent cyber-threats. In the paper, we first illustrate the rationale and workflow of our system, then we measure its performance. Our analysis is enriched by two case studies: the first shows how the method could predict DDoS attacks, and how it would have allowed organizations to prepare for the Mirai attacks that caused widespread disruption in October 2016. Second, we discuss the method's timely identification of various instances of data breaches.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124770308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing False Positives of User-to-Entity First-Access Alerts for User Behavior Analytics","authors":"Baoming Tang, Qiaona Hu, Derek Lin","doi":"10.1109/ICDMW.2017.111","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.111","url":null,"abstract":"Detecting security threats from compromised account or malicious insider by leveraging enterprise traffic logs is the goal of user behavior-based analytics. For its ease of interpretation, a common analytic indicator used in the industry for user behavior analytics is whether a user accesses a network entity, such as a machine or process, for the first time. While this popular indicator does correlate well with the threat activities, it has the potential of generating volumes of false positives. This creates a problem for an analytic system of which the first-time access alerting capability is a part. We believe that the false positive rate from the indicator can be reduced by learning from users' historical entity access patterns and user context information. If the first-time access is expected, then its corresponding alert is suppressed. In this paper, we propose a user-to-entity prediction score which uses a recommender system for learning user data. In particular, we use factorization machines, along with necessary data normalization steps, to make predictions on real-world enterprise logs. We demonstrate this novel method is capable of reducing false positives of users' first-time entity access alerts in user behavior analytics applications.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128331869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DAAD: A Framework for Detecting Agitation and Aggression in People Living with Dementia Using a Novel Multi-modal Sensor Network","authors":"Shehroz S. Khan, Tong Zhu, B. Ye, Alex Mihailidis, A. Iaboni, Kristine Newman, A. Wang, L. Martin","doi":"10.1109/ICDMW.2017.98","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.98","url":null,"abstract":"With an increase in the population of older adults, the number of cases with dementia also increases. People living with dementia (PLwD) exhibit various behavioral and psychological experiences; agitation and aggression being the most common. Aggressive patients with dementia can harm themselves, other patients and the staff. In the past, researchers have used actigraphy to detect incidences of agitation and aggression in persons with dementia. However, actigraphy based solutions only consider body movement based parameters. In this paper, we present a novel multi-modal sensing framework currently being installed and tested at Toronto Rehabilitation Institute, Canada. This framework uses video cameras, wearable device (for both movement and physiological data), motion and door sensors, and pressure mats to collect various types of data that may be used to Detect and predict incidences of Agitation and Aggression in people with Dementia (DAAD). In this paper, we discuss the data collection, data processing and data fusion aspects using each of the sensors. Using the DAAD sensing platform, we present two pilot studies to demonstrate its effective functioning. We also discuss the challenges experienced with respect to ethics, hardware installation, software issues and data management.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129561590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregation and Disaggregation of Information: A Holistic View","authors":"Yuyue Chen, Chuanren Liu","doi":"10.1109/ICDMW.2017.157","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.157","url":null,"abstract":"Data-driven analytics and decision-making have been essential for numerous applications in our society. To transform the data into a source of rich intelligence and support decision-making, data-driven analytics often need to aggregate intelligence from multiple sources and disaggregate signals into significant constituents. Though many existing approaches perform these two tasks respectively, there are few attempts to study them with a holistic view. This dissertation exploits the intrinsic connections between intelligence aggregation and signal disaggregation by developing novel models to capture and leverage various types of non-IIDness and inter-correlations in the data from complex systems. Our preliminary results show that, by identifying non-IIDness of information sources, our approach outperforms alternative methods for intelligence aggregation tasks. Also, by viewing disaggregation as the inverse function of aggregation and incorporating various types of inter-correlations in complex systems, we can also improve the performance for signal disaggregation tasks. Given these promising results, we will further improve the effectiveness and efficiency of our framework on large-scale data from different application fields.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132446814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Semantics Based 3-D Convolutional Neural Networks for News Recommendation","authors":"Vaibhav Kumar, Dhruv Khattar, Shashank Gupta, Vasudeva Varma","doi":"10.1109/ICDMW.2017.105","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.105","url":null,"abstract":"Deep neural networks have yielded immense success in speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks for content based recommendation has received a relatively less amount of inspection. Also, different recommendation scenarios have their own issues which creates the need for different approaches for recommendation. One of the problems with news recommendation is that of handling temporal changes in user interests. Hence, modelling temporal behaviour in the domain of news recommendation becomes very important. In this work, we propose a recommendation model which uses semantic similarity between words as input to a 3-D Convolutional Neural Network in order to extract the temporal news reading pattern of the users. This in turn improves the quality of recommendations. We compare our model to a set of established baselines and the experimental results show that our model performs better than the state-of-the-art by 5.8% (Hit Ratio@10).","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132470090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entity Recommendation Via Integrating Multiple Types of Implicit Feedback in Heterogeneous Information Network","authors":"Xiaotong Suo, Fang Wei, K. Yu","doi":"10.1109/ICDMW.2017.108","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.108","url":null,"abstract":"Recently, heterogeneous information network(HIN) analysis has attracted a lot of attentions. One of the HIN application is recommendation. Due to HIN containing multiple different objects and links and rich semantic meanings, it is promising to generate better recommendation. Previous studies on movie recommendation have combined the single implicit feedback information with heterogeneous information network to create an efficient recommendation. In this paper, we combined multiple types of implicit feedback data with heterogeneous information network to achieve better movie recommendation. We propose the latent features of multiple types implicit feedback matrix along different types of meta path to connect users and movies. We define a recommendation model and use Bayesian ranking optimization techniques to estimate the proposed model. Empirical studies on Douban dataset show that our approach can make better recommendation than previous works.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130466268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Personality from Social Media Posts","authors":"N. Alsadhan, D. Skillicorn","doi":"10.1109/ICDMW.2017.51","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.51","url":null,"abstract":"An individual's personality determines the probable repertoire of their reactions to a particular situation. A social robot is much more effective if it is able to learn and so take into account the properties of the humans around it, including personalities. We investigate how well personality can be estimated based on modest amounts of speech or writing, which a social robot might (over)hear. Such a technique also permits humans to be able to infer the personalities of other humans 'at a distance' based on their writing in political, hiring, negotiation, and other relationship settings. We design and implement a technique for predicting personality from small amounts of text, with accuracies comparable to inter-human agreement and substantially better than previous algorithmic approaches (except for a few that use much richer data). The technique works for both of the popular personality typologies, the Big Five and the Myers-Briggs. Because the approach does not require a lexicon, it is language independent. We illustrate using eight different languages, including Arabic.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115252217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}