2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications

Customer Simulation for Direct Marketing Experiments
Yegor Tkachenko, Mykel J. Kochenderfer, Krzysztof Kluza
{"title":"Customer Simulation for Direct Marketing Experiments","authors":"Yegor Tkachenko, Mykel J. Kochenderfer, Krzysztof Kluza","doi":"10.1109/DSAA.2016.59","DOIUrl":"https://doi.org/10.1109/DSAA.2016.59","url":null,"abstract":"Optimization of control policies for corporate customer relationship management (CRM) systems can boost customer satisfaction, reduce attrition, and increase expected lifetime value of the customer base. However, evaluation of these policies is often complicated. Policies can be evaluated with real-life marketing interactions, but such evaluation can be prohibitively expensive and time consuming. Customer simulators learned from data are an inexpensive alternative suitable for rapid campaign tests. We summarize the literature on the evaluation of direct marketing policies through simulation and propose a decomposition of the problem into distinct tasks: (a) generation of the initial client database snapshot and (b) propagation of clients through time in response to company actions. We present open-source simulators trained and validated on two direct marketing data sets of varying size and complexity.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"271 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131552204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Sparse Linear Discriminant Analysis in Structured Covariates Space
S. Safo, Q. Long
{"title":"Sparse Linear Discriminant Analysis in Structured Covariates Space","authors":"S. Safo, Q. Long","doi":"10.1002/sam.11376","DOIUrl":"https://doi.org/10.1002/sam.11376","url":null,"abstract":"Classification with high dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high dimensional data is as bad as random guessing due to the many noise features that increases misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem, for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. 
We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121807498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Senpy: A Pragmatic Linked Sentiment Analysis Framework
J. F. Sánchez-Rada, C. Iglesias, Ignacio Corcuera, Óscar Araque
{"title":"Senpy: A Pragmatic Linked Sentiment Analysis Framework","authors":"J. F. Sánchez-Rada, C. Iglesias, Ignacio Corcuera, Óscar Araque","doi":"10.1109/DSAA.2016.79","DOIUrl":"https://doi.org/10.1109/DSAA.2016.79","url":null,"abstract":"Sentiment and emotion analysis technologies have quickly gained momentum in industry and academia. This popularity has spawned a myriad of service and tools. Due to the lack of common interfaces and models, each of these services imposes specific interfaces and representation models. Heterogeneity makes it costly to integrate different services, evaluate them or switch between them. This work aims to remedy heterogeneity by providing an extensible framework and an API aligned with the NIF service specification. It also includes a reference implementation, a first step towards a successful and cost-effective adoption. The specific contributions in this paper are: (i) the Senpy framework, (ii) an architecture for the framework that follows a plug-in approach, (iii) a reference open source implementation of the architecture, (iv) the use and validation of the framework and architecture in a big data sentiment analysis European project. 
Our aim is to foster the development of a new generation of emotion aware services by isolating the development of new algorithms from the representation of results and the deployment of services.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122000326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Data-Driven Sales Leads Prediction for Everything-as-a-Service in the Cloud
Chul Sung, Bo Zhang, Chunhui Y. Higgins, Y. Choe
{"title":"Data-Driven Sales Leads Prediction for Everything-as-a-Service in the Cloud","authors":"Chul Sung, Bo Zhang, Chunhui Y. Higgins, Y. Choe","doi":"10.1109/DSAA.2016.83","DOIUrl":"https://doi.org/10.1109/DSAA.2016.83","url":null,"abstract":"A cloud platform website, offering a catalog of services, operates under a freemium business model or a free trial business model, aggressively marketing to customers who have previously visited. In such a cloud platform or service business, accurate identification of high profile customers is central to the success for the business. However, there are several limitations of existing approaches because of the following challenges: (1) heavy customer traffic flows, (2) the noise in user behaviors, (3) a lack of collaboration across stakeholders, (4) class imbalanced customer data (few paying customers vs. high numbers of freemium or trial customers), and (5) unpredictable business environments. In this paper, we propose a data-driven iterative sales lead prediction framework for cloud everything as a service (XaaS), including a cloud platform or software. In this framework, from the BizDevOps process we collaborate to extract business insights from multiple business stakeholders. From these business insights, we calculate service usage scores using our RFDL (Recency, Frequency, Duration, and Lifetime) analysis and estimate sales lead prediction based on the usage scores in a supervised manner. Our framework adapts to a continuously changing environment through iterations of the whole process, maintains its performance of sales lead prediction, and finally shares the prediction results to the sales or marketing team effectively. A three-month pilot implementation of the framework led to more than 300 paying customers and more than $200K increase in revenue. 
We expect our scalable, iterative sales lead prediction approach to be widely applicable to online or cloud business domains where there is a constant flux of customer traffic.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126832973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Efficient Sampling-Based ADMM for Distributed Data
Jun-Kun Wang, Shou-de Lin
{"title":"Efficient Sampling-Based ADMM for Distributed Data","authors":"Jun-Kun Wang, Shou-de Lin","doi":"10.1109/DSAA.2016.41","DOIUrl":"https://doi.org/10.1109/DSAA.2016.41","url":null,"abstract":"This paper presents two strategies to speed up the alternating direction method of multipliers (ADMM) for distributed data. In the first method, inspired by stochastic gradient descent, each machine uses only a subset of its data at the first few iterations, speeding up those iterations. A key result is in proving that despite this approximation, our method enjoys the same convergence rate in terms of the number of iterations as the standard ADMM, and hence is faster overall. The second method also follows the idea of sampling a subset of the data to update the model before the communication of each round. It converts an objective to the approximated dual form and performs ADMM on the dual. The method turns out to be a distributed variant of the recently proposed SDCA-ADMM. Yet, compared to the straightforward distributed implementation of SDCA-ADMM, the proposed method enjoys less frequent communication between machines, better memory usage, and lighter computational demand. Experiments demonstrate the effectiveness of our two strategies.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115241083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On the Role of Mentions on Tweet Virality
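The first strategy in the abstract above (each machine works on only part of its data during the first few iterations, then on all of it) can be illustrated with a small sketch. This is standard consensus ADMM on a toy scalar least-squares problem, not the authors' algorithm: the function name `sampled_consensus_admm`, the parameters `warmup` and `sample_size`, and the use of a fixed data prefix as a stand-in for random sampling are all assumptions made for illustration. The dual-form SDCA-ADMM variant from the abstract is not covered.

```python
def sampled_consensus_admm(shards, rho=1.0, iters=60, warmup=5, sample_size=1):
    """Consensus ADMM for min_x sum over all data points a of 0.5 * (x - a)^2.

    During the first `warmup` iterations each machine uses only the first
    `sample_size` points of its shard (a deterministic stand-in for sampling,
    assumed here for illustration).
    """
    z = 0.0                   # global consensus variable
    u = [0.0] * len(shards)   # scaled dual variable, one per machine
    for k in range(iters):
        xs = []
        for i, shard in enumerate(shards):
            data = shard[:sample_size] if k < warmup else shard
            # Closed-form local update:
            # argmin_x 0.5 * sum((x - a)^2) + (rho / 2) * (x - z + u_i)^2
            xs.append((sum(data) + rho * (z - u[i])) / (len(data) + rho))
        # Averaging step: the only point where machines communicate.
        z = sum(x + ui for x, ui in zip(xs, u)) / len(shards)
        # Dual update on each machine.
        u = [ui + x - z for x, ui in zip(xs, u)]
    return z


# Hypothetical data: three machines holding two points each.
shards = [[1.0, 3.0], [2.0, 6.0], [4.0, 8.0]]
estimate = sampled_consensus_admm(shards)
```

Because the later iterations use every data point, the consensus value settles on the minimizer of the full objective, which for this toy problem is the mean of all six points. The sketch makes no claim about the convergence rate proved in the paper.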
Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Sumanth Bandi, Anand Kumar, Jean-Loup Guillaume, Bivas Mitra
{"title":"On the Role of Mentions on Tweet Virality","authors":"Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Sumanth Bandi, Anand Kumar, Jean-Loup Guillaume, Bivas Mitra","doi":"10.1109/DSAA.2016.28","DOIUrl":"https://doi.org/10.1109/DSAA.2016.28","url":null,"abstract":"In this paper, we investigate the role of mentions on tweet propagation. We propose a novel tweet propagation model SIR_MF based on a multiplex network framework, that allows to analyze the effects of mentioning on final retweet count. The basic bricks of this model are supported by a comprehensive study of multiple real datasets and simulations of the model show a nice agreement with the empirically observed tweet popularity. Studies and experiments also reveal that follower count, retweet rate & profile similarity are important factors in gaining tweet popularity and allow to better understand the impact of the mention strategies on the retweet count. Interestingly, we analytically identify a critical retweet rate regulating the role of mention on the tweet popularity. Finally, our data driven simulation demonstrates that the proposed mention recommendation heuristic \"Easy-Mention\" outperforms the benchmark \"Whom-To-Mention\" algorithm.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116282213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Temporal Network Change Detection Using Network Centralities
Yoshitaro Yonamoto, K. Morino, K. Yamanishi
{"title":"Temporal Network Change Detection Using Network Centralities","authors":"Yoshitaro Yonamoto, K. Morino, K. Yamanishi","doi":"10.1109/DSAA.2016.13","DOIUrl":"https://doi.org/10.1109/DSAA.2016.13","url":null,"abstract":"In this paper, we propose a novel change detection method for temporal networks. In usual change detection algorithms, change scores are generated from an observed time series. When this change score reaches a threshold, an alert is raised to declare the change. Our method aggregates these change scores and alerts based on network centralities. Many types of changes in a network can be discovered from changes to the network structure. Thus, nodes and links should be monitored in order to recognize changes. However, it is difficult to focus on the appropriate nodes and links when there is little information regarding the dataset. Network centrality such as PageRank measures the importance of nodes in a network based on certain criteria. Therefore, it is natural to apply network centralities in order to improve the accuracy of change detection methods. Our analysis reveals how and when network centrality works well in terms of change detection. Based on this understanding, we propose an aggregating algorithm that emphasizes the appropriate network centralities. Our evaluation of the proposed aggregation algorithm showed highly accurate predictions for an artificial dataset and two real datasets. 
Our method contributes to extending the field of change detection in temporal networks by utilizing network centralities.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"303 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116329489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Distributed Decision Tree Algorithm and Its Implementation on Big Data Platforms
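The idea in the abstract above, per-node change scores that are aggregated according to network centrality and compared against a threshold, can be sketched as follows. The centrality-weighted mean used here is an assumed aggregation rule chosen for simplicity. It is not the aggregation algorithm proposed in the paper, and the function names, node labels, and numbers are hypothetical.

```python
def aggregate_change_score(node_scores, centrality):
    """Centrality-weighted mean of per-node change scores (assumed rule)."""
    total_weight = sum(centrality[n] for n in node_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(centrality[n] * s for n, s in node_scores.items())
    return weighted / total_weight


def detect_changes(score_series, centrality, threshold):
    """Return the time indices whose aggregated score reaches the threshold."""
    return [
        t for t, node_scores in enumerate(score_series)
        if aggregate_change_score(node_scores, centrality) >= threshold
    ]


# Hypothetical centrality values, standing in for e.g. PageRank.
centrality = {"a": 0.6, "b": 0.3, "c": 0.1}
score_series = [
    {"a": 0.1, "b": 0.2, "c": 0.9},  # only the peripheral node changes
    {"a": 0.9, "b": 0.8, "c": 0.1},  # the central nodes change
]
alerts = detect_changes(score_series, centrality, threshold=0.5)
```

With these numbers, the large score on the low-centrality node at the first time step is down-weighted and raises no alert, while the change on the central nodes at the second time step does.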
Jingxiang Chen, Tao Wang, Ralph Abbey, J. Pingenot
{"title":"A Distributed Decision Tree Algorithm and Its Implementation on Big Data Platforms","authors":"Jingxiang Chen, Tao Wang, Ralph Abbey, J. Pingenot","doi":"10.1109/DSAA.2016.64","DOIUrl":"https://doi.org/10.1109/DSAA.2016.64","url":null,"abstract":"Decision tree algorithms are very popular in the field of data mining. This paper proposes a distributed decision tree algorithm and shows examples of its implementation on big data platforms. The major contribution of this paper is the novel KS-Tree algorithm which builds a decision tree in a distributed environment. KS-Tree is applied to some real world data mining problems and compared with state-of-the-art decision tree techniques that are implemented in R and Apache Spark. The results show that KS-Tree can achieve better results, especially with large data sets. Furthermore, we demonstrate that KS-Tree can be applied to various data mining tasks, such as variable selection.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128271426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Behavior-Oriented Time Segmentation for Mining Individualized Rules of Mobile Phone Users
Iqbal H. Sarker, A. Colman, M. A. Kabir, Jun Han
{"title":"Behavior-Oriented Time Segmentation for Mining Individualized Rules of Mobile Phone Users","authors":"Iqbal H. Sarker, A. Colman, M. A. Kabir, Jun Han","doi":"10.1109/DSAA.2016.60","DOIUrl":"https://doi.org/10.1109/DSAA.2016.60","url":null,"abstract":"Mobile or cellular phones can record various types of context data related to a user's phone call activities. In this paper, we present an approach to discovering individualized behavior rules for mobile users from their phone call records, based on the temporal context in which a user accepts, rejects or misses a call. One of the determinants of an individual's phone behavior is the various activities undertaken at various times of a day and days of the week. In many cases, such behavior will follow temporal patterns. Currently, researchers modeling user behavior using temporal context statically segment time into arbitrary categories (e.g., morning, evening) or periods (e.g., 1 hour). However, such time categorization does not necessarily map to the patterns of individual user activity and subsequent behavior. Therefore, we propose a behavior-oriented time segmentation (BOTS) technique that dynamically identifies diverse time segments for an individual user's behaviors based on the phone call records. 
Experiments on real datasets show that our proposed technique better captures the user's dominant call response behavior at various times of the day and week, thereby enabling more appropriate rules to be created for the purpose of automated handling of incoming calls, in an intelligent call interruption management system.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122848600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks
Adrian Taylor, Sylvain P. Leblanc, N. Japkowicz
{"title":"Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks","authors":"Adrian Taylor, Sylvain P. Leblanc, N. Japkowicz","doi":"10.1109/DSAA.2016.20","DOIUrl":"https://doi.org/10.1109/DSAA.2016.20","url":null,"abstract":"Modern automobiles have been proven vulnerable to hacking by security researchers. By exploiting vulnerabilities in the car's external interfaces, such as wifi, bluetooth, and physical connections, they can access a car's controller area network (CAN) bus. On the CAN bus, commands can be sent to control the car, for example cutting the brakes or stopping the engine. While securing the car's interfaces to the outside world is an important part of mitigating this threat, the last line of defence is detecting malicious behaviour on the CAN bus. We propose an anomaly detector based on a Long Short-Term Memory neural network to detect CAN bus attacks. The detector works by learning to predict the next data word originating from each sender on the bus. Highly surprising bits in the actual next word are flagged as anomalies. We evaluate the detector by synthesizing anomalies with modified CAN bus data. The synthesized anomalies are designed to mimic attacks reported in the literature. We show that the detector can detect anomalies we synthesized with low false alarm rates. 
Additionally, the granularity of the bit predictions can provide forensic investigators clues as to the nature of flagged anomalies.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128077029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 268
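The flagging step described in the abstract above ("highly surprising bits in the actual next word are flagged as anomalies") can be illustrated with a minimal sketch. It assumes a model has already produced, for each bit of the next data word, a probability that the bit is 1. The list `predicted` is a hypothetical stand-in for that output, and scoring by negative log-likelihood against a fixed threshold is an assumption rather than the paper's confirmed scoring rule.

```python
import math


def bit_surprise(predicted_prob_one, actual_bits):
    """Per-bit surprise, in bits: -log2 of the probability given to the observed value."""
    eps = 1e-12  # guard against log(0)
    surprise = []
    for p, bit in zip(predicted_prob_one, actual_bits):
        likelihood = p if bit == 1 else 1.0 - p
        surprise.append(-math.log2(max(likelihood, eps)))
    return surprise


def flag_anomalous_bits(predicted_prob_one, actual_bits, threshold):
    """Return the positions of bits whose surprise exceeds the threshold."""
    scores = bit_surprise(predicted_prob_one, actual_bits)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical predictions for a 4-bit word and the word actually observed.
predicted = [0.99, 0.01, 0.50, 0.98]
observed = [1, 0, 1, 0]
flagged = flag_anomalous_bits(predicted, observed, threshold=3.0)
```

Only the last bit is flagged: the model gave it a 0.98 probability of being 1, but a 0 was observed. Keeping the scores per bit, rather than summing them per word, is what preserves the bit-level granularity the abstract mentions as useful for forensic analysis.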