2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

A Multi-Granularity Pattern-Based Sequence Classification Framework for Educational Data

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-12-26 DOI: 10.1109/DSAA.2016.46

Mohammad Jaber, P. Wood, P. Papapetrou, A. González‐Marcos

Citations: 6

Task Composition in Crowdsourcing

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-12-22 DOI: 10.1109/DSAA.2016.27

S. Amer-Yahia, Éric Gaussier, V. Leroy, Julien Pilourdault, R. M. Borromeo, Motomichi Toyama

Abstract: Crowdsourcing has gained popularity in a variety of domains as an increasing number of jobs are "taskified" and completed independently by a set of workers. A central process in crowdsourcing is the mechanism through which workers find tasks. On popular platforms such as Amazon Mechanical Turk, tasks can be sorted by dimensions such as creation date or reward amount. Research efforts on task assignment have focused on adopting a requester-centric approach whereby tasks are proposed to workers in order to optimize overall task throughput, result quality and cost. In this paper, we advocate the need to complement that with a worker-centric approach to task assignment, and examine the problem of producing, for each worker, a personalized summary of tasks that preserves overall task throughput. We formalize task composition for workers as an optimization problem that finds a representative set of k valid and relevant Composite Tasks (CTs). Validity enforces that a composite task complies with the task arrival rate and satisfies the worker's expected wage. Relevance imposes that tasks match the worker's qualifications. We show empirically that workers' experience is greatly improved due to task homogeneity in each CT and to the fit between CTs and workers' skills. As a result, task throughput is improved.

Citations: 16

Maritime Pattern Extraction from AIS Data Using a Genetic Algorithm

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-17 DOI: 10.1109/DSAA.2016.73

Andrej Dobrkovic, M. Iacob, J. Hillegersberg

Abstract: The long-term prediction of maritime vessels' destinations and arrival times is essential for effective logistics planning. As ships are influenced by various factors over a long period of time, the solution cannot be achieved by analyzing the sailing patterns of each entity separately. Instead, an approach is required that can extract maritime patterns for the area in question and represent them in a form suitable for querying all possible routes any vessel in that region can take. To tackle this problem, we use a genetic algorithm (GA) to cluster vessel position data obtained from the publicly available Automatic Identification System (AIS). The resulting clusters are treated as route waypoints (WP), and by connecting them we get the nodes and edges of a directed graph depicting maritime patterns. Since standard clustering algorithms have difficulties in handling data with varying density, and genetic algorithms are slow when handling large data volumes, in this paper we investigate how to enhance the genetic algorithm to allow fast and accurate waypoint identification. We also include a quad tree structure to preprocess data and reduce the input for the GA. When the route graph is created, we add post-processing to remove inconsistencies caused by noise in the AIS data. Finally, we validate the results produced by the GA by comparing the resulting patterns with known inland water routes for two Dutch provinces.

Citations: 14

What Did I Do Wrong in My MOBA Game? Mining Patterns Discriminating Deviant Behaviours

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-17 DOI: 10.1109/DSAA.2016.75

Olivier Cavadenti, Víctor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue-Uberall

Abstract: The success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments, brings new challenges for the video game industry. Besides being fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multi-player Online Battle Arena games (MOBA) and particularly "Defense of the Ancients 2", commonly known simply as DOTA2. In this context, a challenge is to propose data analysis methods and metrics that help players to improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: given a model encoding an expected way of playing (the norm), we are interested in patterns deviating from the norm that may explain a game outcome and from which players can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation over a dataset of 10 000 behavioral game traces.

Citations: 32

Advanced Analytics for Train Delay Prediction Systems by Including Exogenous Weather Data

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.57

L. Oneto, Emanuele Fumeo, Giorgio Clerico, Renzo Canepa, Federico Papa, C. Dambra, N. Mazzino, D. Anguita

Abstract: State-of-the-art train delay prediction systems exploit neither historical data about train movements nor exogenous data about phenomena that can affect railway operations. They rely, instead, on static rules built by experts of the railway infrastructure based on classical univariate statistics. The purpose of this paper is to build a data-driven train delay prediction system that exploits the most recent analytics tools. The train delay prediction problem has been mapped into a multivariate regression problem, and the performance of kernel methods, ensemble methods and feed-forward neural networks has been compared. Firstly, it is shown that it is possible to build a reliable and robust data-driven model based only on the historical data about the train movements. Additionally, the model can be further improved by including data coming from exogenous sources, in particular the weather information provided by national weather services. Results on real-world data coming from the Italian railway network show that the proposal of this paper is able to remarkably improve the current state-of-the-art train delay prediction systems. Moreover, the performed simulations show that the inclusion of weather data in the model has a significant positive impact on its performance.

Citations: 31

Web Behavior Analysis Using Sparse Non-Negative Matrix Factorization

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.85

Akihiro Demachi, Shin Matsushima, K. Yamanishi

Citations: 3

Anonymizing NYC Taxi Data: Does It Matter?

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.21

Marie Douriez, Harish Doraiswamy, J. Freire, Cláudio T. Silva

Abstract: The widespread use of location-based services has led to an increasing availability of trajectory data from urban environments. These data carry rich information that is useful for improving cities through traffic management and city planning. Yet they also contain information about individuals which can jeopardize their privacy. In this study, we work with the New York City (NYC) taxi trips data set publicly released by the Taxi and Limousine Commission (TLC). This data set contains information about every taxi cab ride that happened in NYC. A bad hashing of the medallion numbers (the ID corresponding to a taxi) allowed the recovery of all the medallion numbers and led to a privacy breach for the drivers, whose income could be easily extracted. In this work, we initiate a study to evaluate whether "perfect" anonymity is possible and if such an identity disclosure can be avoided given the availability of diverse sets of external data sets through which the hidden information can be recovered. This is accomplished through a spatio-temporal join based attack which matches the taxi data with external medallion data that can be easily gathered by an adversary. Using a simulation of the medallion data, we show that our attack can re-identify over 91% of the taxis that ply in NYC even when using a perfect pseudonymization of medallion numbers. We also explore the effectiveness of trajectory anonymization strategies and demonstrate that our attack can still identify a significant fraction of the taxis in NYC. Given the restrictions in publishing the taxi data by TLC, our results indicate that unless the utility of the data set is significantly compromised, it will not be possible to maintain the privacy of taxi medallion owners and drivers.
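
The abstract above attributes the original breach to a bad hashing of medallion numbers. A minimal sketch of why an unsalted hash over a small identifier space offers little protection is given below. The hash function (MD5) and the identifier format (digit, letter, digit, digit) are assumptions chosen for illustration; the abstract names neither, and this is not the spatio-temporal join attack the paper studies.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits
from typing import Dict


def pseudonymize(identifier: str) -> str:
    """Unsalted hash used as a pseudonym (MD5 is an assumption here)."""
    return hashlib.md5(identifier.encode("ascii")).hexdigest()


def build_reverse_table() -> Dict[str, str]:
    """Hash every ID in the hypothetical digit-letter-digit-digit format."""
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        candidate = f"{d1}{letter}{d2}{d3}"
        table[pseudonymize(candidate)] = candidate
    return table


# 10 * 26 * 10 * 10 = 26,000 candidates: enumerating all of them is trivial.
reverse_table = build_reverse_table()

# A "released" pseudonym is inverted with a single dictionary lookup.
released = pseudonymize("5X41")  # hypothetical identifier
recovered = reverse_table[released]
```

Because the whole identifier space can be enumerated, the pseudonym behaves like a lookup key rather than a secret, whatever hash function is chosen.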

Citations: 57

Mining Pre-Exposure Prophylaxis Trends in Social Media

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.29

P. Breen, Jane M Kelly, T. Heckman, Shannon P. Quinn

Abstract: Pre-Exposure Prophylaxis (PrEP) is a ground-breaking biomedical approach to curbing the transmission of Human Immunodeficiency Virus (HIV). Truvada, the most common form of PrEP, is a combination of tenofovir and emtricitabine and is a once-daily oral medication taken by HIV-seronegative persons at elevated risk for HIV infection. When taken reliably every day, PrEP can reduce one's risk for HIV infection by as much as 99%. While highly efficacious, PrEP is expensive and somewhat stigmatized, and many health care providers remain uninformed about its benefits. Data mining of social media can monitor the spread of HIV in the United States, but no study has investigated PrEP use and sentiment via social media. This paper describes a data mining and machine learning strategy using natural language processing (NLP) that monitors Twitter social media data to identify PrEP discussion trends. Results showed that we can identify PrEP and HIV discussion dynamics over time, and assign PrEP-related tweets positive or negative sentiment. Results can enable public health professionals to monitor PrEP discussion trends and identify strategies to improve HIV prevention via PrEP.

Citations: 10

Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.84

Á. Periáñez, A. Saas, Anna Guitart, Colin Magne

Abstract: Reducing user attrition, i.e. churn, is a broad challenge faced by several industries. In mobile social games, decreasing churn is decisive for increasing player retention and raising revenues. Churn prediction models make it possible to understand player loyalty and to anticipate when players will stop playing a game. Thanks to these predictions, several initiatives can be taken to retain those players who are more likely to churn. Survival analysis focuses on predicting the time of occurrence of a certain event, churn in our case. Classical methods, like regressions, could be applied only when all players have left the game. The challenge arises for datasets with incomplete churning information for all players, as most of them still connect to the game. This is called a censored data problem and is in the nature of churn. Censoring is commonly dealt with using survival analysis techniques, but due to the inflexibility of the survival statistical algorithms, the accuracy achieved is often poor. In contrast, novel ensemble learning techniques, increasingly popular in a variety of scientific fields, provide high-quality prediction results. In this work, we develop, for the first time in the social games domain, a survival ensemble model which provides a comprehensive analysis together with an accurate prediction of churn. For each player, we predict the probability of churning as a function of time, which makes it possible to distinguish various levels of loyalty profiles. Additionally, we assess the risk factors that explain the predicted player survival times. Our results show that churn prediction by survival ensembles significantly improves the accuracy and robustness of traditional analyses, like Cox regression.
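
The censored data problem described in the abstract above can be illustrated with a classical Kaplan-Meier estimator, sketched below in plain Python. This is a deliberately simpler, swapped-in technique and not the survival ensemble the authors develop; the function name and the toy player data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(t, S(t))] for each time t at which a churn was observed.

    durations[i] is the time player i was observed; churned[i] is True if
    the player left at that time and False if the player was still active
    (censored) when observation stopped.
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # churned or censored: both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Censored players still count in the denominator up to their exit.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data: six players, three of them still playing when last observed.
days_observed = [2, 3, 3, 5, 8, 8]
has_churned = [True, True, False, True, False, False]
curve = kaplan_meier(days_observed, has_churned)
```

Censored players are never treated as churners, yet they still contribute to the number of players at risk until they drop out of observation, which is the information a plain regression on observed churn times would discard.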

Citations: 79

Combining Static and Dynamic Features for Multivariate Sequence Classification

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2016-10-01 DOI: 10.1109/DSAA.2016.10

A. Leontjeva, Ilya Kuzovkin

Citations: 23