2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

A Multi-Granularity Pattern-Based Sequence Classification Framework for Educational Data
Mohammad Jaber, P. Wood, P. Papapetrou, A. González‐Marcos
{"title":"A Multi-Granularity Pattern-Based Sequence Classification Framework for Educational Data","authors":"Mohammad Jaber, P. Wood, P. Papapetrou, A. González‐Marcos","doi":"10.1109/DSAA.2016.46","DOIUrl":"https://doi.org/10.1109/DSAA.2016.46","url":null,"abstract":"In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind these sequences, and hence classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are applied to identify the most informative features that are then used to construct the classification model. We show the applicability and suitability of the proposed framework to the area of educational data mining by experimenting on an educational dataset collected from an asynchronous communication tool in which students interact to accomplish an underlying group project. The experimental results showed that our model can achieve competitive performance in detecting the students' roles in their corresponding projects, compared to a baseline similarity-based approach.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128405209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Task Composition in Crowdsourcing
S. Amer-Yahia, Éric Gaussier, V. Leroy, Julien Pilourdault, R. M. Borromeo, Motomichi Toyama
{"title":"Task Composition in Crowdsourcing","authors":"S. Amer-Yahia, Éric Gaussier, V. Leroy, Julien Pilourdault, R. M. Borromeo, Motomichi Toyama","doi":"10.1109/DSAA.2016.27","DOIUrl":"https://doi.org/10.1109/DSAA.2016.27","url":null,"abstract":"Crowdsourcing has gained popularity in a variety of domains as an increasing number of jobs are \"taskified\" and completed independently by a set of workers. A central process in crowdsourcing is the mechanism through which workers find tasks. On popular platforms such as Amazon Mechanical Turk, tasks can be sorted by dimensions such as creation date or reward amount. Research efforts on task assignment have focused on adopting a requester-centric approach whereby tasks are proposed to workers in order to maximize overall task throughput, result quality and cost. In this paper, we advocate the need to complement that with a worker-centric approach to task assignment, and examine the problem of producing, for each worker, a personalized summary of tasks that preserves overall task throughput. We formalize task composition for workers as an optimization problem that finds a representative set of k valid and relevant Composite Tasks (CTs). Validity enforces that a composite task complies with the task arrival rate and satisfies the worker's expected wage. Relevance imposes that tasks match the worker's qualifications. We show empirically that workers' experience is greatly improved due to task homogeneity in each CT and to the adequation of CTs with workers' skills. As a result task throughput is improved.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131729438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Maritime Pattern Extraction from AIS Data Using a Genetic Algorithm
Andrej Dobrkovic, M. Iacob, J. Hillegersberg
{"title":"Maritime Pattern Extraction from AIS Data Using a Genetic Algorithm","authors":"Andrej Dobrkovic, M. Iacob, J. Hillegersberg","doi":"10.1109/DSAA.2016.73","DOIUrl":"https://doi.org/10.1109/DSAA.2016.73","url":null,"abstract":"The long term prediction of maritime vessels' destinations and arrival times is essential for making an effective logistics planning. As ships are influenced by various factors over a long period of time, the solution cannot be achieved by analyzing sailing patterns of each entity separately. Instead, an approach is required, that can extract maritime patterns for the area in question and represent it in a form suitable for querying all possible routes any vessel in that region can take. To tackle this problem we use a genetic algorithm (GA) to cluster vessel position data obtained from the publicly available Automatic Identification System (AIS). The resulting clusters are treated as route waypoints (WP), and by connecting them we get nodes and edges of a directed graph depicting maritime patterns. Since standard clustering algorithms have difficulties in handling data with varying density, and genetic algorithms are slow when handling large data volumes, in this paper we investigate how to enhance the genetic algorithm to allow fast and accurate waypoint identification. We also include a quad tree structure to preprocess data and reduce the input for the GA. When the route graph is created, we add post processing to remove inconsistencies caused by noise in the AIS data. Finally, we validate the results produced by the GA by comparing resulting patterns with known inland water routes for two Dutch provinces.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"829 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116422551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
What Did I Do Wrong in My MOBA Game? Mining Patterns Discriminating Deviant Behaviours
Olivier Cavadenti, Víctor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue-Uberall
{"title":"What Did I Do Wrong in My MOBA Game? Mining Patterns Discriminating Deviant Behaviours","authors":"Olivier Cavadenti, Víctor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue-Uberall","doi":"10.1109/DSAA.2016.75","DOIUrl":"https://doi.org/10.1109/DSAA.2016.75","url":null,"abstract":"The success of electronic sports (eSports), where professional gamers participate in competitive leagues and tournaments, brings new challenges for the video game industry. Other than fun, games must be difficult and challenging for eSports professionals but still easy and enjoyable for amateurs. In this article, we consider Multi-player Online Battle Arena games (MOBA) and particularly, \"Defense of the Ancients 2\", commonly known simply as DOTA2. In this context, a challenge is to propose data analysis methods and metrics that help players to improve their skills. We design a data mining-based method that discovers strategic patterns from historical behavioral traces: Given a model encoding an expected way of playing (the norm), we are interested in patterns deviating from the norm that may explain a game outcome from which player can learn more efficient ways of playing. The method is formally introduced and shown to be adaptable to different scenarios. Finally, we provide an experimental evaluation over a dataset of 10 000 behavioral game traces.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125772749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
Advanced Analytics for Train Delay Prediction Systems by Including Exogenous Weather Data
L. Oneto, Emanuele Fumeo, Giorgio Clerico, Renzo Canepa, Federico Papa, C. Dambra, N. Mazzino, D. Anguita
{"title":"Advanced Analytics for Train Delay Prediction Systems by Including Exogenous Weather Data","authors":"L. Oneto, Emanuele Fumeo, Giorgio Clerico, Renzo Canepa, Federico Papa, C. Dambra, N. Mazzino, D. Anguita","doi":"10.1109/DSAA.2016.57","DOIUrl":"https://doi.org/10.1109/DSAA.2016.57","url":null,"abstract":"State-of-the-art train delay prediction systems neither exploit historical data about train movements, nor exogenous data about phenomena that can affect railway operations. They rely, instead, on static rules built by experts of the railway infrastructure based on classical univariate statistics. The purpose of this paper is to build a data-driven train delay prediction system that exploits the most recent analytics tools. The train delay prediction problem has been mapped into a multivariate regression problem and the performance of kernel methods, ensemble methods and feed-forward neural networks have been compared. Firstly, it is shown that it is possible to build a reliable and robust data-driven model based only on the historical data about the train movements. Additionally, the model can be further improved by including data coming from exogenous sources, in particular the weather information provided by national weather services. Results on real world data coming from the Italian railway network show that the proposal of this paper is able to remarkably improve the current state-of-the-art train delay prediction systems. Moreover, the performed simulations show that the inclusion of weather data into the model has a significant positive impact on its performance.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115225704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
Web Behavior Analysis Using Sparse Non-Negative Matrix Factorization
Akihiro Demachi, Shin Matsushima, K. Yamanishi
{"title":"Web Behavior Analysis Using Sparse Non-Negative Matrix Factorization","authors":"Akihiro Demachi, Shin Matsushima, K. Yamanishi","doi":"10.1109/DSAA.2016.85","DOIUrl":"https://doi.org/10.1109/DSAA.2016.85","url":null,"abstract":"We are concerned with the issue of discovering behavioral patterns on the web. When a large amount of web access logs are given, we are interested in how they are categorized and how they are related to activities in real life. In order to conduct that analysis, we develop a novel algorithm for sparse non-negative matrix factorization (SNMF), which can discover patterns of web behaviors. Although there exist a number of variants of SNMFs, our algorithm is novel in that it updates parameters in a multiplicative way with performance guaranteed, thereby works more robustly than existing ones, even when the rank of factorized matrices is large. We demonstrate the effectiveness of our algorithm using artificial data sets. We then apply our algorithm into a large scale web log data obtained from 70,000 monitors to discover meaningful relations among web behavioral patterns and real life activities. We employ the information-theoretic measure to demonstrate that our algorithm is able to extract more significant relations among web behavior patterns and real life activities than competitive methods.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114908371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
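The abstract above refers to multiplicative parameter updates for sparse non-negative matrix factorization. As a rough illustration only, the sketch below uses the classical Lee-Seung multiplicative updates with an added L1 penalty on H; this is a generic textbook variant, not the authors' algorithm, and the function names, the penalty weight `lam`, and the toy matrix are all assumptions made for the example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def sparse_nmf(v, rank, lam=0.01, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for V ~ WH with an L1 penalty `lam` on H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H + lam); factors stay non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + lam + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T); no penalty on W in this sketch.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: an exactly rank-2 non-negative matrix.
w_true = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]]
h_true = [[1, 0, 2, 0, 1], [0, 2, 0, 1, 1]]
v = matmul(w_true, h_true)

w0, h0 = sparse_nmf(v, rank=2, iters=0)   # random initialisation only
w, h = sparse_nmf(v, rank=2)
print("initial error:", frob_error(v, w0, h0))
print("final error:  ", frob_error(v, w, h))
```

Because the updates only multiply by non-negative ratios, the factors never leave the non-negative orthant. Without a normalisation constraint on W, part of the L1 penalty can be absorbed by rescaling the factors, so a fuller implementation would also constrain W.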
Anonymizing NYC Taxi Data: Does It Matter?
Marie Douriez, Harish Doraiswamy, J. Freire, Cláudio T. Silva
{"title":"Anonymizing NYC Taxi Data: Does It Matter?","authors":"Marie Douriez, Harish Doraiswamy, J. Freire, Cláudio T. Silva","doi":"10.1109/DSAA.2016.21","DOIUrl":"https://doi.org/10.1109/DSAA.2016.21","url":null,"abstract":"The widespread use of location-based services has led to an increasing availability of trajectory data from urban environments. These data carry rich information that are useful for improving cities through traffic management and city planning. Yet, it also contains information about individuals which can jeopardize their privacy. In this study, we work with the New York City (NYC) taxi trips data set publicly released by the Taxi and Limousine Commission (TLC). This data set contains information about every taxi cab ride that happened in NYC. A bad hashing of the medallion numbers (the ID corresponding to a taxi) allowed the recovery of all the medallion numbers and led to a privacy breach for the drivers, whose income could be easily extracted. In this work, we initiate a study to evaluate whether \"perfect\" anonymity is possible and if such an identity disclosure can be avoided given the availability of diverse sets of external data sets through which the hidden information can be recovered. This is accomplished through a spatio-temporal join based attack which matches the taxi data with an external medallion data that can be easily gathered by an adversary. Using a simulation of the medallion data, we show that our attack can re-identify over 91% of the taxis that ply in NYC even when using a perfect pseudonymization of medallion numbers. We also explore the effectiveness of trajectory anonymization strategies and demonstrate that our attack can still identify a significant fraction of the taxis in NYC. Given the restrictions in publishing the taxi data by TLC, our results indicate that unless the utility of the data set is significantly compromised, it will not be possible to maintain the privacy of taxi medallion owners and drivers.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122650308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
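The abstract above attributes the original privacy breach to a weak hashing of medallion numbers. A minimal sketch of why hashing a small identifier space offers little protection follows. It assumes an unsalted MD5 hash and a hypothetical four-character ID format (digit, letter, digit, digit); neither detail is stated in the abstract, and the paper's own spatio-temporal join attack is a different technique that is not shown here.

```python
import hashlib
import itertools
import string

def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string (unsalted, as assumed here)."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

def candidate_ids():
    """Enumerate a hypothetical ID space: digit, uppercase letter, digit, digit."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"

# Precompute hash -> ID for the whole space (10 * 26 * 10 * 10 candidates).
reverse_table = {md5_hex(cid): cid for cid in candidate_ids()}

def recover(hashed_id: str):
    """Look up the original ID for a hash, or None if it is outside the space."""
    return reverse_table.get(hashed_id)

released = md5_hex("7B42")   # what a "pseudonymized" record would contain
print(recover(released))     # the lookup returns the original identifier
```

The whole table is built in well under a second, which is why the abstract goes on to ask whether even a perfect pseudonymization would be enough.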
Mining Pre-Exposure Prophylaxis Trends in Social Media
P. Breen, Jane M Kelly, T. Heckman, Shannon P. Quinn
{"title":"Mining Pre-Exposure Prophylaxis Trends in Social Media","authors":"P. Breen, Jane M Kelly, T. Heckman, Shannon P. Quinn","doi":"10.1109/DSAA.2016.29","DOIUrl":"https://doi.org/10.1109/DSAA.2016.29","url":null,"abstract":"Pre-Exposure Prophylaxis (PrEP) is a ground-breaking biomedical approach to curbing the transmission of Human Immunodeficiency Virus (HIV). Truvada, the most common form of PrEP, is a combination of tenofovir and emtricitabine and is a once-daily oral medication taken by HIV-seronegative persons at elevated risk for HIV infection. When taken reliably every day, PrEP can reduce one's risk for HIV infection by as much as 99%. While highly efficacious, PrEP is expensive, somewhat stigmatized, and many health care providers remain uninformed about its benefits. Data mining of social media can monitor the spread of HIV in the United States, but no study has investigated PrEP use and sentiment via social media. This paper describes a data mining and machine learning strategy using natural language processing (NLP) that monitors Twitter social media data to identify PrEP discussion trends. Results showed that we can identify PrEP and HIV discussion dynamics over time, and assign PrEP-related tweets positive or negative sentiment. Results can enable public health professionals to monitor PrEP discussion trends and identify strategies to improve HIV prevention via PrEP.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128595396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles
Á. Periáñez, A. Saas, Anna Guitart, Colin Magne
{"title":"Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles","authors":"Á. Periáñez, A. Saas, Anna Guitart, Colin Magne","doi":"10.1109/DSAA.2016.84","DOIUrl":"https://doi.org/10.1109/DSAA.2016.84","url":null,"abstract":"Reducing user attrition, i.e. churn, is a broad challenge faced by several industries. In mobile social games, decreasing churn is decisive to increase player retention and rise revenues. Churn prediction models allow to understand player loyalty and to anticipate when they will stop playing a game. Thanks to these predictions, several initiatives can be taken to retain those players who are more likely to churn. Survival analysis focuses on predicting the time of occurrence of a certain event, churn in our case. Classical methods, like regressions, could be applied only when all players have left the game. The challenge arises for datasets with incomplete churning information for all players, as most of them still connect to the game. This is called a censored data problem and is in the nature of churn. Censoring is commonly dealt with survival analysis techniques, but due to the inflexibility of the survival statistical algorithms, the accuracy achieved is often poor. In contrast, novel ensemble learning techniques, increasingly popular in a variety of scientific fields, provide high-class prediction results. In this work, we develop, for the first time in the social games domain, a survival ensemble model which provides a comprehensive analysis together with an accurate prediction of churn. For each player, we predict the probability of churning as function of time, which permits to distinguish various levels of loyalty profiles. Additionally, we assess the risk factors that explain the predicted player survival times. Our results show that churn prediction by survival ensembles significantly improves the accuracy and robustness of traditional analyses, like Cox regression.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":" 14","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120828419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
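The abstract above describes churn data as censored: players who are still active have not yet produced a churn time. As a small illustration of how survival analysis uses such records instead of discarding them, the sketch below implements the classical Kaplan-Meier estimator. This is a standard baseline, not the survival ensemble model the paper proposes, and the six-player data set is hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, churned):
    """Kaplan-Meier survival estimate.

    durations: observed days until churn or until the end of observation.
    churned:   True if churn was observed, False if the player is censored
               (still active when the data was collected).
    Returns a list of (time, survival_probability) at each churn time.
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)   # every player leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Censored players still count in the denominator up to this time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: six players, two still active (censored).
durations = [5, 8, 8, 12, 15, 20]
churned = [True, True, False, True, False, True]
for t, s in kaplan_meier(durations, churned):
    print(f"day {t}: S(t) = {s:.3f}")
```

Censored players contribute to the at-risk count until their last observed day, which is the information a plain regression on observed churn times would throw away.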
Combining Static and Dynamic Features for Multivariate Sequence Classification
A. Leontjeva, Ilya Kuzovkin
{"title":"Combining Static and Dynamic Features for Multivariate Sequence Classification","authors":"A. Leontjeva, Ilya Kuzovkin","doi":"10.1109/DSAA.2016.10","DOIUrl":"https://doi.org/10.1109/DSAA.2016.10","url":null,"abstract":"Model precision in a classification task is highly dependent on the feature space that is used to train the model. Moreover, whether the features are sequential or static will dictate which classification method can be applied as most of the machine learning algorithms are designed to deal with either one or another type of data. In real-life scenarios, however, it is often the case that both static and dynamic features are present, or can be extracted from the data. In this work, we demonstrate how generative models such as Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) artificial neural networks can be used to extract temporal information from the dynamic data. We explore how the extracted information can be combined with the static features in order to improve the classification performance. We evaluate the existing techniques and suggest a hybrid approach, which outperforms other methods on several public datasets.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132767583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23