2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications

Exploiting a Bootstrapping Approach for Automatic Annotation of Emotions in Texts
Lea Canales, C. Strapparava, E. Boldrini, P. Martínez-Barco
Abstract: The objective of this research is to develop a technique to automatically annotate emotional corpora. The complexity of automatic annotation of emotional corpora still presents numerous challenges, and thus there is a need to develop a technique that allows us to tackle the annotation task. The relevance of this research is demonstrated by the fact that people's emotions and the patterns of these emotions provide great value for business, individuals, society, or politics. Hence, the creation of a robust emotion detection system becomes crucial. Due to the subjectivity of emotions, the main challenge in creating emotional resources is the annotation process. With this starting point in mind, the objective of our paper is to illustrate an innovative and effective bootstrapping process for automatic annotation of emotional corpora. The evaluations carried out confirm the soundness of the proposed approach and allow us to consider the bootstrapping process an appropriate way to create resources, such as an emotional corpus, that can be employed in supervised machine learning towards the improvement of emotion detection systems.
DOI: https://doi.org/10.1109/DSAA.2016.78
Published: 2016-10-01
Citations: 14
Role Models: Mining Role Transitions Data in IT Project Management
G. Palshikar, Sachin Pawar, Nitin Ramrakhiyani
Abstract: The notion of roles is crucial in project management across various domains. A role indicates a broad set of tasks, activities, deliverables and responsibilities that the person needs to carry out within a project. Assigning roles to team members clarifies the expectations of work items to be delivered by each and structures the interactions of the team among themselves as well as with external stakeholders. This paper analyzes a sizeable real-life dataset regarding the actual usage of roles in software development and maintenance projects in a large multinational IT organization. The paper introduces and formalizes concepts such as seniority level of a role, career progression and career lines, formulates various business questions related to role-based project management, proposes analytics techniques to answer them and outlines the actual results produced to answer the business questions. The business questions are related to dependencies between roles, patterns in role assignments and durations, predicting role changes, discovering insights useful for meeting career aspirations, interesting role sequences etc. The proposed analytics algorithms are based on Markov models, sequence mining, classification and survival analysis.
DOI: https://doi.org/10.1109/DSAA.2016.62
Published: 2016-10-01
Citations: 4
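The abstract above names Markov models as one of the techniques applied to role transitions. As a rough illustration only, and not the authors' algorithm, the sketch below estimates first-order transition probabilities from role histories by counting consecutive pairs. The role names and the `histories` data are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Maximum-likelihood first-order Markov transition probabilities.

    Counts how often each role is immediately followed by each other role,
    then normalizes the counts per source role.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for state, successors in counts.items()
    }


# Hypothetical role histories, one list per employee (not data from the paper).
histories = [
    ["Developer", "Senior Developer", "Team Lead"],
    ["Developer", "Senior Developer", "Architect"],
    ["Developer", "Tester", "Senior Developer", "Team Lead"],
]
model = transition_probabilities(histories)
```

Roles that never appear as a source (here, the final roles in each history) have no outgoing row in `model`, which is one simple way to treat them as end states.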
Using Players' Gameplay Action-Decision Profiles to Prescribe Training: Reducing Training Costs with Serious Games Analytics
C. S. Loh, I. Li
Abstract: Players' gameplay action-decision data can be used towards profiling as serious games analytics. The insights gained can help support decisions for performance improvement and serve as 'prescriptions' for training (e.g., diagnosing who should receive training, how much training will be given, informing the design of the game, and determining the contents for inclusion and exclusion). Data-driven training prescription can help learning organizations save money by mitigating unnecessary training to reduce costs. Players' learning performance in games can be measured in lieu of their behaviors traced in situ in the training environment. Novice players' action-decision data can first be converted into Courses of Action (COAs) before pairwise similarity comparison against that of the expert(s) to reveal how similar they are to the training goal, or expert/model answer. We identified three Gameplay Action-Decision (GAD) profiles from these gameplay action-decision data and applied them as diagnostics towards prescriptive training.
DOI: https://doi.org/10.1109/DSAA.2016.74
Published: 2016-10-01
Citations: 3
Dilation of Chisini-Jensen-Shannon Divergence
P. Sharma, Gary Holness
Abstract: Jensen-Shannon divergence (JSD) does not provide adequate separation when the difference between input distributions is subtle. A recently introduced technique, Chisini-Jensen-Shannon Divergence (CJSD), increases JSD's ability to discriminate between probability distributions by reformulating it with operators from the Chisini mean. As a consequence, CJSDs also carry additional properties concerning robustness. The utility of this approach was validated in the form of two SVM kernels that give superior classification performance. Our work explores why this reformulation improves on the performance of JSDs. We characterize the nature of this improvement based on the idea of relative dilation, that is, how the Chisini mean transforms JSD's range, and prove a number of propositions that establish the degree of this separation. Finally, we provide empirical validation on a synthetic dataset that confirms our theoretical results pertaining to relative dilation.
DOI: https://doi.org/10.1109/DSAA.2016.25
Published: 2016-10-01
Citations: 6
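To make the starting point of the abstract above concrete, the following sketch computes the standard Jensen-Shannon divergence (base-2 logarithm, arithmetic-mean mixture) and compares identical, nearly identical, and disjoint distributions. It does not implement the Chisini-mean reformulation, whose exact definition is not given in this listing; the example distributions are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Standard JSD: mean KL divergence of p and q to their arithmetic-mean mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Illustrative inputs (assumptions, not taken from the paper).
identical = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])
subtle = jensen_shannon_divergence([0.5, 0.5], [0.49, 0.51])
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

With base-2 logarithms the value lies between 0 and 1, and the `subtle` case lands very close to 0, which is the weak separation the abstract refers to.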
Fraud Detection in Energy Consumption: A Supervised Approach
Bernat Coma-Puig, J. Carmona, Ricard Gavaldà, Santiago Alcoverro, Victor Martin
Abstract: Data from utility meters (gas, electricity, water) is a rich source of information for distribution companies, beyond billing. In this paper we present a supervised technique, which feeds primarily but not only on meter information, to detect meter anomalies and fraudulent customer behavior (meter tampering). Our system detects anomalous meter readings on the basis of models built using machine learning techniques on past data. Unlike most previous work, it can incrementally incorporate the results of field checks to grow the database of fraud and non-fraud patterns, therefore increasing model precision over time and potentially adapting to emerging fraud patterns. The full system has been developed with a company providing electricity and gas and has already been used to carry out several field checks, with large improvements in fraud detection over previous checks, which used simpler techniques.
DOI: https://doi.org/10.1109/DSAA.2016.19
Published: 2016-10-01
Citations: 55
Meeting Health Care Research Needs in a Kimball Integrated Data Warehouse
R. Hart, A. Kuo
Abstract: Business Intelligence and the Kimball methodology, often referred to as dimensional modelling, are well established in data warehousing as a successful means of turning data into information. These techniques have been utilized in multiple business areas such as banking, manufacturing, marketing, sales, healthcare and more. Several articles have also shown how the Kimball approach can be and has been used in the development of clinical research databases. However, these articles have also shown that there are weaknesses to the Kimball methodology when applied to complex areas such as clinical research. This paper describes our approach to address these weaknesses and meet the more sophisticated needs of health researchers by leveraging relationships within the underlying data and advanced techniques in the Kimball methodology.
DOI: https://doi.org/10.1109/DSAA.2016.91
Published: 2016-10-01
Citations: 4
A Framework for Description and Analysis of Sampling-Based Approximate Triangle Counting Algorithms
M. H. Chehreghani
Abstract: Counting the number of triangles in a large graph has many important applications in network analysis. Several frequently computed metrics such as the clustering coefficient and the transitivity ratio need to count the number of triangles. In this paper, we present a randomized framework for expressing and analyzing approximate triangle counting algorithms. We show that many existing approximate triangle counting algorithms can be described in terms of probability distributions given as parameters to the proposed framework. Then, we show that our proposed framework provides a quantitative measure for the quality of different approximate algorithms. Finally, we perform experiments on real-world networks from different domains and show that there is no unique sampling technique outperforming the others for all networks and the quality of sampling techniques depends on different factors such as the structure of the network, the vertex degree-triangle correlation and the number of samples.
DOI: https://doi.org/10.1109/DSAA.2016.15
Published: 2016-10-01
Citations: 0
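The abstract above describes sampling-based estimators parameterized by probability distributions. The sketch below is a generic importance-sampling estimator over vertices, offered as one plausible reading of that idea rather than the paper's actual framework. The function names, the adjacency-dict representation, and the example graph are all assumptions made for illustration.

```python
import random
from itertools import combinations


def local_triangles(adj, v):
    """Exact number of triangles that contain vertex v."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def estimate_triangles(adj, probs, num_samples, rng):
    """Importance-sampling estimate of the total triangle count.

    Each sample draws a vertex v with probability probs[v] and contributes
    t(v) / (3 * probs[v]). Its expectation is the true count because the
    local counts t(v) sum to three times the number of triangles.
    """
    vertices = list(adj)
    weights = [probs[v] for v in vertices]
    draws = rng.choices(vertices, weights=weights, k=num_samples)
    return sum(local_triangles(adj, v) / (3 * probs[v]) for v in draws) / num_samples


# Complete graph on four vertices (contains exactly four triangles).
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
uniform = {v: 0.25 for v in k4}
estimate = estimate_triangles(k4, uniform, num_samples=50, rng=random.Random(0))
```

Changing `probs` (uniform, degree-proportional, and so on) changes the variance of the estimate but not its expectation, which is consistent with the abstract's observation that no single sampling distribution is best for every network.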
Projecting "Better Than Randomly": How to Reduce the Dimensionality of Very Large Datasets in a Way That Outperforms Random Projections
M. Wojnowicz, Di Zhang, Glenn Chisholm, Xuan Zhao, M. Wolff
Abstract: For very large datasets, random projections (RP) have become the tool of choice for dimensionality reduction. This is due to the computational complexity of principal component analysis. However, the recent development of randomized principal component analysis (RPCA) has opened up the possibility of obtaining approximate principal components on very large datasets. In this paper, we compare the performance of RPCA and RP in dimensionality reduction for supervised learning. In Experiment 1, we study a malware classification task on a dataset with over 10 million samples, almost 100,000 features, and over 25 billion non-zero values, with the goal of reducing the dimensionality to a compressed representation of 5,000 features. In order to apply RPCA to this dataset, we develop a new algorithm called large sample RPCA (LS-RPCA), which extends the RPCA algorithm to work on datasets with arbitrarily many samples. We find that classification performance is much higher when using LS-RPCA for dimensionality reduction than when using random projections. In particular, across a range of target dimensionalities, we find that using LS-RPCA reduces classification error by between 37% and 54%. Experiment 2 generalizes the phenomenon to multiple datasets, feature representations, and classifiers. These findings have implications for a large number of research projects in which random projections were used as a preprocessing step for dimensionality reduction. As long as accuracy is at a premium and the target dimensionality is sufficiently less than the numeric rank of the dataset, randomized PCA may be a superior choice. Moreover, if the dataset has a large number of samples, then LS-RPCA will provide a method for obtaining the approximate principal components.
DOI: https://doi.org/10.1109/DSAA.2016.26
Published: 2016-10-01
Citations: 10
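For readers unfamiliar with the random-projection baseline that the abstract above compares against, here is a minimal pure-Python sketch of a Gaussian random projection. It covers only that baseline, not RPCA or LS-RPCA, and the function name, dimensions, and data are illustrative assumptions.

```python
import math
import random


def gaussian_random_projection(rows, target_dim, rng):
    """Project each row onto target_dim random Gaussian directions.

    Matrix entries are drawn from N(0, 1 / target_dim), so squared
    Euclidean norms are preserved in expectation.
    """
    source_dim = len(rows[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(source_dim)]
    return [[sum(x * matrix[i][j] for i, x in enumerate(row))
             for j in range(target_dim)]
            for row in rows]


# Illustrative data: 3 samples with 300 features, reduced to 100 features.
rng = random.Random(0)
data = [[rng.random() for _ in range(300)] for _ in range(3)]
projected = gaussian_random_projection(data, target_dim=100, rng=rng)
```

The projection matrix is chosen without looking at the data, which is what makes the method cheap and is also why a data-dependent method such as PCA can retain more task-relevant structure at the same target dimensionality.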
On the Evaluation of Outlier Detection and One-Class Classification Methods
Lorne Swersky, Henrique O. Marques, J. Sander, R. Campello, A. Zimek
Abstract: It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. Our experiments led to conclusions that do not fully agree with those of previous work.
DOI: https://doi.org/10.1109/DSAA.2016.8
Published: 2016-10-01
Citations: 52
A Symbolic Tree Model for Oil and Gas Production Prediction Using Time-Series Production Data
Bingjie Wei, Helen Pinto, Xin Wang
Abstract: Oil and gas well production prediction takes place in the early stages of production to estimate future recovery. A data-driven workflow is proposed in this paper to construct a symbolic tree model that predicts new well production using historic time-series production data of analogous wells. Production data are first aggregated and symbolized for dimensionality reduction and discretization of the time-series data. A symbolic tree is constructed on time-series symbol sequences, and pre-pruning mechanisms (minimum node size and spatial information gain) are integrated to achieve a compact and informative tree. A coverage index is used to assess the tree size. A case study was conducted applying the proposed workflow to shale gas wells in the Montney-A pool in Canada. It demonstrated the feasibility and accuracy of the proposed method.
DOI: https://doi.org/10.1109/DSAA.2016.36
Published: 2016-10-01
Citations: 3