2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

Estimating Causal Effects on Social Networks
L. Forastiere, F. Mealli, Albert Wu, E. Airoldi
{"title":"Estimating Causal Effects on Social Networks","authors":"L. Forastiere, F. Mealli, Albert Wu, E. Airoldi","doi":"10.1109/DSAA.2018.00016","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00016","url":null,"abstract":"In most real-world systems, units are interconnected and can be represented as networks consisting of nodes and edges. For instance, in social systems individuals can have social ties, family or financial relationships. In settings where some units are exposed to a treatment and its effects spill over to connected units, estimating both the direct effect of the treatment and spillover effects presents several challenges. First, assumptions on the way and the extent to which spillover effects occur along the observed network are required. Second, in observational studies, where the treatment assignment is not under the control of the investigator, confounding and homophily are potential threats to the identification and estimation of causal effects on networks. Here, we make two structural assumptions: i) neighborhood interference, which assumes interference to operate only through a function of the immediate neighbors' treatments, ii) unconfoundedness of the individual and neighborhood treatment, which rules out the presence of unmeasured confounding variables, including those driving homophily. Under these assumptions we develop a new covariate-adjustment estimator for treatment and spillover effects in observational studies on networks. Estimation is based on a generalized propensity score that balances individual and neighborhood covariates across units under different levels of individual treatment and of exposure to neighbors' treatment. Adjustment for the propensity score is performed using a penalized spline regression. Inference capitalizes on a three-step Bayesian procedure which allows taking into account the uncertainty in the propensity score estimation and avoiding model feedback.
Finally, correlation of interacting units is taken into account using a community detection algorithm and incorporating random effects in the outcome model. All these sources of variability, including variability of treatment assignment, are accounted for in the posterior distribution of finite-sample causal estimands.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134474747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
DSAA 2018 Program Committee
{"title":"DSAA 2018 Program Committee","authors":"","doi":"10.1109/dsaa.2018.00008","DOIUrl":"https://doi.org/10.1109/dsaa.2018.00008","url":null,"abstract":"","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129938477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Efficient Closed Infrequent Itemset Mining Using Bi-Directional Traversing
Yifeng Lu, T. Seidl
{"title":"Towards Efficient Closed Infrequent Itemset Mining Using Bi-Directional Traversing","authors":"Yifeng Lu, T. Seidl","doi":"10.1109/DSAA.2018.00024","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00024","url":null,"abstract":"In this work, we investigate the opposite question of frequent itemset mining: which patterns occur less often than a given minimum support in a transactional database? This question, known as infrequent itemset mining, is important in fields such as medical science, security, finance and scientific research. Frequent patterns represent expected or obvious information, while infrequent patterns capture unexpected behaviors and are more interesting in some applications. For example, health care needs to identify sporadic but lethal crossover effects. Security agents have to uncover infrequent associative fraud indicators. Existing infrequent itemset mining approaches are time-consuming. Furthermore, extracting all infrequent patterns might suffer from the redundancy problem. In this paper, we study the two factors that affect the performance of itemset mining tasks. The concept of closed itemset is applied to infrequent patterns to reduce the number of returned patterns. An efficient closed infrequent itemset mining approach is proposed which combines both bottom-up and top-down traversing strategies.
Extensive experimental results show that a simple algorithm based on our framework, without using advanced data structure or pruning techniques, can still be significantly more efficient when compared with other approaches.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121319788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
MixDir: Scalable Bayesian Clustering for High-Dimensional Categorical Data
C. Ahlmann-Eltze, C. Yau
{"title":"MixDir: Scalable Bayesian Clustering for High-Dimensional Categorical Data","authors":"C. Ahlmann-Eltze, C. Yau","doi":"10.1109/DSAA.2018.00068","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00068","url":null,"abstract":"Multivariate analysis of high-dimensional datasets with multiple categorical variables (e.g. surveys, questionnaires) is a challenging task but can reveal patterns of responses that are masked from univariate analyses. In this paper we propose a novel variational inference algorithm to cluster high-dimensional categorical observations into latent classes. Variational inference is an approximate Bayesian inference algorithm, which combines fast optimization methods with the ability to propagate the uncertainty to the clustering (soft clustering). The model is robust to misspecification of the number of latent classes and can infer a reasonable number from the data. We assess the performance on synthetic and real-world data and show that our algorithm has similar performance to the best other tested method if the correct number of classes is known and outperforms the other methods if the number of classes needs to be inferred. An R-package implementing our algorithm is available at the Comprehensive R Archive Network.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127788095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Learning Data Mining
Riccardo Guidotti, A. Monreale, S. Rinzivillo
{"title":"Learning Data Mining","authors":"Riccardo Guidotti, A. Monreale, S. Rinzivillo","doi":"10.1109/DSAA.2018.00047","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00047","url":null,"abstract":"In the last decade the usage and study of data mining and machine learning algorithms have received increasing attention from several heterogeneous fields of research. Learning how and why a certain algorithm returns a particular result, and understanding the main problems connected to its execution, is a hot topic in the education of data mining methods. In order to support data mining beginners, students, teachers, and researchers we introduce a novel didactic environment. The Didactic Data Mining Environment (DDME) allows users to execute a data mining algorithm on a dataset and to observe the algorithm behavior step by step to learn how and why a certain result is returned. DDME can be practically exploited by teachers and students for more interactive learning of data mining. Indeed, on top of the core didactic library, we designed a visual platform that allows online execution of experiments and the visualization of the algorithm steps. The visual platform abstracts the coding activity and makes algorithm execution available to non-technicians.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114293731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
On Non-Routine Places in Urban Human Mobility
C. Quadri, Matteo Zignani, S. Gaito, G. P. Rossi
{"title":"On Non-Routine Places in Urban Human Mobility","authors":"C. Quadri, Matteo Zignani, S. Gaito, G. P. Rossi","doi":"10.1109/DSAA.2018.00075","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00075","url":null,"abstract":"The dichotomy between two opposite propensities, exploration and exploitation, characterizes and drives many human behaviors, from decision making to social learning. Recently, this dichotomy has been found also in human mobility where people can be divided into two basic types: returners, those who are very regular in their daily mobility; and explorers, those who are inclined to break out of their daily mobility routine and explore new places. While the former attitude has been widely studied in literature and results in the well-known tendency to frequently visit a few locations (e.g. home and workplace), the latter trait remains an unexplored aspect and deserves further investigation. In this work we focus on the characterization of the places that an individual visits when she is driven by her propensity for exploration, i.e. non-routine places which are outside her usual daily mobility patterns. To this end, we mine an anonymized mobile phone dataset which integrates call, text and data activities of about one million subscribers in Milan, to detect and characterize the non-routine places. Moreover, we complement it with Foursquare venues along with their category to semantically characterize the reasons driving the choice of the places to explore. 
The analysis of the non-routine places and the mobility patterns during the exploration phase brings to light some interesting findings: i) to a greater or lesser extent, all individuals are explorers since they visit a significant number of non-routine places; ii) due to the exceptionality of a visit, non-routine places are farther from home and workplace than frequently visited places; iii) we are explorers in our leisure time; iv) we get to a non-routine place leaving our home, then we return home later; and v) in Milan, shopping, in particular at fashion and clothing stores, is the main interest behind the need to explore non-routine places.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123720750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Fine-Grained Analysis of Cyberbullying Using Weakly-Supervised Topic Models
Yue Zhang, Arti Ramesh
{"title":"Fine-Grained Analysis of Cyberbullying Using Weakly-Supervised Topic Models","authors":"Yue Zhang, Arti Ramesh","doi":"10.1109/DSAA.2018.00065","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00065","url":null,"abstract":"The possibility of anonymity and lack of effective ways to identify inappropriate messages have resulted in a significant amount of online interaction data that attempt to harass, bully, or offend the recipient. In this work, we perform a fine-grained quantitative and qualitative linguistic analysis of messages exchanged using one such recent web/smartphone application, Sarahah, which allows friends to exchange messages anonymously. We first develop a weakly supervised hierarchical framework using seeded topic models to automatically categorize Sarahah messages into different coarse and fine-grained bullying categories. Our linguistic analysis reveals that a significant number of messages exchanged using this platform (~ 20%) include inappropriate, hurtful, or profane language intended to embarrass, offend, or bully the recipient. We then present a detailed analysis of the messages and corresponding users' responses to these messages in the different bullying categories by comparing them across different linguistic and psychological attributes such as sentiment and psycho-linguistic categories from Linguistic Inquiry and Word Count (LIWC). Finally, we perform a comparative analysis of messages exchanged on Sarahah to an existing labeled cyberbullying dataset from the Formspring social network on the severity of bullying, coarse-grained bullying categories, and anonymity.
Our analysis sheds light on the different categories of bullying and the effect each category has on the recipient and helps quantify the different types and amounts of negativity existing in online social media.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126293413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Wetting and Drying of Soil: From Data to Understandable Models for Prediction
Aniruddha Basak, O. Mengshoel, K. Schmidt, Chinmay Kulkarni
{"title":"Wetting and Drying of Soil: From Data to Understandable Models for Prediction","authors":"Aniruddha Basak, O. Mengshoel, K. Schmidt, Chinmay Kulkarni","doi":"10.1109/DSAA.2018.00041","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00041","url":null,"abstract":"Soil moisture is critical to agriculture, ecology, and certain natural disasters. Existing soil moisture models often fail to predict soil moisture accurately for time periods greater than a few hours. To tackle this problem, we introduce in this paper two novel models, the Naive Accumulative Representation (NAR) and the Additive Exponential Accumulative Representation (AEAR). The parameters in these models reflect hydrological redistribution processes of gravity and suction. We validate our models using soil moisture and rainfall time series data collected from a steep gradient post-wildfire site in Southern California. Data analysis is challenging, since rapid landscape change in steep, burned hillslopes is typically observed in response to even small to moderate rain events. We found that the AEAR model fits the data well for three distinct soil textures at different depths below the ground surface (at 5cm, 15cm, and 30cm). Similar strong results are demonstrated in controlled soil moisture experiments. 
Our recommended AEAR model has been validated as effective and useful by earth scientists, giving better forecasts than existing models for time horizons of 10 to 24 hours.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129365590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Analysing Activities, Contextualized for General Health, Depression and Demographics
F. Murtagh
{"title":"Analysing Activities, Contextualized for General Health, Depression and Demographics","authors":"F. Murtagh","doi":"10.1109/DSAA.2018.00070","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00070","url":null,"abstract":"The contextualizing of large and complex data sources is crucial. Associated with analytical focus can be the addressing of bias in social media and other data sources, and associated with contextualization is Big Data calibration. European Social Survey data is used here, and the main objective is the evaluation of mental health survey data. While specific findings and outcomes are the major objectives here, relating to definition and properties of mental capital, future objectives will be as follows: to plan with metadata and ontology for further, future and rewarding integration with other data sources, both nationally and globally.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126865859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimizing New User Experience in Online Services
Ken Soong, Xin Fu, Yang Zhou
{"title":"Optimizing New User Experience in Online Services","authors":"Ken Soong, Xin Fu, Yang Zhou","doi":"10.1109/DSAA.2018.00057","DOIUrl":"https://doi.org/10.1109/DSAA.2018.00057","url":null,"abstract":"\"Well begun is half done\". This proverb is especially true for a web product when it comes to creating a delightful and proactive user experience. This article describes our work in the last few years optimizing new user experience at LinkedIn, driven by application of data science and advanced analytical methods. Through mining the logs generated by new users in the past, we uncovered signals from their initial session that can predict their retention. We established the difference between creating models for predictions and creating models to inform product strategy. We found that persistent features, such as a user's number of connections, and having a confirmed channel of communication (email, phone or app), more strongly predict new user retention than most transient features such as how long they spend on the registration form or how many pages they have viewed. We further constructed a true north metric (Quality Signup) to drive our Growth team towards the right focus as they iterated through multiple versions of new user onboarding flows.
The strong positive correlation between the Quality Signup metric and long-term retention, as well as the positive impact we have seen on the product over the last two years, validate our strategy to drive product roadmap through data-informed metrics.","PeriodicalId":208455,"journal":{"name":"2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133547961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3