2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

Pattern Matching Trajectories for Investigative Graph Searches
Benjamin W. K. Hung, A. Jayasumana, Vidarshana W. Bandara
{"title":"Pattern Matching Trajectories for Investigative Graph Searches","authors":"Benjamin W. K. Hung, A. Jayasumana, Vidarshana W. Bandara","doi":"10.1109/DSAA.2016.14","DOIUrl":"https://doi.org/10.1109/DSAA.2016.14","url":null,"abstract":"Investigative graph search is the process of searching for and prioritizing entities of interest that may exhibit part or all of a pattern of attributes or connections for a latent behavior. In this work we formulate a related sub-problem of determining the pattern matching trajectories of such entities. The goal is to not only provide analysts with the ability to find full or partial matches against a query pattern, but also a means to quantify the pace of the appearance of the indicators. This technology has a variety of potential applications such as aiding in the detection of homegrown violent extremists before they carry out acts of domestic terrorism, detecting signs for post-traumatic stress in veterans, or tracking potential customer activities and experiences along a consumer journey. We propose a vectorized graph pattern matching approach that calculates the multi-hop class similarities between nodes in query and data graphs over time. By tracking partial match trajectories, we provide another dimension of analysis in investigative graph searches to highlight entities on a pathway towards a pattern of a latent behavior. 
We demonstrate the performance of our approach on a real-world BlogCatalog dataset of over 470K nodes and 4 million edges, where 98.56% of nodes and 99.65% of edges were filtered out with preprocessing steps, and successfully detected the trajectory of the top 1,327 nodes towards a query pattern.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116984328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
MedCare: Leveraging Medication Similarity for Disease Prediction
D. Dasgupta, N. Chawla
{"title":"MedCare: Leveraging Medication Similarity for Disease Prediction","authors":"D. Dasgupta, N. Chawla","doi":"10.1109/DSAA.2016.90","DOIUrl":"https://doi.org/10.1109/DSAA.2016.90","url":null,"abstract":"The emergence of electronic health records (EHRs) has made medical history including past and current diseases, and prescribed medications easily available. This has facilitated development of personalized and population health care management systems. Contemporary disease prediction systems leverage data such as disease diagnoses codes to compute patients' similarity and predict the possible future disease risks of an individual. However, we posit that not all diseases (such as pre-existing conditions) may be represented in an EHR as a disease diagnosis code. It is likely that a patient is already taking a medication but does not have a corresponding disease in the EHR. To that end, we posit that the medication history can serve as a proxy for disease diagnoses, and ask the question whether medication and disease diagnoses combined together can improve the predictability of such systems. Building on our prior work in predicting disease risks (CARE), we develop two disease prediction systems: one using medication-based similarity (medCARE) and the other using both disease and medication-based similarity (combinedCARE). 
We show that combinedCARE provided a greater coverage and a higher average rank.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"570 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116276816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Informative Priors and Bayesian Computation
Shirin Golchi
{"title":"Informative Priors and Bayesian Computation","authors":"Shirin Golchi","doi":"10.1109/DSAA.2016.67","DOIUrl":"https://doi.org/10.1109/DSAA.2016.67","url":null,"abstract":"The use of prior distributions is often a controversial topic in Bayesian inference. Informative priors are often avoided at all costs. However, when prior information is available informative priors are an appropriate way of introducing this information into the model. Furthermore, informative priors, when used properly and creatively, can provide solutions to computational issues and improve modeling efficiency. Through three examples with different applications we demonstrate the importance and usefulness of informative priors in incorporating external information into the model and overcoming computational difficulties.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127249468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Deconstructing Domain Names to Reveal Latent Topics
Cheryl J. Flynn, Kenneth E. Shirley, Wei Wang
{"title":"Deconstructing Domain Names to Reveal Latent Topics","authors":"Cheryl J. Flynn, Kenneth E. Shirley, Wei Wang","doi":"10.1109/DSAA.2016.63","DOIUrl":"https://doi.org/10.1109/DSAA.2016.63","url":null,"abstract":"Measurement of the lexical properties of domain names enables many types of relatively fast, lightweight web mining analyses. These include unsupervised learning tasks such as automatic categorization and clustering of websites, as well as supervised learning tasks, such as classifying websites as malicious or benign. In this paper we explore whether these tasks can be better accomplished by identifying semantically coherent groups of words in a large set of domain names using a combination of word segmentation and topic modeling methods. By segmenting domain names to generate a large set of new domain-level features, we compare three different unsupervised learning methods for identifying topics among domain name keywords: spherical k-means clustering (SKM), Latent Dirichlet Allocation (LDA), and the Biterm Topic Model (BTM). We successfully infer semantically coherent groups of words in two independent data sets, finding that BTM topics are quantitatively the most coherent. Using the BTM, we compare inferred topics across data sets and across time periods, and we also highlight instances of homophony within the topics. Finally, we show that the BTM topics can be used as features to improve the interpretability of a supervised learning model for the detection of malicious domain names. 
To our knowledge this is the first large-scale empirical analysis of the co-occurrence patterns of words within domain names.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127258720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Infinite Langevin Mixture Modeling and Feature Selection
Ola Amayri, N. Bouguila
{"title":"Infinite Langevin Mixture Modeling and Feature Selection","authors":"Ola Amayri, N. Bouguila","doi":"10.1109/DSAA.2016.22","DOIUrl":"https://doi.org/10.1109/DSAA.2016.22","url":null,"abstract":"In this paper, we introduce data clustering based on infinite mixture models for spherical patterns. This particular clustering is based on the Langevin distribution, which has been shown to be effective for modeling this kind of data. The proposed learning algorithm is tackled using a fully Bayesian approach. In contrast to classical Bayesian approaches, which suppose an unknown finite number of mixture components, the proposed approach assumes an infinite number of components; such approaches have witnessed considerable theoretical and computational advances in recent years. In particular, we have developed a Markov Chain Monte Carlo (MCMC) algorithm to sample from the posterior distributions associated with the selected priors for the different model parameters. Moreover, we propose an infinite framework that allows simultaneous feature selection and parameter estimation. The usefulness of the developed framework has been shown via a topic novelty detection application.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129097260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
On the Tiny Yet Real Happiness Phenomenon in the Mobile Games Market
Po-Heng Chen, Yi-Pei Tu, Kuan-Ta Chen
{"title":"On the Tiny Yet Real Happiness Phenomenon in the Mobile Games Market","authors":"Po-Heng Chen, Yi-Pei Tu, Kuan-Ta Chen","doi":"10.1109/DSAA.2016.76","DOIUrl":"https://doi.org/10.1109/DSAA.2016.76","url":null,"abstract":"This paper explores a counter-intuitive observation in the global mobile games market: that despite people in East Asian countries currently experiencing a challenging economic environment with lower disposable incomes and less leisure time than people in the West, they still spend much greater amounts of money on mobile gaming on a per-user basis. We link this situation to the tiny yet real happiness (TYRH) phenomenon: a term coined by Haruki Murakami, frequently rumored as a future recipient of the Nobel Prize for Literature, in his 1986 book \"Afternoon at Langerhan's Island\". The TYRH phenomenon describes how, due to structural inequality problems, people (especially the members of younger generations) may lose their ambition to actively develop their careers, and instead cherish small, ordinary moments of bliss. More concretely, people implicated in this phenomenon tend to maintain an attitude of \"living in the moment\" without regard for their current and future lives, and may even retreat into various non-career-related activities, including mobile gaming. In this paper, we investigate the possible role of the TYRH phenomenon in influencing how smartphone users spend money (and time) on mobile games. We find that countries with long work hours, higher scores on the Gini index, lower unemployment rates, and lower life satisfaction are all associated with higher per-user spending on mobile games on both the App Store and Google Play platforms. 
This suggests that the TYRH phenomenon is indeed positively associated with mobile game-playing and spending behavior, and that countries where the phenomenon is more prominent are likely to contribute disproportionately to the mobile games market, now and in the future.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132060115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Parallel Framework for Grid-Based Bottom-Up Subspace Clustering
Poonam Goyal, S. Kumari, Shubham Singh, V. Kishore, S. Balasubramaniam, Navneet Goyal
{"title":"A Parallel Framework for Grid-Based Bottom-Up Subspace Clustering","authors":"Poonam Goyal, S. Kumari, Shubham Singh, V. Kishore, S. Balasubramaniam, Navneet Goyal","doi":"10.1109/DSAA.2016.42","DOIUrl":"https://doi.org/10.1109/DSAA.2016.42","url":null,"abstract":"Clustering is a popular data mining and machine learning technique which discovers interesting patterns from unlabeled data by grouping similar objects together. Clustering high-dimensional data is a challenging task as points in high dimensional space are nearly equidistant from each other, rendering commonly used similarity measures ineffective. Subspace clustering has emerged as a possible solution to the problem of clustering high-dimensional data. In subspace clustering, we try to find clusters in different subspaces within a dataset. Many subspace clustering algorithms have been proposed in the last two decades to find clusters in multiple overlapping subspaces of high-dimensional data. Subspace clustering algorithms iteratively find the best subset of dimensions for a cluster from 2^d - 1 possible combinations in d-dimensional data. Subspace clustering is extremely compute-intensive because of the exhaustive search of subspaces, especially in the bottom-up subspace clustering algorithms. To address this issue, an efficient parallel framework for grid-based bottom-up subspace clustering algorithms is developed, considering popular algorithms belonging to this category. The framework is implemented for shared memory, distributed memory, and hybrid systems and is tested for three grid-based bottom-up subspace clustering algorithms: CLIQUE, MAFIA, and ENCLUS. 
All parallel implementations exhibit impressive speedup and scalability on real datasets.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125435618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Evidence-Based Behavioral Model for Calendar Schedules of Individual Mobile Phone Users
Iqbal H. Sarker, M. A. Kabir, A. Colman, Jun Han
{"title":"Evidence-Based Behavioral Model for Calendar Schedules of Individual Mobile Phone Users","authors":"Iqbal H. Sarker, M. A. Kabir, A. Colman, Jun Han","doi":"10.1109/DSAA.2016.86","DOIUrl":"https://doi.org/10.1109/DSAA.2016.86","url":null,"abstract":"The electronic calendar usually serves as a personal organizer and is a valuable resource for managing daily activities or schedules of the users. Naturally, a calendar provides various contextual information about an individual's scheduled events/appointments, e.g., meetings. A number of researchers have utilized such information to predict human behavior for mobile communication, by assuming a predefined event-behavior mapping which is static and non-personalized. However, in the real world, people differ from each other in how they respond to incoming calls during their scheduled events; even a particular individual may respond differently depending on what type of event is scheduled in the calendar. Thus a static behavioral model does not necessarily map to calendar schedules and corresponding phone call response behavior of individuals. Therefore, we propose an evidence-based behavioral model (EBM) that dynamically identifies the actual call response behavior of individuals for various calendar events based on their mobile phone log that records the data related to a user's phone call activities. 
Experiments on real datasets show that our proposed technique better captures the user's call response behavior for various calendar events, thereby enabling more appropriate rules to be created for the purpose of automated handling of incoming calls in an intelligent call interruption management system.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121637877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems
B. Schreck, K. Veeramachaneni
{"title":"What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems","authors":"B. Schreck, K. Veeramachaneni","doi":"10.1109/DSAA.2016.55","DOIUrl":"https://doi.org/10.1109/DSAA.2016.55","url":null,"abstract":"In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, and implemented a system that allows data scientists to specify problems in that language. We show that this language is able to describe several prediction problems, even the ones on KAGGLE, a data science competition website. We express 29 different KAGGLE problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations to apply to a dataset in order to generate labels that can be used to train a supervised machine learning classifier. Using a smaller subset of this language, we developed a system to automatically enumerate, interpret and solve prediction problems. We tested this system on the Walmart Store Sales Forecasting dataset found on KAGGLE, enumerated 1077 prediction problems and built models that attempted to solve them, for which we produced 235 AUC scores. 
Considering that only one out of those 1077 problems was the focus of a 2.5 month long competition on KAGGLE, we expect this system to deliver a thousandfold increase in data scientist's productivity.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":" 22","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120828233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Inconsistent Node Flattening for Improving Top-Down Hierarchical Classification
Azad Naik, H. Rangwala
{"title":"Inconsistent Node Flattening for Improving Top-Down Hierarchical Classification","authors":"Azad Naik, H. Rangwala","doi":"10.1109/DSAA.2016.47","DOIUrl":"https://doi.org/10.1109/DSAA.2016.47","url":null,"abstract":"Large-scale classification of data where classes are structurally organized in a hierarchy is an important area of research. Top-down approaches that exploit the hierarchy during the learning and prediction phase are efficient for large-scale hierarchical classification. However, accuracy of top-down approaches is poor due to error propagation, i.e., prediction errors made at higher levels in the hierarchy cannot be corrected at lower levels. One of the main reasons behind errors at the higher levels is the presence of inconsistent nodes that are introduced due to the arbitrary process of creating these hierarchies by domain experts. In this paper, we propose two different data-driven approaches (local and global) for hierarchical structure modification that identify and flatten inconsistent nodes present within the hierarchy. Our extensive empirical evaluation of the proposed approaches on several image and text datasets with varying distribution of features, classes and training instances per class shows improved classification performance over competing hierarchical modification approaches. Specifically, we see an improvement of up to 7% in Macro-F1 score with our approach over the best TD baseline. 
SOURCE CODE: http://www.cs.gmu.edu/ mlbio/InconsistentNodeFlattening.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126706150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16