2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications

Connecting Opinions to Opinion-Leaders: A Case Study on Brazilian Political Protests
L. Rocha, Fernando Mourão, Ramon Vieira, A. Neves, D. Carvalho, Bortik Bandyopadhyay, S. Parthasarathy, R. Ferreira
{"title":"Connecting Opinions to Opinion-Leaders: A Case Study on Brazilian Political Protests","authors":"L. Rocha, Fernando Mourão, Ramon Vieira, A. Neves, D. Carvalho, Bortik Bandyopadhyay, S. Parthasarathy, R. Ferreira","doi":"10.1109/DSAA.2016.77","DOIUrl":"https://doi.org/10.1109/DSAA.2016.77","url":null,"abstract":"Social media applications have assumed an important role in decision-making process of users, affecting their choices about products and services. In this context, understanding and modeling opinions, as well as opinion-leaders, have implications for several tasks, such as recommendation, advertising, brand evaluation etc. Despite the intrinsic relation between opinions and opinion-leaders, most recent works focus exclusively on either understanding the opinions, by Sentiment Analysis (SA) proposals, or identifying opinion-leaders using Influential Users Detection (IUD). This paper presents a preliminary evaluation about a combined analysis of SA and IUD. In this sense, we propose a methodology to quantify factors in real domains that may affect such analysis, as well as the potential benefits of combining SA Methods with IUD ones. Empirical assessments on a sample of tweets about the Brazilian president reveal that the collective opinion and the set of top opinion-leaders over time are inter-related. Further, we were able to identify distinct characteristics of opinion propagation, and that the collective opinion may be accurately estimated by using a few top-k opinion-leaders. These results point out the combined analysis of SA and IUD as a promising research direction to be further exploited.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124961425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
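The central empirical claim, that collective opinion can be approximated from a few top opinion-leaders, can be illustrated with a minimal sketch. The sentiment scores and influence weights below are invented toy values, and the weighting scheme is an assumption, not the authors' methodology:

    def collective_opinion(sentiments):
        """Mean sentiment over all users, in [-1, 1]."""
        return sum(sentiments.values()) / len(sentiments)

    def leader_estimate(sentiments, influence, k):
        """Influence-weighted mean sentiment of the k most influential users."""
        top_k = sorted(influence, key=influence.get, reverse=True)[:k]
        total_weight = sum(influence[u] for u in top_k)
        return sum(sentiments[u] * influence[u] for u in top_k) / total_weight

    # Toy data: per-user sentiment in [-1, 1] and influence (e.g. retweet counts).
    sentiments = {"u1": 0.8, "u2": -0.2, "u3": 0.5, "u4": -0.6, "u5": 0.1}
    influence = {"u1": 120, "u2": 15, "u3": 80, "u4": 5, "u5": 40}

    print(collective_opinion(sentiments))             # ground-truth collective opinion
    print(leader_estimate(sentiments, influence, 2))  # estimate from the top-2 leaders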
Disease Detection and Severity Estimation in Cotton Plant from Unconstrained Images
Aditya Parikh, M. Raval, Chandrasinh Parmar, S. Chaudhary
{"title":"Disease Detection and Severity Estimation in Cotton Plant from Unconstrained Images","authors":"Aditya Parikh, M. Raval, Chandrasinh Parmar, S. Chaudhary","doi":"10.1109/DSAA.2016.81","DOIUrl":"https://doi.org/10.1109/DSAA.2016.81","url":null,"abstract":"The primary focus of this paper is to detect disease and estimate its stage for a cotton plant using images. Most disease symptoms are reflected on the cotton leaf. Unlike earlier approaches, the novelty of the proposal lies in processing images captured under uncontrolled conditions in the field using normal or a mobile phone camera by an untrained person. Such field images have a cluttered background making leaf segmentation very challenging. The proposed work use two cascaded classifiers. Using local statistical features, first classifier segments leaf from the background. Then using hue and luminance from HSV colour space another classifier is trained to detect disease and find its stage. The developed algorithm is a generalised as it can be applied for any disease. However as a showcase, we detect Grey Mildew, widely prevalent fungal disease in North Gujarat, India.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122038660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
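As a rough illustration of the second-stage idea (classifying disease stage from hue and luminance statistics of the segmented leaf region), here is a minimal sketch; the feature choices, the random-forest classifier, and the synthetic data are assumptions rather than the authors' implementation:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def hsv_features(hsv_pixels):
        """hsv_pixels: (N, 3) array of H, S, V values for the segmented leaf region."""
        h, v = hsv_pixels[:, 0], hsv_pixels[:, 2]
        return np.array([h.mean(), h.std(), v.mean(), v.std()])

    # Toy training set: one feature vector per leaf image, label = disease stage.
    rng = np.random.default_rng(0)
    X = np.vstack([hsv_features(rng.uniform(0.0, 1.0, (500, 3))) for _ in range(40)])
    y = rng.integers(0, 3, size=40)  # 0 = healthy, 1 = early stage, 2 = severe

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.predict(X[:5]))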
Mining Research Problems from Scientific Literature
Chanakya Aalla, Vikram Pudi
{"title":"Mining Research Problems from Scientific Literature","authors":"Chanakya Aalla, Vikram Pudi","doi":"10.1109/DSAA.2016.44","DOIUrl":"https://doi.org/10.1109/DSAA.2016.44","url":null,"abstract":"Extracting structured information from unstructured text is a critical problem. Over the past few years, various clustering algorithms have been proposed to solve this problem. In addition, various algorithms based on probabilistic topic models have been developed to find the hidden thematic structure from various corpora (i.e publications, blogs etc). Both types of algorithms have been transferred to the domain of scientific literature to extract structured information to solve problems like data exploration, expert detection etc. In order to remain domain-agnostic, these algorithms do not exploit the structure present in a scientific publication. Majority of researchers interpret a scientific publication as research conducted to report progress in solving some research problems. Following this interpretation, in this paper we present a different outlook to the same problem by modelling scientific publications around research problems. By associating a scientific publication with a research problem, exploring the scientific literature becomes more intuitive. In this paper, we propose an unsupervised framework to mine research problems from titles and abstracts of scientific literature. Our framework uses weighted frequent phrase mining to generate phrases and filters them to obtain high-quality phrases. These high-quality phrases are then used to segment the scientific publication into meaningful semantic units. After segmenting publications, we apply a number of heuristics to score the phrases and sentences to identify the research problems. In a postprocessing step we use a neighborhood based algorithm to merge different representations of the same problems. Experiments conducted on parts of DBLP dataset show promising results.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131662117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
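The phrase-generation step can be sketched as frequent n-gram mining over titles and abstracts with a support threshold; the example documents and threshold below are illustrative, and the paper's weighting, scoring heuristics, and merging step are omitted:

    from collections import Counter

    docs = [
        "mining research problems from scientific literature",
        "topic models for scientific literature exploration",
        "research problems in large scale text mining",
    ]

    def ngrams(tokens, n):
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for n in (2, 3):
            counts.update(ngrams(tokens, n))

    min_support = 2
    frequent_phrases = [p for p, c in counts.items() if c >= min_support]
    print(frequent_phrases)  # e.g. ['research problems', 'scientific literature']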
Waiting to Be Sold: Prediction of Time-Dependent House Selling Probability
Mansurul Bhuiyan, M. Hasan
{"title":"Waiting to Be Sold: Prediction of Time-Dependent House Selling Probability","authors":"Mansurul Bhuiyan, M. Hasan","doi":"10.1109/DSAA.2016.58","DOIUrl":"https://doi.org/10.1109/DSAA.2016.58","url":null,"abstract":"Buying or selling a house is one of the important decisions in a person's life. Online listing websites like \"zillow.com\", \"trulia.com\", and \"realtor.com\" etc. provide significant and effective assistance during the buy/sell process. However, they fail to supply one important information of a house that is, approximately how long will it take for a house to be sold after it first appears in the listing? This information is equally important for both a potential buyer and the seller. With this information the seller will have an understanding of what she can do to expedite the sale, i.e. reduce the asking price, renovate/remodel some home features, etc. On the other hand, a potential buyer will have an idea of the available time for her to react i.e. to place an offer. In this work, we propose a supervised regression (Cox regression) model inspired by survival analysis to predict the sale probability of a house given historical home sale information within an observation time window. We use real-life housing data collected from \"trulia.com\" to validate the proposed prediction algorithm and show its superior performance over traditional regression methods. We also show how the sale probability of a house is influenced by the values of basic house features, such as price, size, # of bedrooms, # of bathrooms, and school quality.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132590488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
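A minimal sketch of the modelling idea, using the lifelines implementation of Cox regression (assumed available); the column names and data are illustrative and not the trulia.com dataset used in the paper:

    import pandas as pd
    from lifelines import CoxPHFitter

    listings = pd.DataFrame({
        "days_on_market": [12, 45, 90, 30, 60, 7, 120, 25],
        "sold":           [1, 1, 1, 1, 0, 1, 0, 1],  # 0 = still listed (censored)
        "price_k":        [250, 410, 530, 300, 380, 220, 610, 290],
        "bedrooms":       [3, 4, 5, 3, 4, 2, 5, 3],
        "school_score":   [7, 8, 6, 9, 7, 6, 5, 8],
    })

    cph = CoxPHFitter()
    cph.fit(listings, duration_col="days_on_market", event_col="sold")
    cph.print_summary()

    # Survival curve for a listing: P(still unsold after t days).
    new_house = listings.drop(columns=["days_on_market", "sold"]).iloc[[0]]
    print(cph.predict_survival_function(new_house, times=[30, 60, 90]))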
A Decision Tree-Based Approach for Categorizing Spatial Database Query Results
Xiangfu Meng, Xiaoyan Zhang, Jinguang Sun, Lin Li, Changzheng Xing, Chongchun Bi
{"title":"A Decision Tree-Based Approach for Categorizing Spatial Database Query Results","authors":"Xiangfu Meng, Xiaoyan Zhang, Jinguang Sun, Lin Li, Changzheng Xing, Chongchun Bi","doi":"10.1109/DSAA.2016.50","DOIUrl":"https://doi.org/10.1109/DSAA.2016.50","url":null,"abstract":"Spatial database queries are often exploratory. The users often find that their queries return too many answers and many of them may be irrelevant. Based on the coupling relationships between spatial objects, this paper proposes a novel categorization approach which consists of two steps. The first step analyzes the spatial object coupling relationship by considering the location proximity and semantic similarity between spatial objects, and then a set of clusters over the spatial objects can be generated, where each cluster represents one type of user need. When a user issues a spatial query, the second step presents to the user a category tree which is generated by using modified C4.5 decision tree algorithm over the clusters such that the user can easily select the subset of query results matching his/her needs by exploring the labels assigned on intermediate nodes of the tree. The experiments demonstrate that our spatial object clustering method can efficiently capture both the semantic and location correlations between spatial objects. The effectiveness and efficiency of the categorization algorithm is also demonstrated.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127127376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
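A minimal sketch of the second step, fitting a decision tree over pre-computed clusters of spatial objects; scikit-learn's CART is used here as a stand-in for the paper's modified C4.5, and the toy features, cluster labels, and feature names are assumptions:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Each row: [distance_to_query_km, category_code, rating]; the label is the
    # cluster id produced by the first (clustering) step.
    X = np.array([[0.5, 0, 4.5], [0.8, 0, 4.0], [5.0, 1, 3.5],
                  [6.2, 1, 3.0], [0.7, 2, 4.8], [5.5, 2, 2.9]])
    cluster_labels = np.array([0, 0, 1, 1, 0, 1])

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, cluster_labels)
    print(export_text(tree, feature_names=["distance_km", "category", "rating"]))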
Closest Interval Join Using MapReduce
Qiang Zhang, Andy He, Chris Liu, Eric Lo
{"title":"Closest Interval Join Using MapReduce","authors":"Qiang Zhang, Andy He, Chris Liu, Eric Lo","doi":"10.1109/DSAA.2016.39","DOIUrl":"https://doi.org/10.1109/DSAA.2016.39","url":null,"abstract":"The closest interval join problem is to find all the closest intervals between two interval sets R and S. Applications of closest interval join include bioinformatics and other data science. Interval data can be very large and continue to increase in size due to the advancement of data acquisition technology. In this paper, we present efficient MapReduce algorithms to compute closest interval join. Experiments based on both real and synthetic interval data demonstrated that our algorithms are efficient.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129780136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
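A minimal single-machine sketch of the closest interval join itself (for each interval in R, find the nearest interval in S by gap distance); the paper's MapReduce partitioning and optimisations are not reproduced here:

    def gap(a, b):
        """Distance between intervals a=(s1, e1) and b=(s2, e2); 0 if they overlap."""
        return max(0, max(a[0], b[0]) - min(a[1], b[1]))

    def closest_interval_join(R, S):
        """For every interval in R, return its closest interval in S."""
        return {r: min(S, key=lambda s: gap(r, s)) for r in R}

    R = [(1, 5), (10, 12), (40, 45)]
    S = [(6, 8), (13, 20), (30, 35)]
    print(closest_interval_join(R, S))  # {(1, 5): (6, 8), (10, 12): (13, 20), (40, 45): (30, 35)}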
Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering
James Max Kanter, O. Gillespie, K. Veeramachaneni
{"title":"Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering","authors":"James Max Kanter, O. Gillespie, K. Veeramachaneni","doi":"10.1109/DSAA.2016.54","DOIUrl":"https://doi.org/10.1109/DSAA.2016.54","url":null,"abstract":"In this paper, we introduce \"prediction engineering\" as a formal step in the predictive modeling process. We define a generalizable 3 part framework — Label, Segment, Featurize (L-S-F) — to address the growing demand for predictive models. The framework provides abstractions for data scientists to customize the process to unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in 2 domains and demonstrate an implementation over 5 unique prediction problems defined on a dataset of crowdfunding projects from DonorsChoose.org. The results demonstrate how the L-S-F framework complements existing tools to allow us to rapidly build and evaluate 26 distinct predictive models. L-S-F enables development of models that provide value to all parties involved (donors, teachers, and people running the platform).","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121389870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
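A minimal sketch of the Label, Segment, Featurize idea on a toy donation-style event log; the function signatures, cutoff, and target threshold are hypothetical and not the authors' implementation:

    from datetime import datetime

    # Toy event log: (entity_id, timestamp, donation_amount).
    events = [
        ("proj1", datetime(2016, 1, 1), 10), ("proj1", datetime(2016, 1, 5), 25),
        ("proj1", datetime(2016, 2, 1), 40), ("proj2", datetime(2016, 1, 3), 5),
    ]

    def label(entity_events, cutoff, target=30):
        """Label: did donations received after the cutoff reach the target?"""
        return sum(a for _, t, a in entity_events if t > cutoff) >= target

    def segment(entity_events, cutoff):
        """Segment: keep only data observable before the cutoff."""
        return [e for e in entity_events if e[1] <= cutoff]

    def featurize(segment_events):
        """Featurize: simple aggregates over the observable segment."""
        amounts = [a for _, _, a in segment_events]
        return {"n_events": len(amounts), "total": sum(amounts)}

    cutoff = datetime(2016, 1, 31)
    for eid in ("proj1", "proj2"):
        ev = [e for e in events if e[0] == eid]
        print(eid, featurize(segment(ev, cutoff)), "label:", label(ev, cutoff))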
The Semantic Knowledge Graph: A Compact, Auto-Generated Model for Real-Time Traversal and Ranking of any Relationship within a Domain
Trey Grainger, Khalifeh AlJadda, M. Korayem, Andries Smith
{"title":"The Semantic Knowledge Graph: A Compact, Auto-Generated Model for Real-Time Traversal and Ranking of any Relationship within a Domain","authors":"Trey Grainger, Khalifeh AlJadda, M. Korayem, Andries Smith","doi":"10.1109/DSAA.2016.51","DOIUrl":"https://doi.org/10.1109/DSAA.2016.51","url":null,"abstract":"This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain. The source code for our Semantic Knowledge Graph implementation is being published along with this paper to facilitate further research and extensions of this work.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126771791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
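The core data structure can be sketched as an inverted index whose postings lists are intersected to materialize an edge between two terms; the co-occurrence lift used for scoring below is a simplification of the paper's relatedness measure, and the documents are toy examples:

    from collections import defaultdict

    docs = {
        0: "java developer spring hibernate",
        1: "java spring microservices",
        2: "python data science pandas",
        3: "java data engineer spark",
    }

    index = defaultdict(set)  # term -> postings list (doc ids)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)

    def edge(term_a, term_b, n_docs=len(docs)):
        """Materialize the edge a-b from intersecting postings lists and score it."""
        docs_ab = index[term_a] & index[term_b]
        expected = len(index[term_a]) * len(index[term_b]) / n_docs
        lift = len(docs_ab) / expected if expected else 0.0
        return docs_ab, lift

    print(edge("java", "spring"))   # ({0, 1}, lift > 1 -> related)
    print(edge("java", "pandas"))   # (set(), 0.0 -> unrelated)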
Learning Temporal Dependence from Time-Series Data with Latent Variables
Hossein Hosseini, Sreeram Kannan, Baosen Zhang, R. Poovendran
{"title":"Learning Temporal Dependence from Time-Series Data with Latent Variables","authors":"Hossein Hosseini, Sreeram Kannan, Baosen Zhang, R. Poovendran","doi":"10.1109/DSAA.2016.34","DOIUrl":"https://doi.org/10.1109/DSAA.2016.34","url":null,"abstract":"We consider the setting where a collection of time series, modeled as random processes, evolve in a causal manner, and one is interested in learning the graph governing the relationships of these processes. A special case of wide interest and applicability is the setting where the noise is Gaussian and relationships are Markov and linear. We study this setting with two additional features: firstly, each random process has a hidden (latent) state, which we use to model the internal memory possessed by the variables (similar to hidden Markov models). Secondly, each variable can depend on its latent memory state through a random lag (rather than a fixed lag), thus modeling memory recall with differing lags at distinct times. Under this setting, we develop an estimator and prove that under a genericity assumption, the parameters of the model can be learned consistently. We also propose a practical adaption of this estimator, which demonstrates significant performance gains in both synthetic and real-world datasets.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129544363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
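As a simple point of reference for the problem setting, the sketch below recovers a lagged dependence between two synthetic series by scanning candidate lags with Pearson correlation; this is a naive baseline, not the latent-state estimator developed in the paper:

    import numpy as np

    rng = np.random.default_rng(1)
    n, true_lag = 500, 3
    x = rng.normal(size=n)
    y = np.zeros(n)
    y[true_lag:] = 0.9 * x[:-true_lag] + 0.1 * rng.normal(size=n - true_lag)

    def best_lag(x, y, max_lag=10):
        """Score each candidate lag by corr(x shifted by lag, y)."""
        scores = {lag: np.corrcoef(x[:-lag], y[lag:])[0, 1]
                  for lag in range(1, max_lag + 1)}
        return max(scores, key=scores.get)

    print("estimated lag:", best_lag(x, y))  # expected: 3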
Limiting the Diffusion of Information by a Selective PageRank-Preserving Approach
G. Loukides, Robert Gwadera
{"title":"Limiting the Diffusion of Information by a Selective PageRank-Preserving Approach","authors":"G. Loukides, Robert Gwadera","doi":"10.1109/DSAA.2016.16","DOIUrl":"https://doi.org/10.1109/DSAA.2016.16","url":null,"abstract":"The problem of limiting the diffusion of information in social networks has received substantial attention. To deal with the problem, existing works aim to prevent the diffusion of information to as many nodes as possible, by deleting a given number of edges. Thus, they assume that the diffusing information can affect all nodes and that the deletion of each edge has the same impact on the information propagation properties of the graph. In this work, we propose an approach which lifts these limiting assumptions. Our approach allows specifying the nodes to which information diffusion should be prevented and their maximum allowable activation probability, and it performs edge deletion while avoiding drastic changes to the ability of the network to propagate information. To realize our approach, we propose a measure that captures changes, caused by deletion, to the PageRank distribution of the graph. Based on the measure, we define the problem of finding an edge subset to delete as an optimization problem. We show that the problem can be modeled as a Submodular Set Cover (SSC) problem and design an approximation algorithm, based on the well-known approximation algorithm for SSC. In addition, we develop an iterative heuristic that has similar effectiveness but is significantly more efficient than our algorithm. Experiments on real and synthetic data show the effectiveness and efficiency of our methods.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127224401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
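A minimal greedy edge-deletion loop in the spirit of the approach, using networkx (assumed available): repeatedly remove the in-edge of a protected node that most lowers its PageRank, while tracking the overall PageRank shift. This is an illustrative heuristic, not the paper's SSC-based algorithm or its change measure:

    import networkx as nx

    G = nx.karate_club_graph().to_directed()
    protected, budget = 33, 3  # node to shield, number of edges to delete
    base_pr = nx.pagerank(G)

    for _ in range(budget):
        candidates = list(G.in_edges(protected))
        if not candidates:
            break

        def pr_after_removal(edge):
            H = G.copy()
            H.remove_edge(*edge)
            return nx.pagerank(H)[protected]

        best = min(candidates, key=pr_after_removal)  # removal that helps most
        G.remove_edge(*best)
        print("removed edge", best)

    new_pr = nx.pagerank(G)
    shift = sum(abs(new_pr[v] - base_pr[v]) for v in G)  # crude global-change measure
    print("PageRank of protected node:", base_pr[protected], "->", new_pr[protected])
    print("total PageRank shift:", shift)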