2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications

Connecting Opinions to Opinion-Leaders: A Case Study on Brazilian Political Protests
L. Rocha, Fernando Mourão, Ramon Vieira, A. Neves, D. Carvalho, Bortik Bandyopadhyay, S. Parthasarathy, R. Ferreira
{"title":"Connecting Opinions to Opinion-Leaders: A Case Study on Brazilian Political Protests","authors":"L. Rocha, Fernando Mourão, Ramon Vieira, A. Neves, D. Carvalho, Bortik Bandyopadhyay, S. Parthasarathy, R. Ferreira","doi":"10.1109/DSAA.2016.77","DOIUrl":"https://doi.org/10.1109/DSAA.2016.77","url":null,"abstract":"Social media applications have assumed an important role in decision-making process of users, affecting their choices about products and services. In this context, understanding and modeling opinions, as well as opinion-leaders, have implications for several tasks, such as recommendation, advertising, brand evaluation etc. Despite the intrinsic relation between opinions and opinion-leaders, most recent works focus exclusively on either understanding the opinions, by Sentiment Analysis (SA) proposals, or identifying opinion-leaders using Influential Users Detection (IUD). This paper presents a preliminary evaluation about a combined analysis of SA and IUD. In this sense, we propose a methodology to quantify factors in real domains that may affect such analysis, as well as the potential benefits of combining SA Methods with IUD ones. Empirical assessments on a sample of tweets about the Brazilian president reveal that the collective opinion and the set of top opinion-leaders over time are inter-related. Further, we were able to identify distinct characteristics of opinion propagation, and that the collective opinion may be accurately estimated by using a few top-k opinion-leaders. These results point out the combined analysis of SA and IUD as a promising research direction to be further exploited.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124961425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
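The central empirical claim, that collective opinion can be approximated from a few top opinion-leaders, can be illustrated with a minimal sketch. The sentiment scores and influence weights below are invented toy values, and the weighting scheme is an assumption, not the authors' methodology:

    def collective_opinion(sentiments):
        """Mean sentiment over all users, in [-1, 1]."""
        return sum(sentiments.values()) / len(sentiments)

    def leader_estimate(sentiments, influence, k):
        """Influence-weighted mean sentiment of the k most influential users."""
        top_k = sorted(influence, key=influence.get, reverse=True)[:k]
        total_weight = sum(influence[u] for u in top_k)
        return sum(sentiments[u] * influence[u] for u in top_k) / total_weight

    # Toy data: per-user sentiment in [-1, 1] and influence (e.g. retweet counts).
    sentiments = {"u1": 0.8, "u2": -0.2, "u3": 0.5, "u4": -0.6, "u5": 0.1}
    influence = {"u1": 120, "u2": 15, "u3": 80, "u4": 5, "u5": 40}

    print(collective_opinion(sentiments))             # ground-truth collective opinion
    print(leader_estimate(sentiments, influence, 2))  # estimate from the top-2 leaders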
Disease Detection and Severity Estimation in Cotton Plant from Unconstrained Images
Aditya Parikh, M. Raval, Chandrasinh Parmar, S. Chaudhary
{"title":"Disease Detection and Severity Estimation in Cotton Plant from Unconstrained Images","authors":"Aditya Parikh, M. Raval, Chandrasinh Parmar, S. Chaudhary","doi":"10.1109/DSAA.2016.81","DOIUrl":"https://doi.org/10.1109/DSAA.2016.81","url":null,"abstract":"The primary focus of this paper is to detect disease and estimate its stage for a cotton plant using images. Most disease symptoms are reflected on the cotton leaf. Unlike earlier approaches, the novelty of the proposal lies in processing images captured under uncontrolled conditions in the field using normal or a mobile phone camera by an untrained person. Such field images have a cluttered background making leaf segmentation very challenging. The proposed work use two cascaded classifiers. Using local statistical features, first classifier segments leaf from the background. Then using hue and luminance from HSV colour space another classifier is trained to detect disease and find its stage. The developed algorithm is a generalised as it can be applied for any disease. However as a showcase, we detect Grey Mildew, widely prevalent fungal disease in North Gujarat, India.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122038660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
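As a rough illustration of the second-stage idea (classifying disease stage from hue and luminance statistics of the segmented leaf region), here is a minimal sketch; the feature choices, the random-forest classifier, and the synthetic data are assumptions rather than the authors' implementation:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def hsv_features(hsv_pixels):
        """hsv_pixels: (N, 3) array of H, S, V values for the segmented leaf region."""
        h, v = hsv_pixels[:, 0], hsv_pixels[:, 2]
        return np.array([h.mean(), h.std(), v.mean(), v.std()])

    # Toy training set: one feature vector per leaf image, label = disease stage.
    rng = np.random.default_rng(0)
    X = np.vstack([hsv_features(rng.uniform(0.0, 1.0, (500, 3))) for _ in range(40)])
    y = rng.integers(0, 3, size=40)  # 0 = healthy, 1 = early stage, 2 = severe

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.predict(X[:5]))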
Mining Research Problems from Scientific Literature
Chanakya Aalla, Vikram Pudi
{"title":"Mining Research Problems from Scientific Literature","authors":"Chanakya Aalla, Vikram Pudi","doi":"10.1109/DSAA.2016.44","DOIUrl":"https://doi.org/10.1109/DSAA.2016.44","url":null,"abstract":"Extracting structured information from unstructured text is a critical problem. Over the past few years, various clustering algorithms have been proposed to solve this problem. In addition, various algorithms based on probabilistic topic models have been developed to find the hidden thematic structure from various corpora (i.e publications, blogs etc). Both types of algorithms have been transferred to the domain of scientific literature to extract structured information to solve problems like data exploration, expert detection etc. In order to remain domain-agnostic, these algorithms do not exploit the structure present in a scientific publication. Majority of researchers interpret a scientific publication as research conducted to report progress in solving some research problems. Following this interpretation, in this paper we present a different outlook to the same problem by modelling scientific publications around research problems. By associating a scientific publication with a research problem, exploring the scientific literature becomes more intuitive. In this paper, we propose an unsupervised framework to mine research problems from titles and abstracts of scientific literature. Our framework uses weighted frequent phrase mining to generate phrases and filters them to obtain high-quality phrases. These high-quality phrases are then used to segment the scientific publication into meaningful semantic units. After segmenting publications, we apply a number of heuristics to score the phrases and sentences to identify the research problems. In a postprocessing step we use a neighborhood based algorithm to merge different representations of the same problems. Experiments conducted on parts of DBLP dataset show promising results.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131662117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
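The phrase-generation step can be sketched as frequent n-gram mining over titles and abstracts with a support threshold; the example documents and threshold below are illustrative, and the paper's weighting, scoring heuristics, and merging step are omitted:

    from collections import Counter

    docs = [
        "mining research problems from scientific literature",
        "topic models for scientific literature exploration",
        "research problems in large scale text mining",
    ]

    def ngrams(tokens, n):
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for n in (2, 3):
            counts.update(ngrams(tokens, n))

    min_support = 2
    frequent_phrases = [p for p, c in counts.items() if c >= min_support]
    print(frequent_phrases)  # e.g. ['research problems', 'scientific literature']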
Waiting to Be Sold: Prediction of Time-Dependent House Selling Probability
Mansurul Bhuiyan, M. Hasan
{"title":"Waiting to Be Sold: Prediction of Time-Dependent House Selling Probability","authors":"Mansurul Bhuiyan, M. Hasan","doi":"10.1109/DSAA.2016.58","DOIUrl":"https://doi.org/10.1109/DSAA.2016.58","url":null,"abstract":"Buying or selling a house is one of the important decisions in a person's life. Online listing websites like \"zillow.com\", \"trulia.com\", and \"realtor.com\" etc. provide significant and effective assistance during the buy/sell process. However, they fail to supply one important information of a house that is, approximately how long will it take for a house to be sold after it first appears in the listing? This information is equally important for both a potential buyer and the seller. With this information the seller will have an understanding of what she can do to expedite the sale, i.e. reduce the asking price, renovate/remodel some home features, etc. On the other hand, a potential buyer will have an idea of the available time for her to react i.e. to place an offer. In this work, we propose a supervised regression (Cox regression) model inspired by survival analysis to predict the sale probability of a house given historical home sale information within an observation time window. We use real-life housing data collected from \"trulia.com\" to validate the proposed prediction algorithm and show its superior performance over traditional regression methods. We also show how the sale probability of a house is influenced by the values of basic house features, such as price, size, # of bedrooms, # of bathrooms, and school quality.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132590488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
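A minimal sketch of the modelling idea, using the lifelines implementation of Cox regression (assumed available); the column names and data are illustrative and not the trulia.com dataset used in the paper:

    import pandas as pd
    from lifelines import CoxPHFitter

    listings = pd.DataFrame({
        "days_on_market": [12, 45, 90, 30, 60, 7, 120, 25],
        "sold":           [1, 1, 1, 1, 0, 1, 0, 1],  # 0 = still listed (censored)
        "price_k":        [250, 410, 530, 300, 380, 220, 610, 290],
        "bedrooms":       [3, 4, 5, 3, 4, 2, 5, 3],
        "school_score":   [7, 8, 6, 9, 7, 6, 5, 8],
    })

    cph = CoxPHFitter()
    cph.fit(listings, duration_col="days_on_market", event_col="sold")
    cph.print_summary()

    # Survival curve for a listing: P(still unsold after t days).
    new_house = listings.drop(columns=["days_on_market", "sold"]).iloc[[0]]
    print(cph.predict_survival_function(new_house, times=[30, 60, 90]))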
A Decision Tree-Based Approach for Categorizing Spatial Database Query Results
Xiangfu Meng, Xiaoyan Zhang, Jinguang Sun, Lin Li, Changzheng Xing, Chongchun Bi
{"title":"A Decision Tree-Based Approach for Categorizing Spatial Database Query Results","authors":"Xiangfu Meng, Xiaoyan Zhang, Jinguang Sun, Lin Li, Changzheng Xing, Chongchun Bi","doi":"10.1109/DSAA.2016.50","DOIUrl":"https://doi.org/10.1109/DSAA.2016.50","url":null,"abstract":"Spatial database queries are often exploratory. The users often find that their queries return too many answers and many of them may be irrelevant. Based on the coupling relationships between spatial objects, this paper proposes a novel categorization approach which consists of two steps. The first step analyzes the spatial object coupling relationship by considering the location proximity and semantic similarity between spatial objects, and then a set of clusters over the spatial objects can be generated, where each cluster represents one type of user need. When a user issues a spatial query, the second step presents to the user a category tree which is generated by using modified C4.5 decision tree algorithm over the clusters such that the user can easily select the subset of query results matching his/her needs by exploring the labels assigned on intermediate nodes of the tree. The experiments demonstrate that our spatial object clustering method can efficiently capture both the semantic and location correlations between spatial objects. The effectiveness and efficiency of the categorization algorithm is also demonstrated.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127127376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
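A minimal sketch of the second step, fitting a decision tree over pre-computed clusters of spatial objects; scikit-learn's CART is used here as a stand-in for the paper's modified C4.5, and the toy features, cluster labels, and feature names are assumptions:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Each row: [distance_to_query_km, category_code, rating]; the label is the
    # cluster id produced by the first (clustering) step.
    X = np.array([[0.5, 0, 4.5], [0.8, 0, 4.0], [5.0, 1, 3.5],
                  [6.2, 1, 3.0], [0.7, 2, 4.8], [5.5, 2, 2.9]])
    cluster_labels = np.array([0, 0, 1, 1, 0, 1])

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, cluster_labels)
    print(export_text(tree, feature_names=["distance_km", "category", "rating"]))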
Closest Interval Join Using MapReduce
Qiang Zhang, Andy He, Chris Liu, Eric Lo
{"title":"Closest Interval Join Using MapReduce","authors":"Qiang Zhang, Andy He, Chris Liu, Eric Lo","doi":"10.1109/DSAA.2016.39","DOIUrl":"https://doi.org/10.1109/DSAA.2016.39","url":null,"abstract":"The closest interval join problem is to find all the closest intervals between two interval sets R and S. Applications of closest interval join include bioinformatics and other data science. Interval data can be very large and continue to increase in size due to the advancement of data acquisition technology. In this paper, we present efficient MapReduce algorithms to compute closest interval join. Experiments based on both real and synthetic interval data demonstrated that our algorithms are efficient.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129780136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
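A minimal single-machine sketch of the closest interval join itself (for each interval in R, find the nearest interval in S by gap distance); the paper's MapReduce partitioning and optimisations are not reproduced here:

    def gap(a, b):
        """Distance between intervals a=(s1, e1) and b=(s2, e2); 0 if they overlap."""
        return max(0, max(a[0], b[0]) - min(a[1], b[1]))

    def closest_interval_join(R, S):
        """For every interval in R, return its closest interval in S."""
        return {r: min(S, key=lambda s: gap(r, s)) for r in R}

    R = [(1, 5), (10, 12), (40, 45)]
    S = [(6, 8), (13, 20), (30, 35)]
    print(closest_interval_join(R, S))  # {(1, 5): (6, 8), (10, 12): (13, 20), (40, 45): (30, 35)}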
Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering
James Max Kanter, O. Gillespie, K. Veeramachaneni
{"title":"Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering","authors":"James Max Kanter, O. Gillespie, K. Veeramachaneni","doi":"10.1109/DSAA.2016.54","DOIUrl":"https://doi.org/10.1109/DSAA.2016.54","url":null,"abstract":"In this paper, we introduce \"prediction engineering\" as a formal step in the predictive modeling process. We define a generalizable 3 part framework — Label, Segment, Featurize (L-S-F) — to address the growing demand for predictive models. The framework provides abstractions for data scientists to customize the process to unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in 2 domains and demonstrate an implementation over 5 unique prediction problems defined on a dataset of crowdfunding projects from DonorsChoose.org. The results demonstrate how the L-S-F framework complements existing tools to allow us to rapidly build and evaluate 26 distinct predictive models. L-S-F enables development of models that provide value to all parties involved (donors, teachers, and people running the platform).","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121389870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
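A minimal sketch of the Label, Segment, Featurize idea on a toy donation-style event log; the function signatures, cutoff, and target threshold are hypothetical and not the authors' implementation:

    from datetime import datetime

    # Toy event log: (entity_id, timestamp, donation_amount).
    events = [
        ("proj1", datetime(2016, 1, 1), 10), ("proj1", datetime(2016, 1, 5), 25),
        ("proj1", datetime(2016, 2, 1), 40), ("proj2", datetime(2016, 1, 3), 5),
    ]

    def label(entity_events, cutoff, target=30):
        """Label: did donations received after the cutoff reach the target?"""
        return sum(a for _, t, a in entity_events if t > cutoff) >= target

    def segment(entity_events, cutoff):
        """Segment: keep only data observable before the cutoff."""
        return [e for e in entity_events if e[1] <= cutoff]

    def featurize(segment_events):
        """Featurize: simple aggregates over the observable segment."""
        amounts = [a for _, _, a in segment_events]
        return {"n_events": len(amounts), "total": sum(amounts)}

    cutoff = datetime(2016, 1, 31)
    for eid in ("proj1", "proj2"):
        ev = [e for e in events if e[0] == eid]
        print(eid, featurize(segment(ev, cutoff)), "label:", label(ev, cutoff))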
The Semantic Knowledge Graph: A Compact, Auto-Generated Model for Real-Time Traversal and Ranking of any Relationship within a Domain
Trey Grainger, Khalifeh AlJadda, M. Korayem, Andries Smith
{"title":"The Semantic Knowledge Graph: A Compact, Auto-Generated Model for Real-Time Traversal and Ranking of any Relationship within a Domain","authors":"Trey Grainger, Khalifeh AlJadda, M. Korayem, Andries Smith","doi":"10.1109/DSAA.2016.51","DOIUrl":"https://doi.org/10.1109/DSAA.2016.51","url":null,"abstract":"This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain. The source code for our Semantic Knowledge Graph implementation is being published along with this paper to facilitate further research and extensions of this work.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126771791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
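The core data structure can be sketched as an inverted index whose postings lists are intersected to materialize an edge between two terms; the co-occurrence lift used for scoring below is a simplification of the paper's relatedness measure, and the documents are toy examples:

    from collections import defaultdict

    docs = {
        0: "java developer spring hibernate",
        1: "java spring microservices",
        2: "python data science pandas",
        3: "java data engineer spark",
    }

    index = defaultdict(set)  # term -> postings list (doc ids)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)

    def edge(term_a, term_b, n_docs=len(docs)):
        """Materialize the edge a-b from intersecting postings lists and score it."""
        docs_ab = index[term_a] & index[term_b]
        expected = len(index[term_a]) * len(index[term_b]) / n_docs
        lift = len(docs_ab) / expected if expected else 0.0
        return docs_ab, lift

    print(edge("java", "spring"))   # ({0, 1}, lift > 1 -> related)
    print(edge("java", "pandas"))   # (set(), 0.0 -> unrelated)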
Learning Temporal Dependence from Time-Series Data with Latent Variables
Hossein Hosseini, Sreeram Kannan, Baosen Zhang, R. Poovendran
{"title":"Learning Temporal Dependence from Time-Series Data with Latent Variables","authors":"Hossein Hosseini, Sreeram Kannan, Baosen Zhang, R. Poovendran","doi":"10.1109/DSAA.2016.34","DOIUrl":"https://doi.org/10.1109/DSAA.2016.34","url":null,"abstract":"We consider the setting where a collection of time series, modeled as random processes, evolve in a causal manner, and one is interested in learning the graph governing the relationships of these processes. A special case of wide interest and applicability is the setting where the noise is Gaussian and relationships are Markov and linear. We study this setting with two additional features: firstly, each random process has a hidden (latent) state, which we use to model the internal memory possessed by the variables (similar to hidden Markov models). Secondly, each variable can depend on its latent memory state through a random lag (rather than a fixed lag), thus modeling memory recall with differing lags at distinct times. Under this setting, we develop an estimator and prove that under a genericity assumption, the parameters of the model can be learned consistently. We also propose a practical adaption of this estimator, which demonstrates significant performance gains in both synthetic and real-world datasets.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129544363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
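As a simple point of reference for the problem setting, the sketch below recovers a lagged dependence between two synthetic series by scanning candidate lags with Pearson correlation; this is a naive baseline, not the latent-state estimator developed in the paper:

    import numpy as np

    rng = np.random.default_rng(1)
    n, true_lag = 500, 3
    x = rng.normal(size=n)
    y = np.zeros(n)
    y[true_lag:] = 0.9 * x[:-true_lag] + 0.1 * rng.normal(size=n - true_lag)

    def best_lag(x, y, max_lag=10):
        """Score each candidate lag by corr(x shifted by lag, y)."""
        scores = {lag: np.corrcoef(x[:-lag], y[lag:])[0, 1]
                  for lag in range(1, max_lag + 1)}
        return max(scores, key=scores.get)

    print("estimated lag:", best_lag(x, y))  # expected: 3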
Limiting the Diffusion of Information by a Selective PageRank-Preserving Approach
G. Loukides, Robert Gwadera
{"title":"Limiting the Diffusion of Information by a Selective PageRank-Preserving Approach","authors":"G. Loukides, Robert Gwadera","doi":"10.1109/DSAA.2016.16","DOIUrl":"https://doi.org/10.1109/DSAA.2016.16","url":null,"abstract":"The problem of limiting the diffusion of information in social networks has received substantial attention. To deal with the problem, existing works aim to prevent the diffusion of information to as many nodes as possible, by deleting a given number of edges. Thus, they assume that the diffusing information can affect all nodes and that the deletion of each edge has the same impact on the information propagation properties of the graph. In this work, we propose an approach which lifts these limiting assumptions. Our approach allows specifying the nodes to which information diffusion should be prevented and their maximum allowable activation probability, and it performs edge deletion while avoiding drastic changes to the ability of the network to propagate information. To realize our approach, we propose a measure that captures changes, caused by deletion, to the PageRank distribution of the graph. Based on the measure, we define the problem of finding an edge subset to delete as an optimization problem. We show that the problem can be modeled as a Submodular Set Cover (SSC) problem and design an approximation algorithm, based on the well-known approximation algorithm for SSC. In addition, we develop an iterative heuristic that has similar effectiveness but is significantly more efficient than our algorithm. Experiments on real and synthetic data show the effectiveness and efficiency of our methods.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127224401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
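A minimal greedy edge-deletion loop in the spirit of the approach, using networkx (assumed available): repeatedly remove the in-edge of a protected node that most lowers its PageRank, while tracking the overall PageRank shift. This is an illustrative heuristic, not the paper's SSC-based algorithm or its change measure:

    import networkx as nx

    G = nx.karate_club_graph().to_directed()
    protected, budget = 33, 3  # node to shield, number of edges to delete
    base_pr = nx.pagerank(G)

    for _ in range(budget):
        candidates = list(G.in_edges(protected))
        if not candidates:
            break

        def pr_after_removal(edge):
            H = G.copy()
            H.remove_edge(*edge)
            return nx.pagerank(H)[protected]

        best = min(candidates, key=pr_after_removal)  # removal that helps most
        G.remove_edge(*best)
        print("removed edge", best)

    new_pr = nx.pagerank(G)
    shift = sum(abs(new_pr[v] - base_pr[v]) for v in G)  # crude global-change measure
    print("PageRank of protected node:", base_pr[protected], "->", new_pr[protected])
    print("total PageRank shift:", shift)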