Proceedings of the 3rd IKDD Conference on Data Science, 2016: Latest Literature

Query Classification using LDA Topic Model and Sparse Representation Based Classifier
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888474
Indrani Bhattacharya, J. Sil
Abstract: Users often seek information by submitting a query whose keywords may belong to multiple topics representing overlapping concepts. The objective of this work is to classify the query into a topic class label by considering how the query keywords are distributed over the various topics. The approach effectively reduces the search space so that information can be retrieved in a computationally efficient way. First, we apply Latent Dirichlet Allocation (LDA) to the entire corpus to group the documents into topics consisting of unique words. Next, a term vocabulary (TRV) is built from the unique words present in the topics. We develop a Topic-Vocabulary Matrix (TVM) by encoding the TRV with respect to each topic. The TVM expresses the word distribution among the topics and is presented as the training data set, which is sparse. The query is encoded in the same way and submitted as test data. We apply a sparse representation based classifier (SRC) to classify the query into a topic. The proposed approach shows satisfactory performance, with 93% accuracy in classifying queries.
Citations: 4
Exploiting Local and Global Context In PPI networks For Efficient Protein Function Prediction
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888461
D. S. Kumar, Siddharth Goyal, V. Reddy, Ramesh Loganathan
Abstract: Protein-protein interaction (PPI) networks are a valuable biological data source containing rich information useful for protein function prediction. PPI network data obtained from high-throughput experiments is known to be noisy and incomplete. In the literature, common-neighbor, clustering, and classification-based approaches have been proposed to improve the performance of protein function prediction by modeling PPI data as a graph. These approaches exploit the fact that a protein shares function with other proteins directly interacting with it. In this paper we experiment with an alternative approach that exploits the notion that two proteins share a function if they have a well-defined group of directly or indirectly connected common neighbors. Experiments conducted on a variety of PPI network datasets show that the proposed approach improves protein function prediction accuracy over existing approaches.
Citations: 0
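The notion in the PPI entry above, that two proteins are related when they share a group of directly or indirectly connected neighbours, can be illustrated with a small sketch. This is not the authors' method: the Jaccard overlap, the `max_hops` parameter, the function names, and the toy network `ppi` are all assumptions made for illustration.

```python
from collections import deque

def neighbourhood(graph, source, max_hops):
    """Proteins reachable from source within max_hops interactions (source excluded)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops) - {source}

def shared_context(graph, a, b, max_hops=2):
    """Jaccard overlap of the two neighbourhoods (an illustrative score, not the paper's)."""
    na = neighbourhood(graph, a, max_hops) - {b}
    nb = neighbourhood(graph, b, max_hops) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Hypothetical toy network: an undirected graph stored as adjacency sets.
ppi = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P5"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P5"},
    "P5": {"P2", "P4"},
    "P6": set(),
}

direct_only = shared_context(ppi, "P1", "P2", max_hops=1)
with_indirect = shared_context(ppi, "P1", "P2", max_hops=2)
```

In this toy graph, P1 and P2 share only one of three direct neighbours, but their two-hop neighbourhoods coincide, which is the kind of indirect evidence the abstract refers to.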
Modeling Spatio-temporal Change Pattern using Mathematical Morphology
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888458
Monidipa Das, S. Ghosh
Abstract: Detection and assessment of spatio-temporal change patterns is a challenging task, and may provide insights into various spatio-temporal changes, such as urban sprawl monitoring and surveillance of epidemics due to infectious diseases. The existing spatio-temporal pattern mining techniques mostly deal with the assessment of thematic change patterns. However, analyzing the spatio-temporal pattern of geometric changes is also important for analyzing such kinds of spatial changes on a temporal scale. This paper presents a novel framework for modeling such spatio-temporal change in geometry with the help of mathematical morphology and directional granulometric analysis. Morphological operators are used to detect the various spatio-temporal change patterns in geometry, such as spatial growth (due to Expansion and Merge) and spatial shrinkage (due to Contraction and Split). Further, the temporal changes in the orientations of these patterns are modeled by performing granulometric analyses on them. The proposed framework for spatio-temporal change pattern modeling has been validated on four cases of spatio-temporal change, namely (i) spatial expansion, (ii) spatial contraction, (iii) spatial merge, and (iv) spatial split, in the regional distribution of climate zones in Australia.
Citations: 2
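The morphological operators mentioned in the entry above are built from two primitives, dilation and erosion. The sketch below shows only those standard primitives on a binary region stored as a set of grid cells. It does not reproduce the paper's change-detection framework, and the structuring element `CROSS` and the toy region `square` are assumptions chosen for illustration.

```python
def dilate(region, offsets):
    """Binary dilation: every cell reachable from the region by one offset."""
    return {(r + dr, c + dc) for (r, c) in region for (dr, dc) in offsets}

def erode(region, offsets):
    """Binary erosion: cells whose whole offset neighbourhood lies inside the region."""
    return {
        (r, c)
        for (r, c) in region
        if all((r + dr, c + dc) in region for (dr, dc) in offsets)
    }

# Assumed structuring element: the cell itself plus its four edge neighbours.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

# Hypothetical 3x3 region standing in for a zone at one time step.
square = {(r, c) for r in range(3) for c in range(3)}

grown = dilate(square, CROSS)   # models spatial growth
shrunk = erode(square, CROSS)   # models spatial shrinkage
```

Comparing a region at time t+1 against dilated and eroded versions of the region at time t is one plausible way such operators could flag growth or shrinkage; the paper's actual criteria are not given in the abstract.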
Learning transition models of biological regulatory and signaling networks from noisy data
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888469
Deepika Vatsa, Sumeet Agarwal, A. Srinivasan
Abstract: In this paper, we present an extended 2-step probabilistic LGTS (PLGTS) transition system which aims to identify the network structure and stochastic nature of biological processes using time series data. This work is a step towards system identification in a noisy environment using transition systems. Here, noise means noise in the transitions between states in the observed data. Interestingly, noise in the data assists system identification. Experimental results on synthetic data show that noise actually helps in understanding the system dynamics as well as constraining the solution space, thus helping to identify the most probable network structure for a given data set.
Citations: 1
Scalable Quick Reduct Algorithm: Iterative MapReduce Approach
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888476
P. Singh, P. Prasad
Abstract: Feature selection by reduct computation is the key technique for knowledge acquisition using rough set theory. Existing MapReduce-based reduct algorithms use the Hadoop MapReduce framework, which is not suitable for iterative algorithms. This paper aims to design and implement an iterative MapReduce-based Quick Reduct algorithm using the Twister framework. The proposed In_MRQRA algorithm performs partial granular-level computations at the mappers and granular computations at the reducer. Experimental analysis on the KDD-Cup99 dataset empirically established the relevance of the proposed approach.
Citations: 9
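For readers unfamiliar with reduct computation, the following is a minimal sequential sketch of the classic greedy Quick Reduct idea from rough set theory, on which the entry above builds. It runs on a single machine and is not the In_MRQRA or Twister implementation described in the abstract; the function names and the toy decision table `rows` are assumptions.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Dependency degree: share of rows whose equivalence class under attrs
    contains a single decision value (the positive region)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(rows)

def quick_reduct(rows, conditional, decision):
    """Greedily add the attribute that raises the dependency degree most,
    until it matches the dependency of the full attribute set."""
    target = dependency(rows, conditional, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in conditional if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table: attribute "b" alone determines decision "d".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

selected = quick_reduct(rows, ["a", "b", "c"], "d")
```

The loop recomputes the dependency degree on every pass, which is why the abstract argues that an iteration-friendly framework suits the algorithm better than plain Hadoop MapReduce.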
Weighted Linear Loss Twin Support Vector Clustering
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888467
Reshma Khemchandani, Aman Pal
Abstract: Traditional point-based clustering methods such as k-means [1], k-median [2], etc. work by partitioning the data into clusters based on cluster prototype points. These methods perform poorly when the data is not distributed around several cluster points. In contrast, plane-based clustering methods such as k-plane clustering [3], local k-proximal plane clustering [4], etc. have been proposed in the literature. These methods calculate k cluster center planes and partition the data into k clusters according to the proximity of the data points to these k planes. Working along the lines of [5], in this paper we present a Weighted Linear Loss Twin Support Vector Clustering, termed WLL-TWSVC, for clustering problems. Introducing the weighted linear loss into the formulation of TWSVC leads to solving a system of linear equations at lower computational cost, as opposed to solving a series of quadratic programming problems along with a system of linear equations as in TWSVC. We also introduce a regularization term in the objective function, which takes care of the structural risk component along with the empirical risk.
Citations: 5
Investigating the Potential of Aggregated Tweets as Surrogate Data for Forecasting Civil Protests
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888466
Swati Agarwal, A. Sureka
Abstract: Online micro-blogging social media websites like Twitter are being used as a real-time platform for information sharing and communication during the planning and mobilization of civil unrest events. We conduct a study of more than 1.5 million English tweets spanning 5 months on the topic of immigration and find evidence of Twitter being used as a platform for planning and mobilizing protests and civil-disobedience-related demonstrations. We believe that Twitter data can be used as a surrogate and open-source precursor for forecasting civil unrest, and we investigate machine-learning-based techniques for building a prediction model. We present our solution approach, consisting of various components such as named entity recognition (temporal, spatial location, people expressions extraction), semantic enrichment of event-related tweets (crowd-buzz & commentary and mobilization & planning), and a location-time-topic correlation miner. We conduct a series of experiments on a large real-world dataset and investigate the application of trend analysis. We conduct two case studies on civil-unrest-related events and demonstrate the effectiveness of our approach.
Citations: 9
Mining Multi-source Data to Study Workplace Activity Patterns
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888470
Sachin Patel, Ravi Mahamuni, Meghendra Singh, David Clarance, Mayuri Duggirala, Shivani Sharma, Vinay Katiyar, Gauri Deshpande, Amruta Deshmukh, Vaibhav, Vivek Balaraman
Abstract: Examining work activity patterns is an enduring research problem in organizations. The fortuitous availability of a whole new set of data collection mechanisms, such as mobiles, activity loggers, and GPS-based location detectors, provides us with new ways of studying workplace behaviour. We present a data collection framework that helps in the collection, anonymization, fusion, processing and mining of behavioural data. We use the framework to study the activities of a research and development team with the aim of finding the relationship between behavioural traits, states, and activity patterns. We find partial support for the claim that behavioural states and activity patterns are associated.
Citations: 0
Trustworthiness of t-Distributed Stochastic Neighbour Embedding
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888465
Shishir Pandey, R. Vaze
Abstract: A well-known technique for embedding high-dimensional objects in two- or three-dimensional space is t-distributed stochastic neighbour embedding (t-SNE). t-SNE minimizes the Kullback-Leibler (KL) divergence between two probability distributions, one induced on points in the high-dimensional space and the other induced on points in the low-dimensional embedding space. In this work, we consider a more general framework using the Rényi divergence, which is parametrized by the order α; the KL divergence is the special case α → 1. We study how various Rényi divergences perform compared to the KL divergence. We show that, in terms of the metrics of trustworthiness and neighbourhood preservation, the embedding becomes better as the Rényi divergence approaches the KL divergence.
Citations: 4
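The relationship stated in the t-SNE entry above, that the KL divergence is the α → 1 limit of the Rényi divergence, can be checked numerically. The sketch below uses the standard discrete definitions of both divergences; the two toy distributions `p` and `q` and the chosen orders are assumptions, and nothing here reproduces the paper's embedding experiments.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_divergence(p, q, alpha):
    """Rényi divergence of order alpha (alpha > 0, alpha != 1):
    log(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)

# Hypothetical toy distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl = kl_divergence(p, q)
orders = (0.5, 0.9, 0.99, 0.999)
gaps = [abs(renyi_divergence(p, q, a) - kl) for a in orders]
```

The gap to the KL divergence shrinks as the order approaches 1, which is the sense in which the abstract treats KL as a special case.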
SocialStories: Segmenting Stories within Trending Twitter Topics
Proceedings of the 3rd IKDD Conference on Data Science, 2016 Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888453
Kokil Jaidka, Kaushik Ramachandran, Prakhar Gupta, Sajal Rustagi
Abstract: This study presents SocialStories, a system based on incremental clustering of streaming tweets, for identifying fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present our experiments on 0.19 million tweets posted in June 2014, revolving around mentions of a software brand before, during and after a marketing conference and a software release. The novelty of our work lies in the text-based similarity calculation metrics, including a new similarity metric, called the inverse cluster frequency, and time-specific metrics that allow for the decay of old entities with the passage of time and preserve the homogeneity and freshness of stories. We report improved performance and a higher recall of 80% against the gold standard (post hoc journalistic reports), as compared to LDA- and wavelet-based systems. Our algorithm is able to cluster 80% of all tweets into story-based clusters, which are 86% pure. It also enables earlier detection of trending stories than manual reports, and is far more accurate in identifying fine-grained stories within sub-topics than baseline systems.
Citations: 6