Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics: Latest Papers

Augmenting MATLAB with semantic objects for an interactive visual environment
C. Lee, J. Choo, Duen Horng Chau, Haesun Park
{"title":"Augmenting MATLAB with semantic objects for an interactive visual environment","authors":"C. Lee, J. Choo, Duen Horng Chau, Haesun Park","doi":"10.1145/2501511.2501521","DOIUrl":"https://doi.org/10.1145/2501511.2501521","url":null,"abstract":"Analysis tools such as Matlab, R, and SAS support a myriad of built-in computational functions and various standard visualization techniques. However, most of them provide little interaction from visualizations mainly due to the fact that the tools treat the data as just numerical vectors or matrices while ignoring any semantic meaning associated with them. To solve this limitation, we augment Matlab, one of the widely used data analysis tools, with the capability of directly handling the underlying semantic objects and their meanings. Such capabilities allow users to flexibly assign essential interaction capabilities, such as brushing-and-linking and details-on-demand interactions, to visualizations. To demonstrate the capabilities, two usage scenarios in document and graph analysis domains are presented.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Towards anytime active learning: interrupting experts to reduce annotation costs
M. E. Ramirez-Loaiza, A. Culotta, M. Bilgic
{"title":"Towards anytime active learning: interrupting experts to reduce annotation costs","authors":"M. E. Ramirez-Loaiza, A. Culotta, M. Bilgic","doi":"10.1145/2501511.2501524","DOIUrl":"https://doi.org/10.1145/2501511.2501524","url":null,"abstract":"Many active learning methods use annotation cost or expert quality as part of their framework to select the best data for annotation. While these methods model expert quality, availability, or expertise, they have no direct influence on any of these elements. We present a novel framework built upon decision-theoretic active learning that allows the learner to directly control label quality by allocating a time budget to each annotation. We show that our method is able to improve performance efficiency of the active learner through an interruption mechanism trading off the induced error with the cost of annotation. Our simulation experiments on three document classification tasks show that some interruption is almost always better than none, but that the optimal interruption time varies by dataset.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"79 2-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123453999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Storygraph: extracting patterns from spatio-temporal data
Ayush Shrestha, B. Miller, Ying Zhu, Yi Zhao
{"title":"Storygraph: extracting patterns from spatio-temporal data","authors":"Ayush Shrestha, B. Miller, Ying Zhu, Yi Zhao","doi":"10.1145/2501511.2501525","DOIUrl":"https://doi.org/10.1145/2501511.2501525","url":null,"abstract":"Analysis of spatio-temporal data often involves correlating different events in time and location to uncover relationships between them. It is also desirable to identify different patterns in the data. Visualizing time and space in the same chart is not trivial. Common methods include plotting the latitude, longitude and time as three dimensions of a 3D chart. Drawbacks of these 3D charts include not being able to scale well due to cluttering, occlusion and difficulty to track time in case of clustered events. In this paper we present a novel 2D visualization technique called Storygraph which provides an integrated view of time and location to address these issues. We also present storylines based on Storygraph which show movement of the actors over time. Lastly, we present case studies to show the applications of Storygraph.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125297727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Lytic: synthesizing high-dimensional algorithmic analysis with domain-agnostic, faceted visual analytics
Edward Clarkson, J. Choo, John Turgeson, R. Decuir, Haesun Park
{"title":"Lytic: synthesizing high-dimensional algorithmic analysis with domain-agnostic, faceted visual analytics","authors":"Edward Clarkson, J. Choo, John Turgeson, R. Decuir, Haesun Park","doi":"10.1145/2501511.2501518","DOIUrl":"https://doi.org/10.1145/2501511.2501518","url":null,"abstract":"We present Lytic, a domain-independent, faceted visual analytic (VA) system for interactive exploration of large datasets. It combines a flexible UI that adapts to arbitrary character-separated value (CSV) datasets with algorithmic preprocessing to compute unsupervised dimension reduction and cluster data from high-dimensional fields. It provides a variety of visualization options that require minimal user effort to configure and a consistent user experience between visualization types and underlying datasets. Filtering, comparison and visualization operations work in concert, allowing users to hop seamlessly between actions and pursue answers to expected and unexpected data hypotheses.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116600206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Zips: mining compressing sequential patterns in streams
Hoang Thanh Lam, T. Calders, Jie Yang, F. Mörchen, Dmitriy Fradkin
{"title":"Zips: mining compressing sequential patterns in streams","authors":"Hoang Thanh Lam, T. Calders, Jie Yang, F. Mörchen, Dmitriy Fradkin","doi":"10.1145/2501511.2501520","DOIUrl":"https://doi.org/10.1145/2501511.2501520","url":null,"abstract":"We propose a streaming algorithm, based on the minimum description length (MDL) principle, for extracting non-redundant sequential patterns. For static databases, the MDL-based approach that selects patterns based on their capacity to compress data rather than their frequency, was shown to be remarkably effective for extracting meaningful patterns and solving the redundancy issue in frequent itemset and sequence mining. The existing MDL-based algorithms, however, either start from a seed set of frequent patterns, or require multiple passes through the data. As such, the existing approaches scale poorly and are unsuitable for large datasets. Therefore, our main contribution is the proposal of a new, streaming algorithm, called Zips, that does not require a seed set of patterns and requires only one scan over the data. For Zips, we extended the Lempel-Ziv (LZ) compression algorithm in three ways: first, whereas LZ assigns codes uniformly as it builds up its dictionary while scanning the input, Zips assigns codewords according to the usage of the dictionary words; more heavily used words get shorter code-lengths. Secondly, Zips exploits also non-consecutive occurrences of dictionary words for compression. And, third, the well-known space-saving algorithm is used to evict unpromising words from the dictionary. Experiments on one synthetic and two real-world large-scale datasets show that our approach extracts meaningful compressing patterns with similar quality to the state-of-the-art multi-pass algorithms proposed for static databases of sequences. Moreover, our approach scales linearly with the size of data streams while all the existing algorithms do not.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123513297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
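Note on the Zips abstract: the "space-saving" eviction it mentions can be illustrated with a minimal Python sketch of the generic Space-Saving counter scheme. This is not the Zips implementation; the function name `space_saving` and the parameter `capacity` are illustrative assumptions, and the sketch counts plain items rather than dictionary words with code lengths.

```python
from typing import Dict, Hashable, Iterable


def space_saving(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, int]:
    """Approximate item counts while keeping at most `capacity` counters."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counts: Dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            # Already monitored: just increment its counter.
            counts[item] += 1
        elif len(counts) < capacity:
            # Free slot available: start monitoring the new item.
            counts[item] = 1
        else:
            # Table is full: evict the entry with the smallest counter and
            # let the newcomer inherit that counter plus one (an overestimate).
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            counts[item] = floor + 1
    return counts


if __name__ == "__main__":
    print(space_saving("aaaabbbc", capacity=3))
```

Because every processed item raises exactly one counter by one, the counters always sum to the stream length, and rarely seen entries are the ones that get replaced.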
Building blocks for exploratory data analysis tools
S. Alspaugh, Marti A. Hearst, A. Ganapathi, R. Katz
{"title":"Building blocks for exploratory data analysis tools","authors":"S. Alspaugh, Marti A. Hearst, A. Ganapathi, R. Katz","doi":"10.1145/2501511.2501515","DOIUrl":"https://doi.org/10.1145/2501511.2501515","url":null,"abstract":"Data exploration is largely manual and labor intensive. Although there are various tools and statistical techniques that can be applied to data sets, there is little help to identify what questions to ask of a data set, let alone what domain knowledge is useful in answering the questions. In this paper, we study user queries against production data sets in Splunk. Specifically, we characterize the interplay between data sets and the operations used to analyze them using latent semantic analysis, and discuss how this characterization serves as a building block for a data analysis recommendation system. This is a work-in-progress paper.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127111295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
Duen Horng Chau, Jilles Vreeken, M. Leeuwen, C. Faloutsos
{"title":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","authors":"Duen Horng Chau, Jilles Vreeken, M. Leeuwen, C. Faloutsos","doi":"10.1145/2501511","DOIUrl":"https://doi.org/10.1145/2501511","url":null,"abstract":"We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes in size are now commonplace. They arise in numerous settings in science, government, and enterprises, and technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to exploratively analyze databases of this scale. Currently, few technologies allow us to freely \"wander\" around the data, and make discoveries by following our intuition, or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, thus may not be well-suited for interactive exploration of large datasets. Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction techniques, and machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, higher velocity, and greater variety, creating effective interactive data mining techniques becomes a much harder task.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132283579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
One click mining: interactive local pattern discovery through implicit preference and performance learning
Mario Boley, M. Mampaey, Bo Kang, P. Tokmakov, S. Wrobel
{"title":"One click mining: interactive local pattern discovery through implicit preference and performance learning","authors":"Mario Boley, M. Mampaey, Bo Kang, P. Tokmakov, S. Wrobel","doi":"10.1145/2501511.2501517","DOIUrl":"https://doi.org/10.1145/2501511.2501517","url":null,"abstract":"It is known that productive pattern discovery from data has to interactively involve the user as directly as possible. State-of-the-art toolboxes require the specification of sophisticated workflows with an explicit selection of a data mining method, all its required parameters, and a corresponding algorithm. This hinders the desired rapid interaction, especially with users that are experts of the data domain rather than data mining experts. In this paper, we present a fundamentally new approach towards user involvement that relies exclusively on the implicit feedback available from the natural analysis behavior of the user, and at the same time allows the user to work with a multitude of pattern classes and discovery algorithms simultaneously without even knowing the details of each algorithm. To achieve this goal, we are relying on a recently proposed co-active learning model and a special feature representation of patterns to arrive at an adaptively tuned user interestingness model. At the same time, we propose an adaptive time-allocation strategy to distribute computation time among a set of underlying mining algorithms. We describe the technical details of our approach, present the user interface for gathering implicit feedback, and provide preliminary evaluation results.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129539432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
Randomly sampling maximal itemsets
Sandy Moens, Bart Goethals
{"title":"Randomly sampling maximal itemsets","authors":"Sandy Moens, Bart Goethals","doi":"10.1145/2501511.2501523","DOIUrl":"https://doi.org/10.1145/2501511.2501523","url":null,"abstract":"Pattern mining techniques generally enumerate lots of uninteresting and redundant patterns. To obtain less redundant collections, techniques exist that give condensed representations of these collections. However, the proposed techniques often rely on complete enumeration of the pattern space, which can be prohibitive in terms of time and memory. Sampling can be used to filter the output space of patterns without explicit enumeration. We propose a framework for random sampling of maximal itemsets from transactional databases. The presented framework can use any monotonically decreasing measure as interestingness criterion for this purpose. Moreover, we use an approximation measure to guide the search for maximal sets to different parts of the output space. We show in our experiments that the method can rapidly generate small collections of patterns with good quality. The sampling framework has been implemented in the interactive visual data mining tool called MIME, as such enabling users to quickly sample a collection of patterns and analyze the results.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129673114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
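Note on the maximal itemset sampling abstract: the idea of reaching a maximal itemset without enumerating the pattern space can be sketched as a random walk that keeps adding items while a monotonically decreasing measure stays above a threshold. The sketch below assumes support as that measure and picks extensions uniformly at random; it does not reproduce the approximation measure the abstract describes, and the names `support` and `sample_maximal_itemset` are hypothetical.

```python
import random
from typing import FrozenSet, List, Optional, Set


def support(itemset: Set[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def sample_maximal_itemset(
    transactions: List[FrozenSet[str]],
    min_support: int,
    rng: Optional[random.Random] = None,
) -> FrozenSet[str]:
    """Random walk from the empty set to one maximal frequent itemset."""
    rng = rng or random.Random()
    items = sorted(set().union(*transactions))
    current: Set[str] = set()
    while True:
        # Items whose addition keeps the set frequent. Support can only
        # shrink as the set grows, so an item rejected now stays rejected.
        candidates = [
            i for i in items
            if i not in current
            and support(current | {i}, transactions) >= min_support
        ]
        if not candidates:
            # No frequent extension exists, so `current` is maximal.
            return frozenset(current)
        current.add(rng.choice(candidates))


if __name__ == "__main__":
    tx = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
    print(sorted(sample_maximal_itemset(tx, min_support=2, rng=random.Random(0))))
```

Each call costs a number of support counts proportional to the number of items times the length of the walk, instead of a full enumeration. Uniform choice of extensions does not sample maximal itemsets uniformly, which is one reason a guiding measure is useful.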
Methods for exploring and mining tables on Wikipedia
Chandra Bhagavatula, Thanapon Noraset, Doug Downey
{"title":"Methods for exploring and mining tables on Wikipedia","authors":"Chandra Bhagavatula, Thanapon Noraset, Doug Downey","doi":"10.1145/2501511.2501516","DOIUrl":"https://doi.org/10.1145/2501511.2501516","url":null,"abstract":"Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover \"interesting\" relationships between table columns. We find that a \"Semantic Relatedness\" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123653685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92