{"title":"Augmenting MATLAB with semantic objects for an interactive visual environment","authors":"C. Lee, J. Choo, Duen Horng Chau, Haesun Park","doi":"10.1145/2501511.2501521","DOIUrl":"https://doi.org/10.1145/2501511.2501521","url":null,"abstract":"Analysis tools such as Matlab, R, and SAS support a myriad of built-in computational functions and various standard visualization techniques. However, most of them support little interaction with visualizations, mainly because the tools treat the data as just numerical vectors or matrices while ignoring any semantic meaning associated with them. To address this limitation, we augment Matlab, one of the most widely used data analysis tools, with the capability of directly handling the underlying semantic objects and their meanings. Such capabilities allow users to flexibly assign essential interaction capabilities, such as brushing-and-linking and details-on-demand interactions, to visualizations. To demonstrate the capabilities, two usage scenarios in document and graph analysis domains are presented.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards anytime active learning: interrupting experts to reduce annotation costs","authors":"M. E. Ramirez-Loaiza, A. Culotta, M. Bilgic","doi":"10.1145/2501511.2501524","DOIUrl":"https://doi.org/10.1145/2501511.2501524","url":null,"abstract":"Many active learning methods use annotation cost or expert quality as part of their framework to select the best data for annotation. While these methods model expert quality, availability, or expertise, they have no direct influence on any of these elements. We present a novel framework built upon decision-theoretic active learning that allows the learner to directly control label quality by allocating a time budget to each annotation. We show that our method is able to improve performance efficiency of the active learner through an interruption mechanism trading off the induced error with the cost of annotation. Our simulation experiments on three document classification tasks show that some interruption is almost always better than none, but that the optimal interruption time varies by dataset.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"79 2-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123453999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storygraph: extracting patterns from spatio-temporal data","authors":"Ayush Shrestha, B. Miller, Ying Zhu, Yi Zhao","doi":"10.1145/2501511.2501525","DOIUrl":"https://doi.org/10.1145/2501511.2501525","url":null,"abstract":"Analysis of spatio-temporal data often involves correlating different events in time and location to uncover relationships between them. It is also desirable to identify different patterns in the data. Visualizing time and space in the same chart is not trivial. Common methods include plotting the latitude, longitude and time as three dimensions of a 3D chart. Drawbacks of these 3D charts include not being able to scale well due to cluttering, occlusion and difficulty tracking time in case of clustered events. In this paper we present a novel 2D visualization technique called Storygraph which provides an integrated view of time and location to address these issues. We also present storylines based on Storygraph which show movement of the actors over time. Lastly, we present case studies to show the applications of Storygraph.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125297727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lytic: synthesizing high-dimensional algorithmic analysis with domain-agnostic, faceted visual analytics","authors":"Edward Clarkson, J. Choo, John Turgeson, R. Decuir, Haesun Park","doi":"10.1145/2501511.2501518","DOIUrl":"https://doi.org/10.1145/2501511.2501518","url":null,"abstract":"We present Lytic, a domain-independent, faceted visual analytic (VA) system for interactive exploration of large datasets. It combines a flexible UI that adapts to arbitrary character-separated value (CSV) datasets with algorithmic preprocessing to compute unsupervised dimension reduction and cluster data from high-dimensional fields. It provides a variety of visualization options that require minimal user effort to configure and a consistent user experience between visualization types and underlying datasets. Filtering, comparison and visualization operations work in concert, allowing users to hop seamlessly between actions and pursue answers to expected and unexpected data hypotheses.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116600206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zips: mining compressing sequential patterns in streams","authors":"Hoang Thanh Lam, T. Calders, Jie Yang, F. Mörchen, Dmitriy Fradkin","doi":"10.1145/2501511.2501520","DOIUrl":"https://doi.org/10.1145/2501511.2501520","url":null,"abstract":"We propose a streaming algorithm, based on the minimal description length (MDL) principle, for extracting non-redundant sequential patterns. For static databases, the MDL-based approach, which selects patterns based on their capacity to compress data rather than their frequency, was shown to be remarkably effective for extracting meaningful patterns and solving the redundancy issue in frequent itemset and sequence mining. The existing MDL-based algorithms, however, either start from a seed set of frequent patterns, or require multiple passes through the data. As such, the existing approaches scale poorly and are unsuitable for large datasets. Therefore, our main contribution is the proposal of a new, streaming algorithm, called Zips, that does not require a seed set of patterns and requires only one scan over the data. For Zips, we extended the Lempel-Ziv (LZ) compression algorithm in three ways. First, whereas LZ assigns codes uniformly as it builds up its dictionary while scanning the input, Zips assigns codewords according to the usage of the dictionary words; more heavily used words get shorter code lengths. Second, Zips also exploits non-consecutive occurrences of dictionary words for compression. Third, the well-known space-saving algorithm is used to evict unpromising words from the dictionary. Experiments on one synthetic and two real-world large-scale datasets show that our approach extracts meaningful compressing patterns with similar quality to the state-of-the-art multi-pass algorithms proposed for static databases of sequences. Moreover, our approach scales linearly with the size of data streams while all the existing algorithms do not.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123513297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building blocks for exploratory data analysis tools","authors":"S. Alspaugh, Marti A. Hearst, A. Ganapathi, R. Katz","doi":"10.1145/2501511.2501515","DOIUrl":"https://doi.org/10.1145/2501511.2501515","url":null,"abstract":"Data exploration is largely manual and labor intensive. Although there are various tools and statistical techniques that can be applied to data sets, there is little help to identify what questions to ask of a data set, let alone what domain knowledge is useful in answering the questions. In this paper, we study user queries against production data sets in Splunk. Specifically, we characterize the interplay between data sets and the operations used to analyze them using latent semantic analysis, and discuss how this characterization serves as a building block for a data analysis recommendation system. This is a work-in-progress paper.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127111295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","authors":"Duen Horng Chau, Jilles Vreeken, M. Leeuwen, C. Faloutsos","doi":"10.1145/2501511","DOIUrl":"https://doi.org/10.1145/2501511","url":null,"abstract":"We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes in size are now commonplace. They arise in numerous settings in science, government, and enterprises, and technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to exploratively analyze databases of this scale. Currently, few technologies allow us to freely \"wander\" around the data, and make discoveries by following our intuition, or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, thus may not be well-suited for interactive exploration of large datasets. Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction techniques, and machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, higher velocity, and greater variety, creating effective interactive data mining techniques becomes a much harder task.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132283579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One click mining: interactive local pattern discovery through implicit preference and performance learning","authors":"Mario Boley, M. Mampaey, Bo Kang, P. Tokmakov, S. Wrobel","doi":"10.1145/2501511.2501517","DOIUrl":"https://doi.org/10.1145/2501511.2501517","url":null,"abstract":"It is known that productive pattern discovery from data has to interactively involve the user as directly as possible. State-of-the-art toolboxes require the specification of sophisticated workflows with an explicit selection of a data mining method, all its required parameters, and a corresponding algorithm. This hinders the desired rapid interaction, especially with users that are experts of the data domain rather than data mining experts. In this paper, we present a fundamentally new approach towards user involvement that relies exclusively on the implicit feedback available from the natural analysis behavior of the user, and at the same time allows the user to work with a multitude of pattern classes and discovery algorithms simultaneously without even knowing the details of each algorithm. To achieve this goal, we are relying on a recently proposed co-active learning model and a special feature representation of patterns to arrive at an adaptively tuned user interestingness model. At the same time, we propose an adaptive time-allocation strategy to distribute computation time among a set of underlying mining algorithms. We describe the technical details of our approach, present the user interface for gathering implicit feedback, and provide preliminary evaluation results.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129539432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomly sampling maximal itemsets","authors":"Sandy Moens, Bart Goethals","doi":"10.1145/2501511.2501523","DOIUrl":"https://doi.org/10.1145/2501511.2501523","url":null,"abstract":"Pattern mining techniques generally enumerate lots of uninteresting and redundant patterns. To obtain less redundant collections, techniques exist that give condensed representations of these collections. However, the proposed techniques often rely on complete enumeration of the pattern space, which can be prohibitive in terms of time and memory. Sampling can be used to filter the output space of patterns without explicit enumeration. We propose a framework for random sampling of maximal itemsets from transactional databases. The presented framework can use any monotonically decreasing measure as the interestingness criterion for this purpose. Moreover, we use an approximation measure to guide the search for maximal sets to different parts of the output space. We show in our experiments that the method can rapidly generate small collections of patterns with good quality. The sampling framework has been implemented in the interactive visual data mining tool called MIME, thus enabling users to quickly sample a collection of patterns and analyze the results.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129673114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods for exploring and mining tables on Wikipedia","authors":"Chandra Bhagavatula, Thanapon Noraset, Doug Downey","doi":"10.1145/2501511.2501516","DOIUrl":"https://doi.org/10.1145/2501511.2501516","url":null,"abstract":"Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover \"interesting\" relationships between table columns. We find that a \"Semantic Relatedness\" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.","PeriodicalId":126062,"journal":{"name":"Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123653685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}