{"title":"Network-based problem detection for distributed systems","authors":"H. Kashima, T. Tsumura, T. Idé, Takahide Nogayama, R. Hirade, H. Etoh, T. Fukuda","doi":"10.1109/ICDE.2005.93","DOIUrl":"https://doi.org/10.1109/ICDE.2005.93","url":null,"abstract":"We introduce a network-based problem detection framework for distributed systems, which includes a data-mining method for discovering dynamic dependencies among distributed services from transaction data collected from the network, and a novel problem detection method based on the discovered dependencies. From observed containments of transaction execution time periods, we estimate the probabilities of accidental and non-accidental containments, and build a competitive model for discovering direct dependencies by using a model estimation method based on the online EM algorithm. Utilizing the discovered dependency information, we also propose a hierarchical problem detection framework, where microscopic dependency information is incorporated with a macroscopic anomaly metric that monitors the behavior of the system as a whole. This feature is made possible by employing a network-based design which provides overall information of the system without any impact on the performance.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131081618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLEI code: an efficient labeling method for handling XML documents in an RDB","authors":"K. Kobayashi, Wenxin Liang, D. Kobayashi, Akitsugu Watanabe, H. Yokota","doi":"10.1109/ICDE.2005.153","DOIUrl":"https://doi.org/10.1109/ICDE.2005.153","url":null,"abstract":"A number of XML labeling methods have been proposed to store XML documents in relational databases. However, they have a vulnerable point in insertion operations. We propose the variable length endless insertable (VLEI) code and apply it to XML labeling to reduce the cost of insertion operations. Results of our experiments indicate that a combination of the VLEI code and Dewey order is effective for handling skewed insertions.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116275707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEA-CNN: scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases","authors":"Xiaopeng Xiong, M. Mokbel, Walid G. Aref","doi":"10.1109/ICDE.2005.128","DOIUrl":"https://doi.org/10.1109/ICDE.2005.128","url":null,"abstract":"Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algorithm, termed SEA-CNN, for answering continuously a collection of concurrent CKNN queries. SEA-CNN has two important features: incremental evaluation and shared execution. SEA-CNN achieves both efficiency and scalability in the presence of a set of concurrent queries. Furthermore, SEA-CNN does not make any assumptions about the movement of objects, e.g., the objects' velocities and shapes of trajectories, or about the mutability of the objects and/or the queries, i.e., moving or stationary queries issued on moving or stationary objects. We provide theoretical analysis of SEA-CNN with respect to the execution costs, memory requirements and effects of tunable parameters. Comprehensive experimentation shows that SEA-CNN is highly scalable and is more efficient in terms of both I/O and CPU costs in comparison to other R-tree-based CKNN techniques.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131822851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Triage: an adaptive architecture for load shedding in TelegraphCQ","authors":"Frederick Reiss, J. Hellerstein","doi":"10.1109/ICDE.2005.44","DOIUrl":"https://doi.org/10.1109/ICDE.2005.44","url":null,"abstract":"Many of the data sources used in stream query processing are known to exhibit bursty behavior. Data in a burst often has different characteristics than steady-state data, and therefore may be of particular interest. In this paper, we describe the Data Triage architecture that we are adding to TelegraphCQ to provide low latency results with good accuracy under such bursts.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"94 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131879689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vectorizing and querying large XML repositories","authors":"P. Buneman, Byron Choi, W. Fan, R. Hutchison, Robert Mann, Stratis Viglas","doi":"10.1109/ICDE.2005.150","DOIUrl":"https://doi.org/10.1109/ICDE.2005.150","url":null,"abstract":"Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to store each column separately. We use a generalization of vectorization as the basis for a native XML store. The idea is to decompose an XML document into a set of vectors that contain the data values and a compressed skeleton that describes the structure. In order to query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce the notion of query graphs and a novel graph reduction algorithm that allows us to leverage relational optimization techniques as well as to reduce the unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study based on some scientific and synthetic XML data repositories in the order of gigabytes supports the claim that these techniques are scalable and have the potential to provide performance comparable with established relational database technology.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132385164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal annotation graph (STAG): a data model for composite digital objects","authors":"S. Yamini, Amarnath Gupta","doi":"10.1109/ICDE.2005.136","DOIUrl":"https://doi.org/10.1109/ICDE.2005.136","url":null,"abstract":"In this demonstration, we present a database over complex documents, which, in addition to structured text content, also have update information, annotations, and embedded objects. We propose a new data model called spatiotemporal annotation graphs (STAG) for a database of composite digital objects and present a system that shows a query language to efficiently and effectively query such a database. The particular application to be demonstrated is a database over annotated MS Word and PowerPoint presentations with embedded multimedia objects.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127297992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scrutinizing frequent pattern discovery performance","authors":"Osmar R Zaiane, Mohammad El-Hajj, Yi Li, S. Luk","doi":"10.1109/ICDE.2005.127","DOIUrl":"https://doi.org/10.1109/ICDE.2005.127","url":null,"abstract":"Benchmarking technical solutions is as important as the solutions themselves. Yet many fields still lack any type of rigorous evaluation. Performance benchmarking has always been an important issue in databases and has played a significant role in the development, deployment and adoption of technologies. To help assess the myriad algorithms for frequent itemset mining, we built an open framework and testbed to analytically study the performance of different algorithms and their implementations, and contrast their achievements given different data characteristics, different conditions, and different types of patterns to discover and their constraints. This facilitates reporting consistent and reproducible performance results using known conditions.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127412607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maintaining implicated statistics in constrained environments","authors":"Yannis Sismanis, N. Roussopoulos","doi":"10.1109/ICDE.2005.84","DOIUrl":"https://doi.org/10.1109/ICDE.2005.84","url":null,"abstract":"Aggregated information regarding implicated entities is critical for online applications like network management, traffic characterization or identifying patterns of resource consumption. Recently there has been a flurry of research for online aggregation on streams (like quantiles, hot items, hierarchical heavy hitters) but surprisingly the problem of summarizing implicated information in stream data has received no attention. As an example, consider an IP-network and the implication source → destination. Flash crowds - such as those that follow recent sport events (like the Olympics) or seek information regarding catastrophic events - or denial of service attacks direct a large volume of traffic from a huge number of sources to a very small number of destinations. In this paper we present novel randomized algorithms for monitoring such implications with constraints in both memory and processing power for environments like network routers. Our experiments demonstrate several factors of improvements over straightforward approaches.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129028057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The XML stream query processor SPEX","authors":"François Bry, Fatih Coşkun, S. Durmaz, Tim Furche, Dan Olteanu, Markus Spannagel","doi":"10.1109/ICDE.2005.141","DOIUrl":"https://doi.org/10.1109/ICDE.2005.141","url":null,"abstract":"Data streams are an emerging technology for data dissemination in cases where the data throughput or size makes it unfeasible to rely on the conventional approach based on storing the data before processing it. SPEX evaluates XPath queries against XML data streams. SPEX is built upon formal frameworks for (1) rewriting XPath queries into equivalent XPath queries without reverse axes and (2) correct query evaluation with polynomial combined complexity using networks of pushdown transducers. Such transducers are simple, independent, and can be connected in a flexible manner, thus allowing not only easy extensions but also extensive query optimization. Querying XML streams with SPEX consists in four steps: first, the input XPath query is rewritten into an XPath query without reverse axes. Second, the forward XPath query is compiled into a logical query plan abstracting out details of the concrete XPath syntax. Then, a physical query plan is generated by extending the logical query plan with operators for determination and collection of answers. In the last step, the XML stream is processed continuously with the physical query plan, and the output stream conveying the answers to the original query is generated progressively.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129099049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RDF aggregate queries and views","authors":"E. Hung, Yu Deng, V. S. Subrahmanian","doi":"10.1109/ICDE.2005.121","DOIUrl":"https://doi.org/10.1109/ICDE.2005.121","url":null,"abstract":"Resource description framework (RDF) is a rapidly expanding Web standard. RDF databases attempt to track the massive amounts of Web data and services available. In this paper, we study the problem of aggregate queries. We develop an algorithm to compute answers to aggregate queries over RDF databases and algorithms to maintain views involving those aggregates. Though RDF data can be stored in a standard relational DBMS (and hence we can execute standard relational aggregate queries and view maintenance methods on them), we show experimentally that our algorithms that operate directly on the RDF representation exhibit significantly superior performance.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129020495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}