{"title":"Processing High-Volume Stream Queries on a Supercomputer","authors":"E. Zeitler, T. Risch","doi":"10.1109/ICDEW.2006.118","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.118","url":null,"abstract":"Scientific instruments, such as radio telescopes, colliders, sensor networks, and simulators generate very high volumes of data streams that scientists analyze to detect and understand physical phenomena. The high data volume and the need for advanced computations on the streams require substantial hardware resources and scalable stream processing. We address these challenges by developing data stream management technology to support high-volume stream queries utilizing massively parallel computer hardware. We have developed a data stream management system prototype for state-of-the-art parallel hardware. The performance evaluation uses real measurement data from LOFAR, a radio telescope antenna array being developed in the Netherlands.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121046415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Workflow Engine for the Execution of Scientific Protocols","authors":"H. Ménager, Z. Lacroix","doi":"10.1109/ICDEW.2006.24","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.24","url":null,"abstract":"We present the execution engine of the SemanticBio system, an integration solution that provides scientists support to express and execute scientific protocols. In SemanticBio scientific workflows are first expressed as conceptual workflows using scientific ontologies. Conceptual workflows are then translated in a semi-automated process into executable workflows, composed of calls to coordinated web services. Once the user has selected an executable workflow that meets the protocol needs, the SemanticBio execution engine supports the execution of data flow-coordinated tasks, i.e., the execution of a task is only based on the availability of its inputs. This engine is a lightweight, yet flexible approach to the execution of such workflows. Our approach addresses the problem of semantic interoperability of scientific resources publicly available on the web.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125774130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Treatment of Rules in Individual Metadata of Flexible Contents Management","authors":"Kensuke Ohta, D. Kobayashi, Takashi Kobayashi, R. Taguchi, H. Yokota","doi":"10.1109/ICDEW.2006.153","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.153","url":null,"abstract":"The properties of contents stored in a computer system are very wide while the data volume treated in the system becomes very large. It is important to treat each stored object in different manners to reflect its properties in the data management for the large amount of stored data. To satisfy the requirement, we propose a method for the autonomous management based on ECA rules stored in metadata of the contents. We study the feasibility of treating a large number of ECA rules corresponding to the number of stored objects. Because the cost for evaluating conditions in the rules becomes dominant to the system performance when the number of objects increases, we divide the conditions into two types, previously evaluable conditions and runtime evaluable conditions, and construct a discrimination network for the previously evaluable conditions of each event to reduce the cost for processing the rules. We implement the methods in the autonomous disk system, a high functional storage system we proposed, and evaluate the efficiency of them.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129839197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Day in the Life of a Metamorphic Petrologist","authors":"Sibel Adali, Bouchra Bouqata, Adam Marcus, F. Spear, B. Szymanski","doi":"10.1109/ICDEW.2006.8","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.8","url":null,"abstract":"In this paper, we describe the functionality of a toolkit for sharing and long-term use of different types of geological data sets across disciplines. Our tools allow users to describe the meaning of their data by attaching semantic information to it. The toolkit also makes use of the users’ access patterns to learn how the data are used to further enhance the utility of data centric methods. These learned patterns are used in conjunction with the semantic data to help other users find common ways to navigate heterogenous collections and highlight interesting information. Our current prototype is being developed in close collaboration with the Metamorphic Petrology working group formed to facilitate sharing of data within this subdiscipline of geosciences as well as with other systems for sharing of geological data.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129965539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location-based Spatial Queries with Data Sharing in Mobile Environments","authors":"Wei-Shinn Ku, Roger Zimmermann","doi":"10.1109/ICDEW.2006.72","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.72","url":null,"abstract":"Mobile clients feature increasingly sophisticated wireless networking support that enables real-time information exchange with remote databases. Location-dependent spatial queries, such as determining the proximity of stationary objects (e.g., restaurants and gas stations) are an important class of inquiries. We present novel approaches to support nearest-neighbor queries and window queries from mobile hosts by leveraging the sharing capabilities of wireless ad-hoc networks. We illustrate how previous query results cached in the local storage of neighboring mobile peers can be leveraged to either fully or partially compute and verify spatial queries at a local host. The feasibility and appeal of our technique are illustrated through extensive simulation results that indicate a considerable reduction of the query load on the remote database. Furthermore, the scalability of our approaches is excellent because a higher density of mobile hosts increases their effectiveness.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130614078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Concurrency Control Method for Parallel Btree Structures","authors":"Tomohiro Yoshihara, D. Kobayashi, R. Taguchi, H. Yokota","doi":"10.1109/ICDEW.2006.7","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.7","url":null,"abstract":"A new concurrency control protocol for parallel Btree structures, MARK-OPT, is proposed. MARK-OPT marks the lowest structure-modification-operation (SMO) occurrence point during optimistic latch coupling operations, to reduce the cost of SMO compared to the conventional protocols such as ARIES/IM and INC-OPT. The marking reduces the frequency of restarts for spreading the range of X latches, which clearly improves the system throughput. Moreover, the MARK-OPT is deadlock free and satisfies the physical consistency requirement for Btrees. These indicate that the MARK-OPT is right and suitable as a concurrency control protocol for Btree structures. This paper also proposes three variations of the protocol, INC-MARK-OPT, 2P-INT-MARK-OPT and 2P-REP-MARK-OPT, by focusing on tree structure changes from other transactions. We implement the proposed protocols, the INC-OPT, and the ARIES/IM for the Fat-Btree, a form of parallel Btree, and compare the performance of these protocols using a large-scale blade system. The experimental results indicate that the proposed protocols always improve the system throughput, and the 2P-REP-MARK-OPT is the most useful protocol in a high update environment. Moreover, in the experiment, the low frequency of restarts in the proposed protocols indicates that the marking in the proposed protocols is effective.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131972671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing the Forecast Factory","authors":"Laura Bright, D. Maier, Bill Howe","doi":"10.1109/ICDEW.2006.76","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.76","url":null,"abstract":"The CORIE forecast factory consists of a set of data product generation runs that are executed daily on dedicated local resources. The goal is to maximize productivity and resource utilization while still ensuring timely completion of all forecasts. Many existing workflow management systems address low-level workflow specification and execution challenges, but do not directly address the high-level challenges posed by large-scale data product factories. In this paper we discuss several specific challenges to managing the CORIE forecast factory including planning and scheduling, improving data flow, and analyzing log data, and point out their analogs in the \"physical\" manufacturing world. We present solutions we have implemented to address these challenges, and present experimental results that show the benefits of these solutions.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Generality of Documents","authors":"H. Shin, E. Hovy, D. McLeod, Larry Pryor","doi":"10.1109/ICDEW.2006.77","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.77","url":null,"abstract":"Most traditional Information Retrieval (IR) systems, including web search engines, operationalize \"relevant\" as the word frequency in a document of a set of keywords. Because of this limitation, traditional IR systems frequently retrieve irrelevant documents in response to a user’s request. In this paper, we propose a new criterion, \"generality,\" that provides an additional basis on which to rank retrieved documents. The generality is a level of abstraction to retrieve results based on desired generality appropriate for a user’s knowledge and interests. We compared our generality quantification algorithm with human judges’ weighting of values to show that the developed algorithm is significantly correlated.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123321998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs","authors":"Matthias Bender, S. Michel, G. Weikum","doi":"10.1109/ICDEW.2006.110","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.110","url":null,"abstract":"A compelling application of peer-to-peer (P2P) system technology would be distributed Web search, where each peer autonomously runs a search engine on a personalized local corpus (e.g., built from a thematically focused Web crawl) and peers collaborate by routing queries to remote peers that can contribute many or particularly good results for these specific queries. Such systems typically rely on a decentralized directory, e.g., built on top of a distributed hash table (DHT), that holds compact, aggregated statistical metadata about the peers which is used to identify promising peers for a particular query. To support an a-priori unlimited number of peers, it is crucial to keep the load on the distributed directory low. Moreover, each peer should ideally tailor its postings to the directory to reflect its particular strengths, such as rich information about specialized topics that no or only few other peers would also cover. This paper addresses this problem by proposing strategies for peers that identify suitable subsets of the most beneficial statistical metadata. We argue that posting a carefully selected subset of metadata can achieve almost the same result quality as a complete metadata directory, for only the most relevant peers are eventually involved in the execution of a given query. Additionally, asking only relevant peers will result in higher precision, as the noise introduced by poor peers is reduced. We have implemented these strategies in our fully operational P2P Web search prototype Minerva, and present experimental results on real-world Web data that show the viability of the strategies and their gains in terms of high search result quality at low networking costs.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116150075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multiple-Perspective, Interactive Approach for Web Information Extraction and Exploration","authors":"N. Moon, Yahsin Hsu, Rahul Singh","doi":"10.1109/ICDEW.2006.11","DOIUrl":"https://doi.org/10.1109/ICDEW.2006.11","url":null,"abstract":"While increasing amounts of complex information are becoming available on the web, there is, beyond keyword-based search and listing of results, a paucity of user interface paradigms and implementations that support interaction, exploration, and assimilation of information. This paper describes our design of a novel framework to address this deficiency. The proposed framework supports both direct search behavior as well as more exploratory search strategies through multiple-perspective visualization and interaction with search results. The approach is developed around the twin themes of supporting data context and facilitating effective interactions between users and data. The system supports data context through determination of semantic correlations between web pages and extraction of the spatio-temporal data contained therein. A multiple-perspective environment is then used to display semantic and spatio-temporal relationships as well as to provide intuitive views of the data, specifically through web page thumbnail, map, and timeline modules. The environment supports direct interactions with the data through a reflective interface by which user selections in any one panel highlight the corresponding information in other panels. In this environment, visual cues and explicit facilities to model space and time aid in recognition, querying, and exploration of information as well as in representation and reasoning with complex relationships (such as spatio-temporal, causal, evolutionary) in the data. Experimental studies of a quantitative and qualitative nature demonstrate the efficacy of the system in facilitating both information extraction and discovery.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114298304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}