Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications: Latest Publications
{"title":"LH*s: a high-availability and high-security scalable distributed data structure","authors":"W. Litwin, Marie-Anne Neimat","doi":"10.1109/RIDE.1997.583720","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583720","url":null,"abstract":"LH*s is a high-availability variant of LH*, a Scalable Distributed Data Structure. An LH*s record is striped onto different server nodes. A parity segment allows one to reconstruct the record if a segment fails. The insert or key search time is about a millisecond on a 10 Mb/s net, and about 100 μs on a 1 Gb/s net, assuming the segments reside in distributed RAM. The file size depends only on the distributed storage available, i.e., a RAM file can reach dozens of GB in practice. Data security is enhanced, as every site contains only partial and typically meaningless data. The price to pay is 20-50% more storage for the file than for an LH* file, and some additional messaging, especially for the scan search.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117029832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Banking","authors":"John A. Keane","doi":"10.1109/RIDE.1997.583702","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583702","url":null,"abstract":"The aim of the High Performance Banking (HYPERBANK) project is to provide the banking sector with the requisite toolset for increased understanding of existing and prospective customers, and better tailoring of products and services for those customers. The approach integrates three areas: business knowledge modelling, data warehousing and data mining, and parallel computing.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125186741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brokerage architecture for stock photo industry","authors":"Wen-Syan Li, Y. Hara, N. Fix, K. Candan, K. Hirata, Sougata Mukherjea","doi":"10.1109/RIDE.1997.583710","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583710","url":null,"abstract":"The Internet has grown to become a major component of the global world-wide network infrastructure, linking millions of users. We first address the need for an electronic market and match-making between consumers and providers on the Internet for the stock photo industry. We discuss business issues and highlight technologies required to support an electronic market image exchange on the global Internet, including multimedia databases, visual query interfaces, visualization tools, and watermarking techniques. Finally, we summarize the operational flows and brokerage services provided by the brokerage system.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121060261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed data management in workflow environments","authors":"G. Alonso, B. Reinwald, C. Mohan","doi":"10.1109/RIDE.1997.583708","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583708","url":null,"abstract":"Most existing workflow management systems (WFMSs) are based on a client/server architecture. This architecture simplifies the overall design, but it does not match the distributed nature of workflow applications and imposes severe limitations in terms of scalability and reliability. Moreover, workflow engines are not very sophisticated in terms of data management, ignoring the fact that workflow is, to a great extent, data flow. In this paper we propose a novel architecture to address the issue of data management in a WFMS. This architecture is based on a fully distributed workflow engine for control flow, plus a set of loosely synchronized replicated databases for data flow. The resulting system offers greater robustness and reliability as well as much better data handling capabilities than existing approaches. To better illustrate this novel architecture and its implications, two commercial systems are employed in this paper: FlowMark, as the workflow engine, and the replication capabilities of Lotus Notes, as the support system for distributed data management.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124058579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mediator join indices","authors":"Lingling Yan, M. Tamer Özsu, Ling Liu","doi":"10.1109/RIDE.1997.583698","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583698","url":null,"abstract":"A mediator join index (MJI) is proposed to speed up N-way inter-database joins by reducing the amount of data transfer during evaluation. A family of algorithms, the query scrubbing algorithms (QSA), is developed to maintain MJI and to evaluate queries using MJI. QSA algorithms use query scrubbing to cope with update and query anomalies related to materialized views in the mediator context. Compared with existing algorithms, QSA algorithms incur less overhead in handling the anomalies and make MJI a promising technique for efficient mediator query processing.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129269455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Batching and dynamic allocation techniques for increasing the stream capacity of an on-demand media server","authors":"D. Jadav, C. Srinilta, A. Choudhary","doi":"10.1109/RIDE.1997.583717","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583717","url":null,"abstract":"A server for an interactive distributed multimedia system may require thousands of gigabytes of storage space and high I/O bandwidth. In order to maximize system utilization, and thus minimize cost, the load must be balanced among the server's disks, interconnection network and scheduler. Many algorithms for maximizing retrieval capacity from the storage system have been proposed. The paper presents techniques for improving server capacity by assigning media requests to the nodes of a server so as to balance the load on the interconnection network and the scheduling nodes. Five policies for request assignment are developed. The performance of these policies on an implementation of a server model developed earlier is presented.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128080631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalization and decision tree induction: efficient classification in data mining","authors":"M. Kamber, Lara Winstone, Wang Gon, Jiawei Han","doi":"10.1109/RIDE.1997.583715","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583715","url":null,"abstract":"Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration of efficient induction in large databases and the analysis of data at multiple abstraction levels. The paper addresses the efficiency and scalability issues by proposing a data classification method which integrates attribute oriented induction, relevance analysis, and the induction of decision trees. Such an integration leads to efficient, high quality, multiple level classification of large amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous and noisy data.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124683728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic performance guarantees for mixed workloads in a multimedia information system","authors":"G. Nerjes, Peter Muth, G. Weikum","doi":"10.1109/RIDE.1997.583719","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583719","url":null,"abstract":"We present an approach to stochastic performance guarantees for multimedia servers with mixed workloads. Advanced multimedia applications such as digital libraries or teleteaching exhibit a mixed workload with accesses to both \"continuous\" and conventional, \"discrete\" data, where the fractions of continuous data and discrete data requests vary over time. We assume that a server shares all disks among continuous and discrete data, and we develop a stochastic performance model for the resulting mixed workload, using a combination of analytic and simulation-based modeling. Based on this model, we devise a round-based scheduling scheme with stochastic performance guarantees: for continuous data requests, we bound the probability that \"glitches\" occur, and for discrete data requests, we bound the probability that the response time exceeds a certain tolerance threshold. We present early results of simulation studies.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128353555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of sampling for data mining of association rules","authors":"Mohammed J. Zaki, S. Parthasarathy, Wei Li, M. Ogihara","doi":"10.1109/RIDE.1997.583696","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583696","url":null,"abstract":"The discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring item sets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. The authors show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. They may also be able to make the sampled database resident in main-memory. Furthermore, they show that sampling can accurately represent the data patterns in the database with high confidence. They experimentally evaluate the effectiveness of sampling on different databases, and study the relationship between the performance, accuracy, and confidence of the chosen sample.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132358624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tackling the challenges of materialized view design in data warehousing environment","authors":"Jian Yang, K. Karlapalem, Qing Li","doi":"10.1109/RIDE.1997.583695","DOIUrl":"https://doi.org/10.1109/RIDE.1997.583695","url":null,"abstract":"The design of materialized views in a data warehousing environment is an important problem which has been largely overlooked in the past. If one regards data warehouse queries as integrated views over the base databases, then there is a need to select a set of views to be materialized so that the best combination of good performance and low maintenance cost can be achieved. The authors compare materialized view design (MVD) work with related problems such as common subexpressions and multiple query processing, discuss the unique requirements of MVD, and outline possible solutions for addressing some of the challenging issues of MVD.","PeriodicalId":177468,"journal":{"name":"Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133037530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}