Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Latest articles

Upper bound on the length of generalized disjunction-free patterns
Marzena Kryszkiewicz
{"title":"Upper bound on the length of generalized disjunction-free patterns","authors":"Marzena Kryszkiewicz","doi":"10.1109/SSDBM.2004.72","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.72","url":null,"abstract":"A number of lossless representations of frequent patterns were proposed in recent years. The representation that consists of all frequent closed itemsets and the representations based on generalized disjunction-free patterns or on non-derivable itemsets are proven the most concise ones. Experiments show further that the latter ones are by a few orders of magnitude more concise (and determinable) than the former one. As follows from experiments, the representations based on generalized disjunction-free patterns are also more concise than the available in the literature representations of frequent patterns, which determine supports of patterns in an approximate way. In this paper, we provide an upper bound on the length of generalized disjunction-free patterns. The bound determines the maximum number of scans of the database carried out by Apriori-like algorithms discovering the representations based on generalized disjunction-free patterns.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115188520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Fast computation of Iceberg Dwarf
Longgang Xiang, Feng Yucai
{"title":"Fast computation of Iceberg Dwarf","authors":"Longgang Xiang, Feng Yucai","doi":"10.1109/SSDBM.2004.36","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.36","url":null,"abstract":"Iceberg Dwarf (IceDwarf for short) combines the strength of Iceberg-Cube and Dwarf. It exploits the elegant Dwarf structure for cube tuple store and eliminates those unsatisfied sub-dwarfs. By only storing nontrivial cube tuples, IceDwarf reduces the size of a dwarf significantly; even Dwarf itself compresses the data cube effectively. We studied how to efficiently compute icedwarfs, and developed a straightforward algorithm (PAC). To further improve the performance, we explored the structure of Dwarf and presented four nice lemmas. Based on these observations, we proposed a new algorithm called PWC. It builds the IceDwarf by bottom-up computing all the partitions of a fact table and inserting them into the Dwarf structure, enabling Apriori-like pruning and single tuple partition optimization, and facilitating the detection of suffix redundancies. Our performance study showed that PWC is highly efficient and runs much faster than PAC for icedwarfs, even for computing full dwarfs.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128548779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Efficient similarity search in streaming time sequences
Maria Kontaki, A. Papadopoulos
{"title":"Efficient similarity search in streaming time sequences","authors":"Maria Kontaki, A. Papadopoulos","doi":"10.1109/SSDBM.2004.33","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.33","url":null,"abstract":"Query processing in data streams is a very important research direction. The challenge in a database of data streams is to provide efficient algorithms and access methods for query processing, taking into consideration the fact that the database changes continuously as new data arrive. Traditional access methods that continuously update the data are considered inefficient, due to the significant update costs. In this paper we present IDC-Index, an efficient technique for similarity query processing in streaming time sequences, which is based on a multidimensional access method enhanced with a deferred update policy and an incremental computation of the discrete Fourier transform (DFT), which is used as a feature extraction method. The method manages to reduce the number of false alarms examined and therefore achieves high answers/candidates ratio. Moreover, an extensive performance evaluation based on synthetic random walk and real time sequences has shown that the proposed technique significantly outperforms existing approaches for similarity range query processing.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128571516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Parallelizing clustering of geoscientific data sets using data streams
Silvia Nittel, Kelvin T. Leung
{"title":"Parallelizing clustering of geoscientific data sets using data streams","authors":"Silvia Nittel, Kelvin T. Leung","doi":"10.1109/SSDBM.2004.58","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.58","url":null,"abstract":"Computing data mining algorithms such as clustering on massive geospatial data sets is still neither feasible nor efficient today. In this paper, we introduce a k-means algorithm that is based on the data stream paradigm. The so-called partial/merge k-means algorithm is implemented as a set of data stream operators which are adaptable to available computing resources such as volatile memory and processing power. The partial data stream operator consumes as much data as can be fit into RAM, and performs a weighted k-means on the data subset. Subsequently, the weighted partial results are merged by a second data stream operator. All operators can be cloned, and parallelized. In our analytical and experimental performance evaluation, we demonstrate that the partial/merge k-means can outperform a one-step algorithm by a large margin with regard to overall computation time and clustering quality with increasing data density per grid cell.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"85 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116305788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Temporal range exploration of large scale multidimensional time series data
J. JáJá, Jusub Kim, Qin Wang
{"title":"Temporal range exploration of large scale multidimensional time series data","authors":"J. JáJá, Jusub Kim, Qin Wang","doi":"10.1109/SSDBM.2004.68","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.68","url":null,"abstract":"We consider the problem of querying large scale multidimensional time series data to discover events of interest, test and validate hypotheses, or to associate temporal patterns with specific events. Large amounts of multidimensional time series data are currently available, and this type of data is growing at a fast rate due to the current trends in collecting time series of business, scientific, demographic, and simulation data. The ability to explore such collections interactively, even at a coarse level, will be critical in discovering the information and knowledge embedded in such collections. We develop indexing techniques and search algorithms to efficiently handle temporal range value querying of multidimensional time series data. Our indexing uses linear space data structures that enable the handling of queries very efficiently, invoking in the worst case a logarithmic number of queries to single time slices. We also show that our algorithm is ideally suited for parallel implementation on clusters of processors achieving a linear speedup in the number of available processors. A particularly simple data structure with provably good bounds is also presented for the case when the number of multidimensional objects is relatively small. These techniques improve significantly over previous techniques for either the serial or the parallel case, and are evaluated by extensive experimental results that confirm their superior performance.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124059201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
MISSION: an agent-based system for semantic integration of heterogeneous distributed statistical information sources
S. McClean, B. Scotney, Hans Rutjes, J. Hartkamp, Isambo Karali, M. Hatzopoulos, J. Lamb, Defeng Ma
{"title":"MISSION: an agent-based system for semantic integration of heterogeneous distributed statistical information sources","authors":"S. McClean, B. Scotney, Hans Rutjes, J. Hartkamp, Isambo Karali, M. Hatzopoulos, J. Lamb, Defeng Ma","doi":"10.1109/SSDBM.2004.52","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.52","url":null,"abstract":"The MISSION system utilises query agents, in particular the matching and negotiation agents that are responsible for pre-integration where the matching agent decomposes the query into sub-queries, and then searches metadata to find datasets that match the query fragments. Such an approach provides a capability of automating the process of executing queries on heterogeneous statistical databases that are distributed over the Internet. The novelty lies in the provision of automated methods for statistical aggregation, where the heterogeneity essentially resides in the classification schemes of categorical data, including both heterogeneity of nomenclature and heterogeneity of granularity. In addition, our solution permits queries to be specified in a goal-driven query-by-example format. Rather than impose an a priori global standard, the user can query through a unified interface where integration is done at run-time.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125962891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
All-nearest-neighbors queries in spatial databases
Jun Zhang, N. Mamoulis, D. Papadias, Yufei Tao
{"title":"All-nearest-neighbors queries in spatial databases","authors":"Jun Zhang, N. Mamoulis, D. Papadias, Yufei Tao","doi":"10.1109/SSDBM.2004.12","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.12","url":null,"abstract":"Given two sets A and B of multidimensional objects, the all-nearest-neighbors (ANN) query retrieves for each object in A its nearest neighbor in B. Although this operation is common in several applications, it has not received much attention in the database literature. In this paper we study alternative methods for processing ANN queries depending on whether A and B are indexed. Our algorithms are evaluated through extensive experimentation using synthetic and real datasets. The performance studies show that they are an order of magnitude faster than a previous approach based on closest-pairs query processing.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"324 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132481430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 108
A Monte Carlo sampling method for drawing representative samples from large databases
Hong Guo, W. Hou, Feng Yan, Qiang Zhu
{"title":"A Monte Carlo sampling method for drawing representative samples from large databases","authors":"Hong Guo, W. Hou, Feng Yan, Qiang Zhu","doi":"10.1109/SSDBM.2004.5","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.5","url":null,"abstract":"Sampling is important in areas like data mining, OLAP, selectivity estimation, clustering, etc. It has also become a necessity in social, economical, engineering, scientific, and statistical studies where databases are too large to handle. In this paper, a sampling method based on the Metropolis algorithm is proposed. Unlike the conventional uniform sampling methods, this method is able to select objects consistent with the underlying probability distribution. It is a simple, efficient, and powerful method suitable for all distributions. We have performed experiments to examine the qualities of the samples by comparing their statistical properties with the underlying population. The experimental results show that the samples selected by our method are bona fide representative.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125511009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
On integrating scientific resources through semantic registration
S. Bowers, K. Lin, Bertram Ludäscher
{"title":"On integrating scientific resources through semantic registration","authors":"S. Bowers, K. Lin, Bertram Ludäscher","doi":"10.1109/SSDBM.2004.56","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.56","url":null,"abstract":"In many data-centric scientific applications it is common to register datasets and computational services with a federation registry (also commonly called a catalog, directory, or repository). For example, the scientific data-handling system under development in the SEEK project must consider various dataset registries, including: MCAT, for access to SRB-registered datasets; Metacat, for KNB-registered datasets; DiGIR, for UDDI-registered data; and Xanthoria, an XML-based data registry. A challenge for SEEK, and similar efforts such as GEON, is to provide uniform access to registries and registered resources, based on emerging Web and grid standards.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"379 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116578488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
Exploiting multiple paths to express scientific queries
Z. Lacroix, Tiffany Morris, K. Parekh, L. Raschid, Maria-Esther Vidal
{"title":"Exploiting multiple paths to express scientific queries","authors":"Z. Lacroix, Tiffany Morris, K. Parekh, L. Raschid, Maria-Esther Vidal","doi":"10.1109/SSDBM.2004.34","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.34","url":null,"abstract":"The purpose of this demonstration is to present the main features of the BioNavigation system. Scientific data collection needed in various stages of scientific discovery is typically performed manually. For each scientific object of interest (e.g., a gene, a sequence), scientists query a succession of Web resources following links between retrieved entries. Each of the steps provides part of the intended characterization of the scientific object. This process is sometimes partially supported by hard-coded scripts or complex queries that will be evaluated by a mediation-based data integration system or against a data warehouse. These approaches fail in guiding the scientists during the collection process. In contrast, the BioNavigation approach presented in the paper provides the scientists with information on the available alternative resources, their provenance, and the costs of data collection. The BioNavigation system enhances a mediation-based integration system and provides scientists with support for the following: to ask queries at a high conceptual level; to visualize the multiple alternative resources that may be exploited to execute their data collection queries; to choose the final execution path to evaluate their queries.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116592367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12