19th International Conference on Scientific and Statistical Database Management (SSDBM 2007): Latest Papers

Duplicate Elimination in Space-partitioning Tree Indexes
M. Eltabakh, M. Ouzzani, Walid G. Aref
Abstract: Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates that need to be eliminated, i.e., the same object may be reported more than once. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST; an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to the end-users. Two cases for the index structures are considered based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analysis illustrate that the proposed techniques achieve savings in the storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator in the query plan.
DOI: https://doi.org/10.1109/SSDBM.2007.10
Published: 2007-07-09
Citations: 1
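The duplicate problem described in this abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a uniform grid (hypothetical `CELL` size) stands in for a space-partitioning tree, rectangles are replicated into every cell they overlap, and duplicates are suppressed inside the scan with a plain set of already-reported object ids. This is a generic technique chosen for illustration, not the SP-GiST method the authors propose.

```python
from collections import defaultdict

CELL = 10  # assumed side length of each square partition (hypothetical)

def cells_for(rect):
    """Yield (col, row) for every partition that rect = (x1, y1, x2, y2) overlaps."""
    x1, y1, x2, y2 = rect
    for cx in range(int(x1 // CELL), int(x2 // CELL) + 1):
        for cy in range(int(y1 // CELL), int(y2 // CELL) + 1):
            yield cx, cy

def overlaps(a, b):
    """True if axis-aligned rectangles a and b intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def build_index(objects):
    """Replicate each object into every partition it overlaps."""
    index = defaultdict(list)
    for oid, rect in objects.items():
        for cell in cells_for(rect):
            index[cell].append((oid, rect))
    return index

def index_scan(index, query, dedupe=True):
    """Report ids of objects overlapping the query rectangle.

    With dedupe=True, repeats are dropped inside the scan itself,
    so later operators never see them.
    """
    seen = set()
    for cell in cells_for(query):
        for oid, rect in index.get(cell, ()):
            if not overlaps(rect, query):
                continue
            if dedupe:
                if oid in seen:
                    continue
                seen.add(oid)
            yield oid

# Illustrative data: "a" spans three partitions, "b" lies in one.
objects = {"a": (5, 5, 25, 8), "b": (12, 12, 14, 14)}
index = build_index(objects)
query = (0, 0, 30, 30)
raw = list(index_scan(index, query, dedupe=False))
unique = list(index_scan(index, query))
```

In this toy example `raw` reports "a" once per partition it was replicated into, while `unique` reports each object once. A set of seen ids costs memory proportional to the result size, which is one reason an index-specific technique may be preferable in practice.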
Some Challenges in Integrating Information on Protein Interactions and a Partial Solution
H. Jagadish
Abstract: Summary form only given. Independently constructed sources of (scientific) data frequently have overlapping, and sometimes contradictory, information content. Current methods of use fall into two categories: force the integration step onto the user, or merely collate the data, at most transforming it into a common format. The first method places an undue burden on the user to fit all of the jigsaw puzzle pieces together. The second leads to redundancy and possible inconsistency. We propose a third: deep data integration. The idea is to provide a cohesive view of all information currently available for a protein, interaction, or other object of scientific interest. Doing so requires that multiple pieces of data about the object, in different sources, first be identified as referring to the same object, if required through "third party" information; then that a single "record" be created comprising the union of the information in multiple matched records, keeping track of differences where these occur; and finally by tracking the provenance of every value in the dataset so scientists can judge what items to use, and how to resolve differences. In this talk, I will describe our experiences with this approach in MiMI (http://mimi.ncibi.org). I will also discuss barriers to domain scientist use of the system, and my thoughts regarding how to make systems truly "usable".
DOI: https://doi.org/10.1109/SSDBM.2007.23
Published: 2007-07-09
Citations: 0
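The merge step described in this abstract (one record holding the union of matched records, with differences and provenance kept) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the source labels, and the field values are hypothetical, the records are assumed to be already matched to the same object, and none of it reflects how MiMI is implemented.

```python
from collections import defaultdict

def merge_records(records):
    """Merge matched records into one, keeping provenance for every value.

    records: iterable of (source_name, {attribute: value}) pairs that are
    assumed to describe the same object.
    Returns {attribute: {value: [sources that reported it]}}.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source, fields in records:
        for attr, value in fields.items():
            merged[attr][value].append(source)
    return {attr: dict(values) for attr, values in merged.items()}

def conflicts(merged):
    """Attributes for which the sources disagree (more than one distinct value)."""
    return sorted(attr for attr, values in merged.items() if len(values) > 1)

# Hypothetical matched records from two placeholder sources.
records = [
    ("source_A", {"name": "protein X", "organism": "human", "length": 120}),
    ("source_B", {"organism": "human", "length": 118, "partner": "protein Y"}),
]
merged = merge_records(records)
disputed = conflicts(merged)
```

Here `merged["length"]` keeps both reported values with the source of each, and `disputed` lists only the attributes a scientist would need to resolve by hand.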
Component-based Data Layout for Efficient Slicing of Very Large Multidimensional Volumetric Data
Jusub Kim, J. JáJá
Abstract: In this paper, we introduce a new efficient data layout scheme to efficiently handle out-of-core axis-aligned slicing queries of very large multidimensional volumetric data. Slicing is a very useful dimension reduction tool that removes or reduces occlusion problems in visualizing 3D/4D volumetric data sets and that enables fast visual exploration of such data sets. We show that the data layouts based on typical space-filling curves are not optimal for the out-of-core slicing queries and present a novel component-based data layout scheme for a specialized problem domain, in which it is only required to provide fast slicing at every k-th value, for any k > 1. Our component-based data layout scheme provides much faster processing time for any axis-aligned slicing direction at every k-th value, k > 1, requiring less cache memory size and without any replication of data. In addition, the data layout can be generalized to any high dimension.
DOI: https://doi.org/10.1109/SSDBM.2007.7
Published: 2007-07-09
Citations: 1
Boosting k-Nearest Neighbor Queries Estimating Suitable Query Radii
Marcos R. Vieira, C. Traina, A. Traina, Adriano S. Arantes, C. Faloutsos
Abstract: This paper proposes novel and effective techniques to estimate a radius to answer k-nearest neighbor queries. The first technique targets datasets where it is possible to learn the distribution about the pairwise distances between the elements, generating a global estimation that applies to the whole dataset. The second technique targets datasets where the first technique cannot be employed, generating estimations that depend on where the query center is located. The proposed k-NNF() algorithm combines both techniques, achieving remarkable speedups. Experiments performed on both real and synthetic datasets have shown that the proposed algorithm can accelerate k-NN queries more than 26 times compared with the incremental algorithm and spends half of the total time compared with the traditional k-NN() algorithms.
DOI: https://doi.org/10.1109/SSDBM.2007.5
Published: 2007-07-09
Citations: 10
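The general idea of answering a k-NN query through a range query with an estimated radius can be sketched as follows. This is not the paper's k-NNF() algorithm: the initial radius is simply supplied by the caller, the `growth` factor is a hypothetical fallback for underestimates, and a linear scan stands in for an index-backed range query.

```python
import math

def range_query(points, center, radius):
    """Return (distance, point) pairs within radius of center, nearest first."""
    hits = ((math.dist(p, center), p) for p in points)
    return sorted((d, p) for d, p in hits if d <= radius)

def knn_with_radius(points, center, k, radius, growth=2.0):
    """Answer a k-NN query using range queries of increasing radius.

    radius is an estimate of the distance to the k-th neighbor. If the
    estimate is too small, it is enlarged by `growth` and the range
    query is repeated.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth greater than 1")
    while True:
        hits = range_query(points, center, radius)
        if len(hits) >= k:
            return [p for _, p in hits[:k]]
        radius *= growth

# Illustrative data: the initial radius 0.5 is an underestimate.
points = [(0, 0), (1, 0), (3, 0), (10, 0)]
nearest = knn_with_radius(points, (0, 0), k=2, radius=0.5)
```

A good estimate lets the first range query return just over k candidates. An underestimate costs extra passes, and an overestimate returns many candidates that must be sorted and discarded, which is why the quality of the radius estimate matters.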
An Efficient Algorithm for Mining Coherent Patterns from Heterogeneous Microarrays
Xiang Zhang, Wei Wang
Abstract: DNA microarray techniques present a novel way for geneticists to monitor interactions among tens of thousands of genes simultaneously, and have become standard lab routines in gene discovery, disease diagnosis, and drug design. There has been extensive research on coherent subspace clustering of gene expressions measured under consistent experimental settings. This implies that all experiments are run using the same batch of microarray chips with similar characteristics of noise. Algorithms developed under this assumption may not be applicable for analyzing data collected from heterogeneous settings, where the set of genes being monitored may be different and expression levels may be not directly comparable even for the same gene. In this paper, we propose a model, F-cluster, for mining subspace coherent patterns from heterogeneous gene expression data, which is shown effective for revealing truthful patterns and reducing spurious ones. We also develop an efficient and scalable hybrid approach that combines gene-pair based and sample-pair based pruning to generate F-clusters from multiple gene expression matrices simultaneously. The experimental results demonstrate that our model can discover significant clusters that may not be identified by previous models.
DOI: https://doi.org/10.1109/SSDBM.2007.30
Published: 2007-07-09
Citations: 2