{"title":"String join using precedence count matrix","authors":"Xia Cao, A. Tung, B. Ooi, K. Tan, Shuai Cheng Li","doi":"10.1109/SSDBM.2004.66","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.66","url":null,"abstract":"In this paper, we propose a filter-and-refine string join algorithm. While the filtering phase can rapidly prune away strings that are not joinable, the refinement phase employs a comprehensive algorithm to remove the remaining false alarms. The efficiency of the proposed scheme lies in the use of the precedence count matrix (PCM) for computing the edit distance between two sequences. With PCM, sequence comparison takes constant time. We also evaluated the proposed sequence join algorithm, and our study shows that it outperforms the known techniques.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131825245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LCGMiner: levelwise closed graph pattern mining from large databases","authors":"Aihua Xu, H. Lei","doi":"10.1109/SSDBM.2004.47","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.47","url":null,"abstract":"LCGMiner (levelwise closed graph pattern miner) is proposed to improve CloseGraph (Yan and Han, 2003) in discovering frequent closed subgraphs. Frequent closed edgesets with the same extended vertexsets are expanded in pattern generation compared to one edge or one vertex in traditional methods. Experiments on synthetic datasets as well as a real NIH dataset demonstrate that our algorithm outperforms CloseGraph and gSpan.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133825638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated metadata model for statistical data collection and processing","authors":"M. Vardaki, H. Papageorgiou","doi":"10.1109/SSDBM.2004.16","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.16","url":null,"abstract":"An integrated, semantically rich statistical metadata model is designed to cover the major stages of the statistical information processing (data collection and analysis including harmonization, processing of data and metadata and dissemination/output phases), which can minimize complexity of data warehousing environments and compatibility problems between distributed statistical information systems (SIS). The semantics of the model are analyzed, describing each part of the statistical processing. In addition, process metadata (operators) for automatic manipulation of both data and metadata are also defined over their common domain as well as logistic metadata for the location and format of data. Furthermore, we discuss how the proposed framework can facilitate actual information entry and analysis into a SIS. Finally, we demonstrate in a case study how the suggested metadata model can be implemented and integrated into a modern metadata-enabled SIS, thus standardizing the processing environment and assuring the quality of statistical results.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129899715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ants caught in the semantic Web: a study in the application of description logic to animal systematics","authors":"K. Taylor, Charles Gretton","doi":"10.1109/SSDM.2004.1311250","DOIUrl":"https://doi.org/10.1109/SSDM.2004.1311250","url":null,"abstract":"Scientists have been organising the forms of natural life into structured hierarchical systems since Linnaeus in the 18th century. Much more recently, computer scientists have developed a class of languages, called description logics (DL), that are aimed at describing concepts so that they may be automatically classified in hierarchical structures. These languages are being adopted in recent proposals for ontology definition that underlie the semantic Web, particularly OWL-DL (Bechofer et al., 2003). In this paper we study the applicability of modern description logics to the application of animal systematics. We would like to improve both the process of scientific classification itself, and the methods for communication and integration of taxonomic knowledge. As a case study, we consider a published scientific treatment of Epopostruma, a genus of Australian Formicidae (ants) (Shattuck, 2000). We focus on expressing the morphological characters of Epopostruma, that is, the features that derive from the form, structures, homologies and metamorphoses which characterise an individual. We express these characters in the description logic ALCQHIO_{R^+}(D)^- underlying OWL-DL. Racer (Haarslev and Moller, 2001) is a readily-available reasoner for ALCQHIO_{R^+}(D)^-, and is used in this paper to support the development of the DL application to animal systematics. We have used the native syntax of Racer for DL expressions in this paper. We find that most of the language used in a scientific description is readily adapted to the formal description logic language, with the exception of spatio-temporal elements and some higher-order constructs. We show that the reasoning capability is sufficient for consistency checking and retrieval of taxonomic knowledge. We discuss some benefits of the representation to assist the work of biological systematists.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128916656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A service oriented architecture for a health research data network","authors":"K. Taylor, C. O'Keefe, J. Colton, R. Baxter, R. Sparks, Uma Srinivasan, M. Cameron, L. Lefort","doi":"10.1109/SSDBM.2004.7","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.7","url":null,"abstract":"This paper reports on an architecture aimed at providing a technology platform for a new research facility, called the Health Research Data Network (HRDN). The two key features - custodial control over access and use of resources; and confidentiality protection integrated into a secure end-to-end system for data sharing and analysis - distinguish HRDN from other service oriented architectures for distributed data sharing and analysis.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"16 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120949011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Where the rubber meets the sky: the semantic gap between data producers and data consumers","authors":"J. Gray","doi":"10.1109/SSDM.2004.1311187","DOIUrl":"https://doi.org/10.1109/SSDM.2004.1311187","url":null,"abstract":"Summary form only given. Historically, scientists gathered and analyzed their own data. But technology has created functional specialization where some scientists gather or generate data, and others analyze it. Technology allows us to easily capture vast amounts of empirical data and to generate vast amounts of simulated data. Technology also allows us to store these bytes almost indefinitely. But there are few tools to organize scientific data for easy access and query, few tools to curate the data, and few tools to federate science archives. Domain scientists, notably NCBI and the Virtual Observatory, are making heroic efforts to address these problems. But this is a generic problem that cuts across all scientific disciplines. It requires a coordinated effort by the computer science community to build generic tools that will help all the sciences. Our current database products are a start, but much more is needed.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127010268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selected spatio-temporal data types and operations for a 3D/4D geological information system","authors":"J. Siebeck, S. Shumilov, A. Cremers, M. Breunig, A. Thomsen","doi":"10.1109/SSDBM.2004.62","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.62","url":null,"abstract":"The management of the evolution in time of large and complex 3D-models is not a trivial task and may largely surpass the data volume and complexity of conventional GIS. 3D objects changing in time have to be retrieved, inserted, processed and updated. It requires the development of application-specific spatiotemporal data types and operations. In this paper we present the design and the realisation of spatiotemporal data types and operations to be used in a typical 3D/4D-geological information system, and we give an outlook on our further research in the field of distributed spatiotemporal geoinformation systems.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"6 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132502526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The two cultures of digital curation","authors":"P. Buneman","doi":"10.1109/SSDM.2004.1311188","DOIUrl":"https://doi.org/10.1109/SSDM.2004.1311188","url":null,"abstract":"Summary form only given. The United Kingdom has recently created a Digital Curation Centre whose purpose is to provide advice on, develop tools for and conduct research on all aspects of digital curation. But what is digital curation, and why is it interesting to database researchers? Ask around, and you are likely to find two kinds of people involved in digital curation - at least they call themselves curators and use computers. Moreover, on the face of it, they have almost nothing else in common. An archivist (A) does the digital equivalent of putting documents in boxes. A is dealing with data generated by other people and is concerned with: appraisal - the selection of what documents to preserve, indexing and classification - the choice of which document to put into which box, and preservation - ensuring that the documents are preserved for posterity. A finds computers extremely useful because all kinds of \"digital objects\" may be archived, and the Internet provides easy access to digital objects. A scientist (B) does the digital equivalent of publishing a textbook or compendium. B might be a biologist and is publishing data that results from B's experiments or has been collected as a result of B's research into the literature. B's concerns are with organization and integration of data that has been collected from other sources, with the process of annotation of this data and with the publishing and presentation of the data. B finds computers and the Internet useful because it is easy to add recent data - one doesn't have to wait for the next paper edition to appear, one can build rather rich representations of the data, and it is easy to publish the data in a form that is accessible to the readers. In fact, B is likely to use some form of database technology. In this paper the author describes some of the challenges for database research and the progress that has been made on them: they include data integration, database archiving, annotation, and provenance.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"230 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132577843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-temporal data services in a shared-nothing environment","authors":"Marios Hadjieleftheriou, V. Kriakov, Yangui Tao, G. Kollios, A. Delis, V. Tsotras","doi":"10.1109/SSDBM.2004.65","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.65","url":null,"abstract":"Recently, there has been a proliferation of applications that produce spatiotemporal data that has to be processed, stored and queried efficiently. These applications necessitate the execution of millions of updates in order to keep the underlying database up-to-date. Consequently, there is a need for spatiotemporal data management systems that are able to support such update intensive operations. Moreover, these systems should offer users the capability to examine present as well as past (historical) data versions in an on-line fashion. We propose a system that exploits the inherent parallelism of a shared-nothing computing environment for storing and indexing the spatiotemporal data. We describe our proposed system architecture, data organization, and outline techniques for ensuring robustness and scalability under excessive query loads and high update rates.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128607656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The schema evolution and data migration framework of the environmental mass database IMIS","authors":"D. Draheim, M. Horn, Ina Schulz","doi":"10.1109/SSDBM.2004.69","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.69","url":null,"abstract":"This paper describes a framework that supports the simultaneous evolution of object-oriented data models and relational schemas with respect to a tool-supported object-relational mapping. The proposed framework accounts for non-trivial data migration induced by type evolution from the outset. The support for data migration is offered on the level of transparent data access. The framework consists of the following integrated parts: an automatic model change detection mechanism, a generator for schema evolution code and a generator for data migration APIs. The framework was conceived in the IMIS project. IMIS is an information system for environmental radioactivity measurements. Though the indicated domain especially demands a solution like the one discussed in this paper, the results are generally applicable to multi-tier system architectures with object-relational mapping.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129242403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}