{"title":"Detection and limitation of interval inference in statistical databases","authors":"Claus Boyens, O. Günther","doi":"10.1109/SSDBM.2004.29","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.29","url":null,"abstract":"Interval inference is a specific kind of statistical disclosure where a snooper collects and analyzes publicly available data to determine tight bounds on confidential numerical data. Institutions that disseminate public data include Census Bureaus and other independent organizations such as regional healthcare initiatives that provide chronic disease data that is collected from physicians, pharmacies and health maintenance organizations (HMOs). Such initiatives must ensure that the confidential values of the data providers are protected against interval inference while making sure that the released information is still useful for the prospective data users (such as medical researchers). In this paper, we consider the important case of 2-dimensional tables where the rows correspond to the data providers and the columns to confidential data categories. Although the inner cells of this table are confidential and should under no circumstances be published, marginal information about central tendency and dispersion can still be useful and worth publishing. It is the task of the data-disseminating institution to elicit these specific marginal data elements for publication such that no tight bounds on any inner table cell can be inferred. We present a new method that maximizes the usefulness of the disseminated information to the prospective data users while ensuring the confidentiality of the inner table cell values. We give a computational analysis and compare our methods to existing statistical disclosure methods.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116736351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements in distance-based indexing","authors":"M. Tasan, Z. M. Özsoyoglu","doi":"10.1109/SSDBM.2004.42","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.42","url":null,"abstract":"This work offers some improvements in the current distance-based indexing techniques. An optimal similarity search algorithm that is adopted from vector-based indexing is shown to be also optimal for distance-based indices. Further similarity between the two types of indexing is revealed, leading to a general description of search structures. A probabilistic analysis of distance-based tree indices is also shown to be possible, allowing direct comparisons of structures without the need for extensive experimentation. This analysis will lend itself to future improved index construction algorithms.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115202400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DataMover: robust terabyte-scale multi-file replication over wide-area networks","authors":"A. Sim, Junmin Gu, A. Shoshani, V. Natarajan","doi":"10.1109/SSDBM.2004.28","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.28","url":null,"abstract":"Typically, large scientific datasets (order of terabytes) are generated at large computational centers, and stored on mass storage systems. However, large subsets of the data need to be moved to facilities available to application scientists for analysis. File replication of thousands of files is a tedious, error prone, but extremely important task in scientific applications. The automation of the file replication task requires automatic space acquisition and reuse, and monitoring the progress of staging thousands of files from the source mass storage system, transferring them over the network, archiving them at the target mass storage system or disk systems, and recovering from transient system failures. We have developed a robust replication system, called DataMover, which is now in regular use in High-Energy-Physics and Climate modeling experiments. Only a single command is necessary to request multi-file replication or the replication of an entire directory. A Web-based tool was developed to dynamically monitor the progress of the multi-file replication process.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125607695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}