Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Articles

Fair Spatial Indexing: A paradigm for Group Spatial Fairness.

Pub Date: 2024-01-01; DOI: 10.48786/edbt.2024.14

Sina Shaham, Gabriel Ghinita, Cyrus Shahabi

Abstract: Machine learning (ML) is playing an increasing role in decision-making tasks that directly affect individuals, e.g., loan approvals, or job applicant screening. Significant concerns arise that, without special provisions, individuals from under-privileged backgrounds may not get equitable access to services and opportunities. Existing research studies fairness with respect to protected attributes such as gender, race or income, but the impact of location data on fairness has been largely overlooked. With the widespread adoption of mobile apps, geospatial attributes are increasingly used in ML, and their potential to introduce unfair bias is significant, given their high correlation with protected attributes. We propose techniques to mitigate location bias in machine learning. Specifically, we consider the issue of miscalibration when dealing with geospatial attributes. We focus on spatial group fairness and we propose a spatial indexing algorithm that accounts for fairness. Our KD-tree inspired approach significantly improves fairness while maintaining high learning accuracy, as shown by extensive experimental results on real data.

Pages: 150-161. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531788/pdf/
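The abstract does not spell out the indexing algorithm, so the following is only a loose sketch of what a fairness-aware, KD-tree style split could look like. It assumes each record is a hypothetical tuple (x, y, score, label), measures miscalibration of a region as the absolute gap between the mean predicted score and the observed positive rate, and picks the split whose two sides have the most similar gap. The names `calibration_gap` and `fair_split` and the objective itself are assumptions of this illustration, not the authors' method.

```python
def calibration_gap(records):
    """Absolute gap between mean predicted score and observed positive rate.

    Each record is a hypothetical tuple (x, y, score, label).
    """
    if not records:
        return 0.0
    mean_score = sum(r[2] for r in records) / len(records)
    positive_rate = sum(r[3] for r in records) / len(records)
    return abs(mean_score - positive_rate)


def fair_split(records, axis):
    """Choose a split along `axis` (0 for x, 1 for y) that balances the gaps.

    Returns (threshold, left, right), or None when no split is possible.
    The balancing objective is an assumption made for this sketch.
    """
    ordered = sorted(records, key=lambda r: r[axis])
    best = None
    best_cost = float("inf")
    for i in range(1, len(ordered)):
        lower, upper = ordered[i - 1][axis], ordered[i][axis]
        if lower == upper:
            continue  # cannot separate records that share a coordinate
        left, right = ordered[:i], ordered[i:]
        cost = abs(calibration_gap(left) - calibration_gap(right))
        if cost < best_cost:
            best_cost = cost
            best = ((lower + upper) / 2, left, right)
    return best
```

Applying `fair_split` recursively while alternating the axis would yield a KD-tree-like partition; a plain KD-tree would instead split at the median coordinate regardless of calibration.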

Citations: 0

Computing Generic Abstractions from Application Datasets

Pub Date: 2024-01-01; DOI: 10.48786/edbt.2024.09

Nelly Barret, I. Manolescu, P. Upadhyay

Citations: 1

Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach

Pub Date: 2023-06-24; DOI: 10.48550/arXiv.2306.13868

Melika Mousavi, N. Shahbazi, Abolfazl Asudeh

Abstract: Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attribute values. Using the notion of data coverage for detecting a lack of representation, we develop multiple crowdsourcing approaches. Our core approach, at a high level, is a divide and conquer algorithm that applies a search space pruning strategy to efficiently identify if a dataset misses proper coverage for a given group. We provide a different theoretical analysis of our algorithm, including a tight upper bound on its performance which guarantees its near-optimality. Using this algorithm as the core, we propose multiple heuristics to reduce the coverage detection cost across different cases with multiple intersectional/non-intersectional groups. We demonstrate how the pre-trained predictors are not reliable and hence not sufficient for detecting representation bias in the data. Finally, we adjust our core algorithm to utilize existing models for predicting image group(s) to minimize the coverage identification cost. We conduct extensive experiments, including live experiments on Amazon Mechanical Turk to validate our problem and evaluate our algorithms' performance.

Pages: 47-60.

Citations: 0

Auditing for Spatial Fairness

Pub Date: 2023-02-23; DOI: 10.48550/arXiv.2302.12333

Dimitris Sacharidis, G. Giannopoulos, George Papastefanatos, K. Stefanidis

Abstract: This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymandering and may introduce statistical bias. Prior work addresses these concerns but only for regularly spaced locations, while raising other issues, most notably its inability to discern regions that are likely to exhibit spatial unfairness. Similar to established notions of algorithmic fairness, we define spatial fairness as the statistical independence of outcomes from location. This translates into requiring that for each region of space, the distribution of outcomes is identical inside and outside the region. To allow for localized discrepancies in the distribution of outcomes, we compare how well two competing hypotheses explain the observed outcomes. The null hypothesis assumes spatial fairness, while the alternate allows different distributions inside and outside regions. Their goodness of fit is then assessed by a likelihood ratio test. If there is no significant difference in how well the two hypotheses explain the observed outcomes, we conclude that the algorithm is spatially fair.

Pages: 485-491.
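The comparison of the two hypotheses can be illustrated for a single, fixed region with binary outcomes. This is a minimal sketch under stated assumptions: outcomes are modelled as Bernoulli draws, the null hypothesis uses one positive rate everywhere, and the alternative uses separate rates inside and outside the region. The function names are hypothetical, and comparing the statistic against 3.841 (the 5% critical value of a chi-square distribution with one degree of freedom) is a textbook simplification that ignores the search over many regions described in the abstract.

```python
import math


def bernoulli_loglik(positives, total):
    """Maximised Bernoulli log-likelihood for `positives` out of `total`."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0  # the fitted rate is 0 or 1 (or there is no data)
    rate = positives / total
    return positives * math.log(rate) + (total - positives) * math.log(1 - rate)


def likelihood_ratio_statistic(pos_in, n_in, pos_out, n_out):
    """Twice the log-likelihood gain of separate rates over one shared rate."""
    null_fit = bernoulli_loglik(pos_in + pos_out, n_in + n_out)
    alternative_fit = bernoulli_loglik(pos_in, n_in) + bernoulli_loglik(pos_out, n_out)
    return 2.0 * (alternative_fit - null_fit)


# Hypothetical counts: 20 of 100 positives inside the region, 60 of 100 outside.
statistic = likelihood_ratio_statistic(20, 100, 60, 100)
looks_unfair = statistic > 3.841  # assumed single-region chi-square threshold
```

When the positive rates inside and outside the region coincide, the statistic is zero, which matches the intuition that both hypotheses explain the outcomes equally well.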

Citations: 1

TransEdge: Supporting Efficient Read Queries Across Untrusted Edge Nodes

Pub Date: 2023-02-16; DOI: 10.48550/arXiv.2302.08019

Abhishek A. Singh, Aasim Khan, S. Mehrotra, Faisal Nawab

Citations: 0

Implementing and Evaluating E2LSH on Storage

Pub Date: 2023-01-01; DOI: 10.48786/edbt.2023.35

Yuuichi Nakanishi, Kazuhiro Hiwada, Yosuke Bando, Tomoya Suzuki, H. Kajihara, Shintarou Sano, Tatsuro Endo, Tatsuo Shiozawa

Abstract: Locality sensitive hashing (LSH) is one of the widely-used approaches to approximate nearest neighbor search (ANNS) in high-dimensional spaces. The first work on LSH for the Euclidean distance, E2LSH, showed how ANNS can be solved efficiently at a sublinear query time in the database size with theoretically-guaranteed accuracy, although it required a large hash index size. Since then, several LSH variants having much smaller index sizes have been proposed. Their query time is linear or superlinear, but they have been shown to run effectively faster because they require fewer I/Os when the index is stored on hard disk drives and because they also permit in-memory execution with modern DRAM capacity. In this paper, we show that E2LSH is regaining the advantage in query speed with the advent of modern flash storage devices such as solid-state drives (SSDs). We evaluate E2LSH on a modern single-node computing environment and analyze its computational cost and I/O cost, from which we derive storage performance requirements for its external memory execution. Our analysis indicates that E2LSH on a single consumer-grade SSD can run faster than the state-of-the-art small-index methods executed in-memory. It also indicates that E2LSH with emerging high-performance storage devices and interfaces can approach in-memory E2LSH speeds. We implement a simple adaptation of E2LSH to external memory, E2LSH-on-Storage (E2LSHoS), and evaluate it for practical large datasets of up to one billion objects using different combinations of modern storage devices and interfaces. We demonstrate that our E2LSHoS implementation runs much faster than small-index methods and can approach in-memory E2LSH speeds, and also that its query time scales sublinearly with the database size beyond the index size limit of in-memory E2LSH.

Pages: 437-449.
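For readers unfamiliar with E2LSH, the hashing step it relies on can be sketched in a few lines. This is an in-memory illustration of the standard Euclidean LSH function h(v) = floor((a · v + b) / w), with a drawn from a Gaussian and b drawn uniformly from [0, w); it says nothing about the storage layout studied in the paper. The class name `E2LSHTable` and the parameter values in the usage lines are assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


class E2LSHTable:
    """One hash table whose bucket key concatenates k Euclidean LSH values."""

    def __init__(self, dim, k, w, rng):
        self.w = w
        # One Gaussian projection vector and one uniform offset per hash value.
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.offsets = [rng.uniform(0.0, w) for _ in range(k)]
        self.buckets = defaultdict(list)

    def key(self, vector):
        """Compute floor((a . v + b) / w) for each of the k projections."""
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vector)) + b) / self.w)
            for a, b in zip(self.projections, self.offsets)
        )

    def insert(self, item_id, vector):
        self.buckets[self.key(vector)].append(item_id)

    def candidates(self, query):
        """Return the ids stored in the bucket the query hashes to."""
        return self.buckets.get(self.key(query), [])


rng = random.Random(0)
table = E2LSHTable(dim=3, k=4, w=4.0, rng=rng)
table.insert("a", [0.1, 0.2, 0.3])
found = table.candidates([0.1, 0.2, 0.3])
```

A full index would keep several such tables with independent projections and verify the returned candidates with exact distance computations; keeping many tables is what makes the index large.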

Citations: 0

An Efficient Approach for Indoor Facility Location Selection

Pub Date: 2023-01-01; DOI: 10.48786/edbt.2023.53

Yeasir Rayhan, T. Hashem, M. A. Cheema, Hua Lu, Mohammed Eunus Ali

Abstract: The advancement of indoor location-aware technologies enables a wide range of location based services in indoor spaces. In this paper, we formulate a novel Indoor Facility Location Selection (IFLS) query that finds the optimal location for placing a new facility (e.g., a coffee station) in an indoor venue (e.g., a university building) such that the maximum distance of all clients (e.g., staff/students) to their nearest facility is minimized. To the best of our knowledge, we are the first to address this problem in an indoor setting. We first adapt the state-of-the-art solution in road networks for indoor settings, which exposes the limitations of existing approaches to solve our problem in an indoor space. Therefore, we propose an efficient approach which prunes the search space in terms of the number of clients considered, and the total number of facilities retrieved from the database, thus reducing the total number of indoor distance calculations required. The key idea of our approach is to use a single pass on a state-of-the-art index for an indoor space, and reuse the nearest neighbor computation of clients to prune irrelevant facilities and clients. We evaluate the performance of both approaches on four indoor datasets. Our approach achieves a speedup from 2.84× to 71.29× for synthetic data and 97.74× for real data over the baseline.

Pages: 632-644.
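The objective of the IFLS query can be made concrete with an exhaustive search. This sketch is not the pruning approach from the paper: it evaluates every candidate location, and it assumes straight-line distance as a stand-in for indoor distance, which in a real venue depends on doors and corridors. The function name `best_new_facility` and the coordinates are hypothetical.

```python
import math


def best_new_facility(clients, facilities, candidates):
    """Pick the candidate minimising the largest client-to-nearest-facility distance.

    Points are (x, y) tuples; math.dist is an assumed proxy for indoor distance.
    """
    best_location, best_cost = None, math.inf
    for candidate in candidates:
        available = facilities + [candidate]
        # Cost of a candidate: the worst-served client after adding it.
        cost = max(min(math.dist(client, f) for f in available) for client in clients)
        if cost < best_cost:
            best_location, best_cost = candidate, cost
    return best_location, best_cost


clients = [(0.0, 0.0), (10.0, 0.0)]
facilities = [(0.0, 0.0)]
candidates = [(5.0, 0.0), (10.0, 0.0)]
location, worst_distance = best_new_facility(clients, facilities, candidates)
```

The exhaustive search computes a distance for every client, facility, and candidate combination, which is the cost the abstract's pruning is meant to reduce.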

Citations: 0

A simplified Architecture for Fast, Adaptive Compilation and Execution of SQL Queries

Pub Date: 2023-01-01; DOI: 10.48786/edbt.2023.01

Immanuel Haffner, J. Dittrich

Citations: 6

Supporting Complex Query Time Enrichment For Analytics

Pub Date: 2023-01-01; DOI: 10.48786/edbt.2023.08

Dhrubajyoti Ghosh, Peeyush Gupta, S. Mehrotra, Shantanu Sharma

Citations: 1

FELIP: A local Differentially Private approach to frequency estimation on multidimensional datasets

Pub Date: 2023-01-01; DOI: 10.48786/edbt.2023.56

José S. Costa Filho, Javam C. Machado

Abstract: Local Differential Privacy (LDP) allows answering queries on users' data while maintaining their privacy. Queries are often issued on multidimensional datasets with categorical and numeric dimensions. In this paper, we tackle the problem of answering counting queries over multidimensional datasets with categorical and numeric dimensions under LDP. In the setting without a trusted central agent, the user's private dimensions are first perturbed locally to preserve privacy and then sent to an aggregator who will be able to estimate answers to queries. We build our approach on the existing idea of using grids. Users' dimensions are mapped into grids, which are perturbed and sent to the aggregator so that it can estimate the real data distributions and answer different queries on the collected dimensions. Finer-grained grids lead to greater error due to noise, while coarser-grained ones result in greater error due to bias. We propose optimizing the construction of grids taking into consideration a number of different factors to obtain better accuracy. Also, we propose to adaptively select the LDP algorithm that, based on the grid characteristics, provides the better utility. We conduct experiments on real and synthetic datasets and compare our solution with existing approaches.

Pages: 671-683.
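The pipeline of mapping a value to a grid cell, perturbing it locally, and estimating frequencies at the aggregator can be sketched for one numeric dimension. The sketch assumes generalized randomized response as the perturbation mechanism, which is one commonly used LDP protocol and is not claimed to be the one FELIP selects. The function names, the equal-width grid, and the parameter values are assumptions made for illustration.

```python
import math
import random


def grid_cell(value, low, high, cells):
    """Map a numeric value in [low, high] to one of `cells` equal-width cells."""
    index = int((value - low) / (high - low) * cells)
    return min(max(index, 0), cells - 1)


def perturb(cell, cells, epsilon, rng):
    """Generalized randomized response: report the true cell with probability p."""
    p = math.exp(epsilon) / (math.exp(epsilon) + cells - 1)
    if rng.random() < p:
        return cell
    other = rng.randrange(cells - 1)  # uniform choice among the other cells
    return other if other < cell else other + 1


def estimate_frequencies(reports, cells, epsilon):
    """Unbiased aggregator-side estimate of each cell's relative frequency."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + cells - 1)
    q = 1.0 / (math.exp(epsilon) + cells - 1)
    counts = [0] * cells
    for report in reports:
        counts[report] += 1
    return [(count / n - q) / (p - q) for count in counts]


rng = random.Random(7)
true_cell = grid_cell(0.6, 0.0, 1.0, 4)  # every simulated user holds the value 0.6
reports = [perturb(true_cell, 4, 3.0, rng) for _ in range(5000)]
estimates = estimate_frequencies(reports, 4, 3.0)
```

With more cells, the probability p of reporting the true cell shrinks for a fixed epsilon, so the estimates become noisier; with fewer cells, each cell lumps together more distinct values. This is the noise versus bias trade-off the abstract refers to.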

Citations: 3