Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Publications

Fair Spatial Indexing: A paradigm for Group Spatial Fairness.
Sina Shaham, Gabriel Ghinita, Cyrus Shahabi
{"title":"Fair Spatial Indexing: A paradigm for Group Spatial Fairness.","authors":"Sina Shaham, Gabriel Ghinita, Cyrus Shahabi","doi":"10.48786/edbt.2024.14","DOIUrl":"10.48786/edbt.2024.14","url":null,"abstract":"Machine learning (ML) is playing an increasing role in decision-making tasks that directly affect individuals, e.g., loan approvals, or job applicant screening. Significant concerns arise that, without special provisions, individuals from under-privileged backgrounds may not get equitable access to services and opportunities. Existing research studies fairness with respect to protected attributes such as gender, race or income, but the impact of location data on fairness has been largely overlooked. With the widespread adoption of mobile apps, geospatial attributes are increasingly used in ML, and their potential to introduce unfair bias is significant, given their high correlation with protected attributes. We propose techniques to mitigate location bias in machine learning. Specifically, we consider the issue of miscalibration when dealing with geospatial attributes. We focus on spatial group fairness and we propose a spatial indexing algorithm that accounts for fairness. Our KD-tree inspired approach significantly improves fairness while maintaining high learning accuracy, as shown by extensive experimental results on real data.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. 
International Conference on Extending Database Technology","volume":"27 2","pages":"150-161"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531788/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142570639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Computing Generic Abstractions from Application Datasets
Nelly Barret, I. Manolescu, P. Upadhyay
{"title":"Computing Generic Abstractions from Application Datasets","authors":"Nelly Barret, I. Manolescu, P. Upadhyay","doi":"10.48786/edbt.2024.09","DOIUrl":"https://doi.org/10.48786/edbt.2024.09","url":null,"abstract":"Digital data plays a central role in sciences, journalism, environment, digital humanities, etc. Open Data sharing initiatives lead to many large, interesting datasets being shared online. Some of these are RDF graphs, but other formats like CSV, relational, property graphs, JSON or XML documents are also frequent. Practitioners need to understand a dataset to decide whether it is suited to their needs. Datasets may come with a schema and/or may be summarized; however, the former is not always provided and the latter is often too technical for non-IT users. To overcome these limitations, we present an end-to-end dataset abstraction approach, which (i) applies to any (semi)structured data model; (ii) computes a description meant for human users, in the form of an Entity-Relationship diagram; (iii) integrates Information Extraction and data profiling to classify dataset entities among a large set of intelligible categories. We implemented our approach in a system called Abstra, and detail its performance on various datasets.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"14 1","pages":"94-107"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87206329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach
Melika Mousavi, N. Shahbazi, Abolfazl Asudeh
{"title":"Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach","authors":"Melika Mousavi, N. Shahbazi, Abolfazl Asudeh","doi":"10.48550/arXiv.2306.13868","DOIUrl":"https://doi.org/10.48550/arXiv.2306.13868","url":null,"abstract":"Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attribute values. Using the notion of data coverage for detecting a lack of representation, we develop multiple crowdsourcing approaches. Our core approach, at a high level, is a divide and conquer algorithm that applies a search space pruning strategy to efficiently identify if a dataset misses proper coverage for a given group. We provide a different theoretical analysis of our algorithm, including a tight upper bound on its performance which guarantees its near-optimality. Using this algorithm as the core, we propose multiple heuristics to reduce the coverage detection cost across different cases with multiple intersectional/non-intersectional groups. We demonstrate how the pre-trained predictors are not reliable and hence not sufficient for detecting representation bias in the data. Finally, we adjust our core algorithm to utilize existing models for predicting image group(s) to minimize the coverage identification cost. We conduct extensive experiments, including live experiments on Amazon Mechanical Turk to validate our problem and evaluate our algorithms' performance.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. 
International Conference on Extending Database Technology","volume":"3 9 1","pages":"47-60"},"PeriodicalIF":0.0,"publicationDate":"2023-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84356880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Auditing for Spatial Fairness
Dimitris Sacharidis, G. Giannopoulos, George Papastefanatos, K. Stefanidis
{"title":"Auditing for Spatial Fairness","authors":"Dimitris Sacharidis, G. Giannopoulos, George Papastefanatos, K. Stefanidis","doi":"10.48550/arXiv.2302.12333","DOIUrl":"https://doi.org/10.48550/arXiv.2302.12333","url":null,"abstract":"This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymandering and may introduce statistical bias. Prior work addresses these concerns but only for regularly spaced locations, while raising other issues, most notably its inability to discern regions that are likely to exhibit spatial unfairness. Similar to established notions of algorithmic fairness, we define spatial fairness as the statistical independence of outcomes from location. This translates into requiring that for each region of space, the distribution of outcomes is identical inside and outside the region. To allow for localized discrepancies in the distribution of outcomes, we compare how well two competing hypotheses explain the observed outcomes. The null hypothesis assumes spatial fairness, while the alternate allows different distributions inside and outside regions. Their goodness of fit is then assessed by a likelihood ratio test. If there is no significant difference in how well the two hypotheses explain the observed outcomes, we conclude that the algorithm is spatially fair.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. 
International Conference on Extending Database Technology","volume":"35 1","pages":"485-491"},"PeriodicalIF":0.0,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74184759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
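
The abstract of "Auditing for Spatial Fairness" describes comparing a null hypothesis (one outcome distribution everywhere) with an alternative (different distributions inside and outside a region) via a likelihood ratio test. The following is only a minimal sketch of that idea for a single region, assuming binary outcomes modeled as Bernoulli trials; the function names and the Bernoulli assumption are illustrative and not taken from the paper, and the step that turns the statistic into a significance decision is left out.

```python
import math


def bernoulli_log_likelihood(positives, total, rate):
    """Log-likelihood of observing `positives` successes in `total` trials."""
    negatives = total - positives
    log_likelihood = 0.0
    # Skip zero-count terms so that rate == 0.0 or 1.0 never hits log(0).
    if positives:
        log_likelihood += positives * math.log(rate)
    if negatives:
        log_likelihood += negatives * math.log(1.0 - rate)
    return log_likelihood


def region_likelihood_ratio(pos_in, n_in, pos_out, n_out):
    """Return 2 * (log L_alternative - log L_null) for one candidate region.

    Null hypothesis: a single positive rate explains all outcomes.
    Alternative: separate positive rates inside and outside the region.
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("both sides of the region need at least one outcome")
    shared_rate = (pos_in + pos_out) / (n_in + n_out)
    ll_null = bernoulli_log_likelihood(pos_in + pos_out, n_in + n_out, shared_rate)
    ll_alt = (bernoulli_log_likelihood(pos_in, n_in, pos_in / n_in)
              + bernoulli_log_likelihood(pos_out, n_out, pos_out / n_out))
    return 2.0 * (ll_alt - ll_null)
```

A statistic near zero means the two hypotheses explain the outcomes about equally well; larger values indicate that the region's outcome rate departs from the rest of the space.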
TransEdge: Supporting Efficient Read Queries Across Untrusted Edge Nodes
Abhishek A. Singh, Aasim Khan, S. Mehrotra, Faisal Nawab
{"title":"TransEdge: Supporting Efficient Read Queries Across Untrusted Edge Nodes","authors":"Abhishek A. Singh, Aasim Khan, S. Mehrotra, Faisal Nawab","doi":"10.48550/arXiv.2302.08019","DOIUrl":"https://doi.org/10.48550/arXiv.2302.08019","url":null,"abstract":"We propose Transactional Edge (TransEdge), a distributed transaction processing system for untrusted environments such as edge computing systems. What distinguishes TransEdge is its focus on efficient support for read-only transactions. TransEdge allows reading from different partitions consistently using one round in most cases and no more than two rounds in the worst case. TransEdge design is centered around this dependency tracking scheme including the consensus and transaction processing protocols. Our performance evaluation shows that TransEdge's snapshot read-only transactions achieve a 9-24x speedup compared to current Byzantine systems.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"22 1","pages":"684-696"},"PeriodicalIF":0.0,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91113987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Implementing and Evaluating E2LSH on Storage
Yuuichi Nakanishi, Kazuhiro Hiwada, Yosuke Bando, Tomoya Suzuki, H. Kajihara, Shintarou Sano, Tatsuro Endo, Tatsuo Shiozawa
{"title":"Implementing and Evaluating E2LSH on Storage","authors":"Yuuichi Nakanishi, Kazuhiro Hiwada, Yosuke Bando, Tomoya Suzuki, H. Kajihara, Shintarou Sano, Tatsuro Endo, Tatsuo Shiozawa","doi":"10.48786/edbt.2023.35","DOIUrl":"https://doi.org/10.48786/edbt.2023.35","url":null,"abstract":"Locality sensitive hashing (LSH) is one of the widely-used approaches to approximate nearest neighbor search (ANNS) in high-dimensional spaces. The first work on LSH for the Euclidean distance, E2LSH, showed how ANNS can be solved efficiently at a sublinear query time in the database size with theoretically-guaranteed accuracy, although it required a large hash index size. Since then, several LSH variants having much smaller index sizes have been proposed. Their query time is linear or superlinear, but they have been shown to run effectively faster because they require fewer I/Os when the index is stored on hard disk drives and because they also permit in-memory execution with modern DRAM capacity. In this paper, we show that E2LSH is regaining the advantage in query speed with the advent of modern flash storage devices such as solid-state drives (SSDs). We evaluate E2LSH on a modern single-node computing environment and analyze its computational cost and I/O cost, from which we derive storage performance requirements for its external memory execution. Our analysis indicates that E2LSH on a single consumer-grade SSD can run faster than the state-of-the-art small-index methods executed in-memory. It also indicates that E2LSH with emerging high-performance storage devices and interfaces can approach in-memory E2LSH speeds. We implement a simple adaptation of E2LSH to external memory, E2LSH-on-Storage (E2LSHoS), and evaluate it for practical large datasets of up to one billion objects using different combinations of modern storage devices and interfaces. 
We demonstrate that our E2LSHoS implementation runs much faster than small-index methods and can approach in-memory E2LSH speeds, and also that its query time scales sublinearly with the database size beyond the index size limit of in-memory E2LSH.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"22 1","pages":"437-449"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77757083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Efficient Approach for Indoor Facility Location Selection
Yeasir Rayhan, T. Hashem, M. A. Cheema, Hua Lu, Mohammed Eunus Ali
{"title":"An Efficient Approach for Indoor Facility Location Selection","authors":"Yeasir Rayhan, T. Hashem, M. A. Cheema, Hua Lu, Mohammed Eunus Ali","doi":"10.48786/edbt.2023.53","DOIUrl":"https://doi.org/10.48786/edbt.2023.53","url":null,"abstract":"The advancement of indoor location-aware technologies enables a wide range of location based services in indoor spaces. In this paper, we formulate a novel Indoor Facility Location Selection (IFLS) query that finds the optimal location for placing a new facility (e.g., a coffee station) in an indoor venue (e.g., a university building) such that the maximum distance of all clients (e.g., staff/students) to their nearest facility is minimized. To the best of our knowledge, we are the first to address this problem in an indoor setting. We first adapt the state-of-the-art solution in road networks for indoor settings, which exposes the limitations of existing approaches to solve our problem in an indoor space. Therefore, we propose an efficient approach which prunes the search space in terms of the number of clients considered, and the total number of facilities retrieved from the database, thus reducing the total number of indoor distance calculations required. The key idea of our approach is to use a single pass on a state-of-the-art index for an indoor space, and reuse the nearest neighbor computation of clients to prune irrelevant facilities and clients. We evaluate the performance of both approaches on four indoor datasets. Our approach achieves a speedup from 2.84× to 71.29× for synthetic data and 97.74× for real data over the baseline.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. 
International Conference on Extending Database Technology","volume":"33 1","pages":"632-644"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77819825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
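
The IFLS abstract states the objective as placing one new facility so that the maximum distance from any client to its nearest facility is minimized. A brute-force sketch of that objective is shown below. It is not the pruning approach proposed in the paper: it assumes a finite list of candidate locations and uses straight-line Euclidean distance as a stand-in for indoor distance, and both function names are hypothetical.

```python
import math


def max_nearest_distance(clients, facilities):
    """Largest distance from any client to its nearest facility."""
    return max(
        min(math.dist(client, facility) for facility in facilities)
        for client in clients
    )


def select_location(clients, existing, candidates):
    """Pick the candidate that minimizes the worst-case client distance.

    Euclidean distance is an assumption made for this sketch; an indoor
    venue would need door-to-door (indoor) distances instead.
    """
    return min(
        candidates,
        key=lambda cand: max_nearest_distance(clients, list(existing) + [cand]),
    )
```

For example, with clients at (0, 0) and (10, 0), an existing facility at (0, 0), and candidates (5, 0) and (10, 0), the second candidate is chosen because it brings the worst-case distance down to zero. This exhaustive version evaluates every client against every facility for each candidate, which is the kind of cost the abstract's pruning strategy aims to avoid.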
A simplified Architecture for Fast, Adaptive Compilation and Execution of SQL Queries
Immanuel Haffner, J. Dittrich
{"title":"A simplified Architecture for Fast, Adaptive Compilation and Execution of SQL Queries","authors":"Immanuel Haffner, J. Dittrich","doi":"10.48786/edbt.2023.01","DOIUrl":"https://doi.org/10.48786/edbt.2023.01","url":null,"abstract":"Query compilation is crucial to efficiently execute query plans. In the past decade, we have witnessed considerable progress in this field, including compilation with LLVM, adaptively switching from interpretation to compiled code, as well as adaptively switching from non-optimized to optimized code. All of these ideas aim to reduce latency and/or increase throughput. However, these approaches require immense engineering effort, a considerable part of which includes reengineering very fundamental techniques from the compiler construction community, like register allocation or machine code generation – techniques studied in this field for decades. In this paper, we argue","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"1 1","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88929963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Supporting Complex Query Time Enrichment For Analytics
Dhrubajyoti Ghosh, Peeyush Gupta, S. Mehrotra, Shantanu Sharma
{"title":"Supporting Complex Query Time Enrichment For Analytics","authors":"Dhrubajyoti Ghosh, Peeyush Gupta, S. Mehrotra, Shantanu Sharma","doi":"10.48786/edbt.2023.08","DOIUrl":"https://doi.org/10.48786/edbt.2023.08","url":null,"abstract":"Several application domains require data to be enriched prior to its use. Data enrichment is often performed using expensive machine learning models to interpret low-level data (e.g., models for face detection) into semantically meaningful observations. Collecting and enriching data offline before loading it to a database is infeasible if one desires online analysis on data as it arrives. Enriching data on the fly at insertion could result in redundant work (if applications require only a fraction of the data to be enriched) and could result in a bottleneck (if enrichment functions are expensive). Any scalable solution requires enrichment during query processing. This paper explores two different architectures for integrating enrichment into query processing – a loosely coupled approach wherein enrichment is performed outside of the DBMS and a tightly coupled approach wherein it is performed within the DBMS. The paper addresses the challenges of increased query latency due to query time enrichment.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"91 1","pages":"92-104"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80872252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FELIP: A local Differentially Private approach to frequency estimation on multidimensional datasets
José S. Costa Filho, Javam C. Machado
{"title":"FELIP: A local Differentially Private approach to frequency estimation on multidimensional datasets","authors":"José S. Costa Filho, Javam C. Machado","doi":"10.48786/edbt.2023.56","DOIUrl":"https://doi.org/10.48786/edbt.2023.56","url":null,"abstract":"Local Differential Privacy (LDP) allows answering queries on users' data while maintaining their privacy. Queries are often issued on multidimensional datasets with categorical and numeric dimensions. In this paper, we tackle the problem of answering counting queries over multidimensional datasets with categorical and numeric dimensions under LDP. In the setting without a trusted central agent, the user's private dimensions are first perturbed locally to preserve privacy and then sent to an aggregator who will be able to estimate answers to queries. We build our approach on the existing idea of using grids: users' dimensions are mapped into grids, which are perturbed and sent to the aggregator so that it can estimate the real data distributions and answer different queries on the collected dimensions. Finer-grained grids lead to greater error due to noise, while coarser-grained ones result in greater error due to bias. We propose optimizing the construction of grids, taking into consideration a number of different factors, to obtain better accuracy. Also, we propose to adaptively select the LDP algorithm that, based on the grid characteristics, will provide the better utility. We conduct experiments on real and synthetic datasets and compare our solution with existing approaches.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. 
International Conference on Extending Database Technology","volume":"58 1","pages":"671-683"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84830983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3