Synthesis Lectures on Data Management: Latest Publications

Data Profiling
Synthesis Lectures on Data Management Pub Date : 2018-11-07 DOI: 10.2200/S00878ED1V01Y201810DTM052
Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock
Citations: 1
Querying Graphs
Synthesis Lectures on Data Management Pub Date : 2018-10-01 DOI: 10.2200/S00873ED1V01Y201808DTM051
A. Bonifati, G. Fletcher, H. Voigt, N. Yakovets
Citations: 91
Natural Language Data Management and Interfaces
Synthesis Lectures on Data Management Pub Date : 2018-08-13 DOI: 10.2200/S00866ED1V01Y201807DTM049
Yunyao Li, Davood Rafiei
Abstract: The volume of natural language text data has been rapidly increasing over the past two decades, due to factors such as the growth of the Web, the low cost associated with publishing, and the progress on the digitization of printed texts. This growth, combined with the proliferation of natural language systems for searching and retrieving information, provides tremendous opportunities for studying some of the areas where database systems and natural language processing systems overlap. This book explores two interrelated and important areas of overlap: (1) managing natural language data and (2) developing natural language interfaces to databases. It presents relevant concepts and research questions, state-of-the-art methods, related systems, and research opportunities and challenges covering both areas. Relevant topics discussed on natural language data management include data models, data sources, queries, storage and indexing, and transforming natural language text. Under natural language interfaces, it presents the anatomy of these interfaces to databases, the challenges related to query understanding and query translation, and relevant aspects of user interactions. Each of the challenges is covered in a systematic way: first a quick overview of the topics, followed by a comprehensive view of recent techniques that have been proposed to address the challenge, along with illustrative examples. It also reviews some notable systems in detail in terms of how they address different challenges and their contributions. Finally, it discusses open challenges and opportunities for natural language management and interfaces.
The goal of this book is to provide an introduction to the methods, problems, and solutions that are used in managing natural language data and building natural language interfaces to databases. It serves as a starting point for readers who are interested in pursuing additional work on these exciting topics in both academic and industrial environments.
Citations: 11
Query Processing over Incomplete Databases
Synthesis Lectures on Data Management Pub Date : 2018-08-13 DOI: 10.2200/S00870ED1V01Y201807DTM050
Yunjun Gao, Xiaoye Miao
Abstract: Incomplete data is part of life and almost all areas of scientific studies. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive quest...
Citations: 7
Human Interaction with Graphs: A Visual Querying Perspective
Synthesis Lectures on Data Management Pub Date : 2018-08-08 DOI: 10.2200/S00855ED1V01Y201805DTM047
S. Bhowmick, Byron Choi, Chengkai Li
Abstract: Interacting with graphs using queries has emerged as an important research problem for real-world applications that center on large graph data. Given the syntactic complexity of graph query languages (e.g., SPARQL, Cypher), visual graph query interfaces make it easy for non-programmers to query such graph data repositories. In this book, we present recent developments in the emerging area of the visual graph querying paradigm, which bridges traditional graph querying with human-computer interaction (HCI). Specifically, we focus on techniques that emphasize deep integration between the visual graph query interface and the underlying graph query engine. We discuss various strategies and guidance for constructing graph queries visually, interleaving processing of graph queries and visual actions, visual exploration of graph query results, and automated performance study of visual graph querying frameworks. In addition, this book highlights open problems and new research directions.
In summary, in this book, we review and summarize the research thus far into the integration of HCI and graph querying to facilitate user-friendly interaction with graph-structured data, giving researchers a snapshot of the current state of the art in this topic, and future research directions.
Citations: 1
On Uncertain Graphs
Synthesis Lectures on Data Management Pub Date : 2018-07-23 DOI: 10.2200/S00862ED1V01Y201807DTM048
Arijit Khan, Yuan Ye, Lei Chen
Abstract: Large-scale, highly interconnected networks, which are often modeled as graphs, pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the...
Citations: 23
Full-Text (Substring) Indexes in External Memory
Synthesis Lectures on Data Management Pub Date : 2011-12-20 DOI: 10.2200/S00396ED1V01Y201111DTM022
Marina Barsky, U. Stege, Alex Thomo
Abstract: Nowadays, textual databases are among the most rapidly growing collections of data. Some of these collections contain a new type of data that differs from classical numerical or textual data. These are long sequences of symbols, not divided into well-separated small tokens (words). The most prominent among such collections are databases of biological sequences, which today are experiencing an unprecedented growth rate. The "1000 Genomes Project" was launched in 2008 with the ultimate goal of collecting sequences of an additional 1,500 human genomes, 500 each of European, African, and East Asian origin. This will produce an extensive catalog of human genetic variations. The size of just the raw sequences in this catalog would be about 5 terabytes. Querying strings without well-separated tokens poses a different set of challenges, typically addressed by building full-text indexes, which provide effective structures to index all the substrings of the given strings. Since full-text indexes occupy more space than the raw data, it is often necessary to use disk space for their construction. However, until recently, the construction of full-text indexes in secondary storage was considered impractical due to excessive I/O costs. Despite this, algorithms developed in the last decade demonstrated that efficient external construction of full-text indexes is indeed possible. This book is about large-scale construction and usage of full-text indexes. We focus mainly on suffix trees, and show efficient algorithms that can convert suffix trees to other kinds of full-text indexes and vice versa. There are four parts in this book. They are a mix of string searching theory with the reality of external memory constraints.
The first part introduces general concepts of full-text indexes and shows the relationships between them. The second part presents the first series of external-memory construction algorithms, which can handle the construction of full-text indexes for moderately large strings on the order of a few gigabytes. The third part presents algorithms that scale for very large strings. The final part examines queries that can be facilitated by disk-resident full-text indexes.
Table of Contents: Structures for Indexing Substrings / External Construction of Suffix Trees / Scaling Up: When the Input Exceeds the Main Memory / Queries for Disk-based Indexes / Conclusions and Open Problems
Citations: 3
Fundamentals of Physical Design and Query Compilation
Synthesis Lectures on Data Management Pub Date : 2011-08-01 DOI: 10.2200/S00363ED1V01Y201105DTM018
David Toman, G. Weddell
Abstract: Query compilation is the problem of translating user requests formulated over purely conceptual and domain-specific ways of understanding data, commonly called logical designs, to efficient executable programs called query plans. Such plans access various concrete data sources through their low-level, often iterator-based interfaces. An appreciation of the concrete data sources, their interfaces, and how such capabilities relate to logical design is commonly called a physical design. This book is an introduction to the fundamental methods underlying database technology that solves the problem of query compilation. The methods are presented in terms of first-order logic, which serves as the vehicle for specifying physical design, expressing user requests and query plans, and understanding how query plans implement user requests.
Table of Contents: Introduction / Logical Design and User Queries / Basic Physical Design and Query Plans / On Practical Physical Design / Query Compilation and Plan Synthesis / Updating Data
Citations: 66
Access Control in Data Management Systems
Synthesis Lectures on Data Management Pub Date : 2010-05-06 DOI: 10.2200/S00281ED1V01Y201005DTM004
E. Ferrari
Abstract: Access control is one of the fundamental services that any Data Management System should provide. Its main goal is to protect data from unauthorized read and write operations. This is particularly crucial in today's open and interconnected world, where each kind of information can be easily made available to a huge user population, and where damage to or misuse of data may have unpredictable consequences that go beyond the boundaries where data reside or have been generated. This book provides an overview of the various developments in access control for data management systems. Discretionary, mandatory, and role-based access control are discussed by surveying the most relevant proposals and analyzing the benefits and drawbacks of each paradigm in view of the requirements of different application domains. Access control mechanisms provided by commercial Data Management Systems are presented and discussed. Finally, the last part of the book is devoted to discussion of some of the most challenging and innovative research trends in the area of access control, such as those related to the Web 2.0 revolution or to the Database as a Service paradigm. This book is a valuable reference for a heterogeneous audience. It can be used either as an extended survey for people who are interested in access control or as a reference book for senior undergraduate or graduate courses in data security with a special focus on access control. It is also useful for technologists, researchers, managers, and developers who want to know more about access control and related emerging trends.
Table of Contents: Access Control: Basic Concepts / Discretionary Access Control for Relational Data Management Systems / Discretionary Access Control for Advanced Data Models / Mandatory Access Control / Role-based Access Control / Emerging Trends in Access Control
Citations: 51
The Four Generations of Entity Resolution
Synthesis Lectures on Data Management Pub Date : 1900-01-01 DOI: 10.2200/S01067ED1V01Y202012DTM064
G. Papadakis, Ekaterini Ioannou, Emanouil Thanos, Themis Palpanas
Abstract: Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of the research examines ways for improving its effectiveness and time efficiency. The initial ER methods primarily target Veracity in the context of structured (relational) data that are described by a schema of well-known quality and meaning. To achieve high effectiveness, they leverage schema, expert, and/or external knowledge. Part of these methods are extended to address Volume, processing large datasets through multi-core or massive parallelization approaches, such as the MapReduce paradigm. However, these early schema-based approaches are inapplicable to Web Data, which abound in voluminous, noisy, semi-structured, and highly heterogeneous information. To address the additional challenge of Variety, recent works on ER adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on the additional challenge of Velocity, aiming to process data collections of a continuously increasing volume. The latest works, though, take advantage of the significant breakthroughs in Deep Learning and Crowdsourcing, incorporating external knowledge to enhance the existing works to a significant extent. This synthesis lecture organizes ER methods into four generations based on the challenges posed by these four Vs. For each generation, we outline the corresponding ER workflow, discuss the state-of-the-art methods per workflow step, and present current research directions. The discussion of these methods takes into account a historical perspective, explaining the evolution of the methods over time along with their similarities and differences.
The lecture also discusses the available ER tools and benchmark datasets that allow expert as well as novice users to make use of the available solutions.
Citations: 44