Synthesis Lectures on Data Management: Latest Publications, Page 2

Data Profiling

Synthesis Lectures on Data Management Pub Date : 2018-11-07 DOI: 10.2200/S00878ED1V01Y201810DTM052

Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock

Citations: 1

Querying Graphs

Synthesis Lectures on Data Management Pub Date : 2018-10-01 DOI: 10.2200/S00873ED1V01Y201808DTM051

A. Bonifati, G. Fletcher, H. Voigt, N. Yakovets

Citations: 91

Natural Language Data Management and Interfaces

Synthesis Lectures on Data Management Pub Date : 2018-08-13 DOI: 10.2200/S00866ED1V01Y201807DTM049

Yunyao Li, Davood Rafiei

Abstract: The volume of natural language text data has been rapidly increasing over the past two decades, due to factors such as the growth of the Web, the low cost associated with publishing, and the progress on the digitization of printed texts. This growth, combined with the proliferation of natural language systems for searching and retrieving information, provides tremendous opportunities for studying some of the areas where database systems and natural language processing systems overlap. This book explores two interrelated and important areas of overlap: (1) managing natural language data and (2) developing natural language interfaces to databases. It presents relevant concepts and research questions, state-of-the-art methods, related systems, and research opportunities and challenges covering both areas. Relevant topics discussed on natural language data management include data models, data sources, queries, storage and indexing, and transforming natural language text. Under natural language interfaces, it presents the anatomy of these interfaces to databases, the challenges related to query understanding and query translation, and relevant aspects of user interactions. Each of the challenges is covered in a systematic way: first starting with a quick overview of the topics, followed by a comprehensive view of recent techniques that have been proposed to address the challenge along with illustrative examples. It also reviews some notable systems in detail in terms of how they address different challenges and their contributions. Finally, it discusses open challenges and opportunities for natural language management and interfaces.
The goal of this book is to provide an introduction to the methods, problems, and solutions that are used in managing natural language data and building natural language interfaces to databases. It serves as a starting point for readers who are interested in pursuing additional work on these exciting topics in both academic and industrial environments.

Citations: 11

Query Processing over Incomplete Databases

Synthesis Lectures on Data Management Pub Date : 2018-08-13 DOI: 10.2200/S00870ED1V01Y201807DTM050

Yunjun Gao, Xiaoye Miao

Citations: 7

Human Interaction with Graphs: A Visual Querying Perspective

Synthesis Lectures on Data Management Pub Date : 2018-08-08 DOI: 10.2200/S00855ED1V01Y201805DTM047

S. Bhowmick, Byron Choi, Chengkai Li

Abstract: Interacting with graphs using queries has emerged as an important research problem for real-world applications that center on large graph data. Given the syntactic complexity of graph query languages (e.g., SPARQL, Cypher), visual graph query interfaces make it easy for non-programmers to query such graph data repositories. In this book, we present recent developments in the emerging visual graph querying paradigm, which bridges traditional graph querying with human-computer interaction (HCI). Specifically, we focus on techniques that emphasize deep integration between the visual graph query interface and the underlying graph query engine. We discuss various strategies and guidance for constructing graph queries visually, interleaving processing of graph queries and visual actions, visual exploration of graph query results, and automated performance study of visual graph querying frameworks. In addition, this book highlights open problems and new research directions.
In summary, in this book, we review and summarize the research thus far into the integration of HCI and graph querying to facilitate user-friendly interaction with graph-structured data, giving researchers a snapshot of the current state of the art in this topic, and future research directions.

Citations: 1

On Uncertain Graphs

Synthesis Lectures on Data Management Pub Date : 2018-07-23 DOI: 10.2200/S00862ED1V01Y201807DTM048

Arijit Khan, Yuan Ye, Lei Chen

Citations: 23

Full-Text (Substring) Indexes in External Memory

Synthesis Lectures on Data Management Pub Date : 2011-12-20 DOI: 10.2200/S00396ED1V01Y201111DTM022

Marina Barsky, U. Stege, Alex Thomo

Abstract: Nowadays, textual databases are among the most rapidly growing collections of data. Some of these collections contain a new type of data that differs from classical numerical or textual data. These are long sequences of symbols, not divided into well-separated small tokens (words). The most prominent among such collections are databases of biological sequences, which are today experiencing an unprecedented growth rate. Launched in 2008, the "1000 Genomes Project" has the ultimate goal of collecting sequences of an additional 1,500 human genomes, 500 each of European, African, and East Asian origin. This will produce an extensive catalog of human genetic variations. The size of just the raw sequences in this catalog would be about 5 terabytes. Querying strings without well-separated tokens poses a different set of challenges, typically addressed by building full-text indexes, which provide effective structures to index all the substrings of the given strings. Since full-text indexes occupy more space than the raw data, it is often necessary to use disk space for their construction. However, until recently, the construction of full-text indexes in secondary storage was considered impractical due to excessive I/O costs. Despite this, algorithms developed in the last decade demonstrated that efficient external construction of full-text indexes is indeed possible. This book is about large-scale construction and usage of full-text indexes. We focus mainly on suffix trees, and show efficient algorithms that can convert suffix trees to other kinds of full-text indexes and vice versa. There are four parts in this book. They are a mix of string searching theory with the reality of external memory constraints.
The first part introduces general concepts of full-text indexes and shows the relationships between them. The second part presents the first series of external-memory construction algorithms that can handle the construction of full-text indexes for moderately large strings on the order of a few gigabytes. The third part presents algorithms that scale for very large strings. The final part examines queries that can be facilitated by disk-resident full-text indexes. Table of Contents: Structures for Indexing Substrings / External Construction of Suffix Trees / Scaling Up: When the Input Exceeds the Main Memory / Queries for Disk-based Indexes / Conclusions and Open Problems

Citations: 3

Fundamentals of Physical Design and Query Compilation

Synthesis Lectures on Data Management Pub Date : 2011-08-01 DOI: 10.2200/S00363ED1V01Y201105DTM018

David Toman, G. Weddell

Citations: 66

Access Control in Data Management Systems

Synthesis Lectures on Data Management Pub Date : 2010-05-06 DOI: 10.2200/S00281ED1V01Y201005DTM004

E. Ferrari

Abstract: Access control is one of the fundamental services that any Data Management System should provide. Its main goal is to protect data from unauthorized read and write operations. This is particularly crucial in today's open and interconnected world, where any kind of information can easily be made available to a huge user population, and where damage to or misuse of data may have unpredictable consequences that go beyond the boundaries where data reside or have been generated. This book provides an overview of the various developments in access control for data management systems. Discretionary, mandatory, and role-based access control are discussed by surveying the most relevant proposals and analyzing the benefits and drawbacks of each paradigm in view of the requirements of different application domains. Access control mechanisms provided by commercial Data Management Systems are presented and discussed. Finally, the last part of the book is devoted to a discussion of some of the most challenging and innovative research trends in the area of access control, such as those related to the Web 2.0 revolution or to the Database as a Service paradigm. This book is a valuable reference for a heterogeneous audience. It can be used either as an extended survey for people who are interested in access control or as a reference book for senior undergraduate or graduate courses in data security with a special focus on access control. It is also useful for technologists, researchers, managers, and developers who want to know more about access control and related emerging trends.
Table of Contents: Access Control: Basic Concepts / Discretionary Access Control for Relational Data Management Systems / Discretionary Access Control for Advanced Data Models / Mandatory Access Control / Role-based Access Control / Emerging Trends in Access Control

Citations: 51

The Four Generations of Entity Resolution

Synthesis Lectures on Data Management Pub Date : 1900-01-01 DOI: 10.2200/S01067ED1V01Y202012DTM064

G. Papadakis, Ekaterini Ioannou, Emanouil Thanos, Themis Palpanas

Abstract: Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of the research examines ways for improving its effectiveness and time efficiency. The initial ER methods primarily target Veracity in the context of structured (relational) data that are described by a schema of well-known quality and meaning. To achieve high effectiveness, they leverage schema, expert, and/or external knowledge. Some of these methods are extended to address Volume, processing large datasets through multi-core or massive parallelization approaches, such as the MapReduce paradigm. However, these early schema-based approaches are inapplicable to Web Data, which abound in voluminous, noisy, semi-structured, and highly heterogeneous information. To address the additional challenge of Variety, recent works on ER adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on the additional challenge of Velocity, aiming to process data collections of a continuously increasing volume. The latest works, though, take advantage of the significant breakthroughs in Deep Learning and Crowdsourcing, incorporating external knowledge to enhance the existing works to a significant extent. This synthesis lecture organizes ER methods into four generations based on the challenges posed by these four Vs. For each generation, we outline the corresponding ER workflow, discuss the state-of-the-art methods per workflow step, and present current research directions. The discussion of these methods takes into account a historical perspective, explaining the evolution of the methods over time along with their similarities and differences.
The lecture also discusses the available ER tools and benchmark datasets that allow expert as well as novice users to make use of the available solutions.

Citations: 44