DeepDive: Declarative Knowledge Base Construction

Impact factor: 0.9; Zone 4 (Computer Science); JCR Q4 (Computer Science, Information Systems)
Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang
Journal: Sigmod Record, vol. 45, no. 1, pp. 60-67
Publication date: 2016-06-02
Publication type: Journal Article
DOI: 10.1145/2949741.2949756 (https://doi.org/10.1145/2949741.2949756)
Citations: 133

Abstract

The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
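
To make the abstract's description concrete, the following is a minimal sketch of the KBC task it describes: a user states rules about the domain, and a generic loop populates a SQL table from unstructured text. It assumes plain regular-expression rules and an in-memory SQLite database; the `RULES` list, the `facts` table, and `build_kb` are hypothetical names chosen for illustration and are not DeepDive syntax. It also leaves out the statistical inference and learning that the abstract identifies as DeepDive's key idea.

```python
import re
import sqlite3

# Hypothetical rule set: (relation name, pattern with named groups "a" and "b").
# The user only declares what to look for; the extraction loop below is generic.
RULES = [
    ("spouse", re.compile(
        r"(?P<a>[A-Z][a-z]+ [A-Z][a-z]+) is married to (?P<b>[A-Z][a-z]+ [A-Z][a-z]+)")),
    ("employer", re.compile(
        r"(?P<a>[A-Z][a-z]+ [A-Z][a-z]+) works at (?P<b>[A-Z][A-Za-z]+)")),
]


def build_kb(documents):
    """Apply every rule to every document and store each match as a row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (relation TEXT, subject TEXT, object TEXT, doc_id INTEGER)")
    for doc_id, text in enumerate(documents):
        for relation, pattern in RULES:
            for m in pattern.finditer(text):
                conn.execute(
                    "INSERT INTO facts VALUES (?, ?, ?, ?)",
                    (relation, m.group("a"), m.group("b"), doc_id))
    conn.commit()
    return conn


# Toy stand-ins for emails, webpages, or PDF reports.
docs = [
    "Alice Smith is married to Bob Jones. Alice Smith works at Acme.",
    "Report: Carol White works at Initech.",
]
kb = build_kb(docs)
rows = kb.execute(
    "SELECT relation, subject, object FROM facts ORDER BY doc_id, relation").fetchall()
for row in rows:
    print(row)
```

Adding a new relation here means appending one entry to `RULES` rather than changing the extraction loop, which is the sense of "declarative" this sketch is meant to convey.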
Source journal
Sigmod Record
Category: Engineering and Technology, Computer Science: Software Engineering
CiteScore: 3.10
Self-citation rate: 9.10%
Articles published: 41
Review time: >12 weeks
About the journal: SIGMOD investigates the development and application of database technology to support the full range of data management needs. The scope of interests and members is wide, with an almost equal mix of people from industry and academia. SIGMOD sponsors an annual conference that is regarded as one of the most important in the field, particularly for practitioners. Areas of special interest: active and temporal data management, data mining and models, database programming languages, databases on the WWW, distributed data management, engineering, federated multi-database and mobile management, query processing and optimization, rapid application development tools, spatial data management, user interfaces.