SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

Dean Light, Ahmad Aiashy, Mahmoud Diab, Daniel Nachmias, Stijn Vansummeren, Benny Kimelfeld
arXiv - CS - Databases. Published 2024-09-03. DOI: arxiv-2409.01736 (https://doi.org/arxiv-2409.01736)

Abstract

Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs of increasing complexity within Jupyter Notebook.
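
To make the idea concrete, the following is a minimal plain-Python sketch of what a document spanner computes: a function maps a text to a relation of spans (character intervals), and that relation is then combined with ordinary relational data. It does not use SpannerLib or Spannerlog syntax; the names `Span`, `rgx`, `span_text`, and `known_contacts`, as well as the sample text and pattern, are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A character interval of a document (hypothetical helper type)."""
    start: int  # inclusive offset
    end: int    # exclusive offset


def rgx(pattern: str, text: str) -> list[tuple[Span, ...]]:
    """Stand-in for a user-defined extraction function.

    Returns one row per regex match, with one Span per capture group.
    """
    rows = []
    for match in re.finditer(pattern, text):
        groups = range(1, match.re.groups + 1)
        rows.append(tuple(Span(*match.span(g)) for g in groups))
    return rows


def span_text(text: str, span: Span) -> str:
    """Return the substring of `text` that a span covers."""
    return text[span.start:span.end]


def known_contacts(text: str, domains: set[str]) -> list[tuple[str, Span]]:
    """Rule-like step: join the extracted spans with a relation of domains.

    Roughly the analogue of a rule whose body calls an extraction function
    and then filters its output against an existing relation.
    """
    rows = rgx(r"(\w+)@(\w+\.\w+)", text)
    return [
        (span_text(text, user), domain)
        for user, domain in rows
        if span_text(text, domain) in domains
    ]


doc = "Contact ann@example.org or bob@spam.biz for details."
result = known_contacts(doc, {"example.org"})
print(result)
```

In this sketch the extraction function returns positions rather than strings, so later steps can still reason about where in the document a value came from. The join against `domains` plays the role of the relational querying that the abstract mentions.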