Weight Annotation in Information Extraction

J. Doleschal, B. Kimelfeld, W. Martens, L. Peterfreund
DOI: 10.46298/lmcs-18(1:21)2022 (https://doi.org/10.46298/lmcs-18(1:21)2022)
Listed venue: Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory
Publication date: 2019-08-30
Publication type: Journal Article
Citations: 17

Abstract

The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document's spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of the regular spanners is precisely captured by the class of VSet-automata -- a restricted class of transducers that mark the endpoints of selected spans. In this work, we embark on the investigation of document spanners that can annotate extractions with auxiliary information such as confidence, support, and confidentiality measures. To this end, we adopt the abstraction of provenance semirings by Green et al., where tuples of a relation are annotated with the elements of a commutative semiring, and where the annotation propagates through the positive RA operators via the semiring operators. Hence, the proposed spanner extension, referred to as an annotator, maps every string into an annotated relation over the spans. As a specific instantiation, we explore weighted VSet-automata that, similarly to weighted automata and transducers, attach semiring elements to transitions. We investigate key aspects of expressiveness, such as the closure under the positive RA, and key aspects of computational complexity, such as the enumeration of annotated answers and their ranked enumeration in the case of ordered semirings. For a number of these problems, fundamental properties of the underlying semiring, such as positivity, are crucial for establishing tractability.
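To make the semiring idea concrete, the following is a minimal Python sketch of how annotations could propagate through the positive RA operators (union, projection, natural join) over span relations. It is an illustration under stated assumptions, not the paper's construction: the names `Semiring`, `row`, `union`, `project`, and `join`, the dictionary encoding of annotated relations, and the sample spans and weights are all hypothetical. Union and projection combine annotations with the semiring addition, and join combines them with the semiring multiplication.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Set, Tuple

Span = Tuple[int, int]                 # (start, end) indices into the document
Row = FrozenSet[Tuple[str, Span]]      # hashable assignment: variable -> span
KRelation = Dict[Row, Any]             # row -> semiring annotation


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a commutative semiring (K, add, mul, zero, one)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Two standard semirings, chosen here only as examples.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)   # (N, +, *)
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)               # ([0,1], max, *)


def row(**assignment: Span) -> Row:
    """Build a hashable row from keyword arguments such as x=(0, 3)."""
    return frozenset(assignment.items())


def union(k: Semiring, r: KRelation, s: KRelation) -> KRelation:
    """Union: a row present in both inputs gets the semiring sum of its annotations."""
    out = dict(r)
    for t, w in s.items():
        out[t] = k.add(out.get(t, k.zero), w)
    return out


def project(k: Semiring, r: KRelation, variables: Set[str]) -> KRelation:
    """Projection: rows that collapse to the same row have their annotations summed."""
    out: KRelation = {}
    for t, w in r.items():
        p = frozenset((v, sp) for v, sp in t if v in variables)
        out[p] = k.add(out.get(p, k.zero), w)
    return out


def join(k: Semiring, r: KRelation, s: KRelation) -> KRelation:
    """Natural join: compatible rows are merged and their annotations multiplied."""
    out: KRelation = {}
    for t1, w1 in r.items():
        d1 = dict(t1)
        for t2, w2 in s.items():
            # Rows are compatible if they agree on every shared variable.
            if all(d1.get(v, sp) == sp for v, sp in t2):
                merged_row = t1 | t2
                out[merged_row] = k.add(out.get(merged_row, k.zero), k.mul(w1, w2))
    return out


# Made-up confidence-annotated extractions over some document.
extracted_a: KRelation = {row(x=(0, 3)): 0.5, row(x=(4, 7)): 0.25}
extracted_b: KRelation = {row(x=(0, 3)): 0.75}
pairs: KRelation = {row(x=(0, 3), y=(4, 7)): 0.5}

merged = union(VITERBI, extracted_a, extracted_b)   # keeps the best confidence per row
joined = join(VITERBI, extracted_a, pairs)          # multiplies confidences of joined rows
```

Swapping `VITERBI` for `COUNTING` (with integer annotations) turns the same operators into a count of how many ways each tuple was derived, which is the point of parameterizing the operators by the semiring rather than hard-coding one kind of weight.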