Weight Annotation in Information Extraction
J. Doleschal, B. Kimelfeld, W. Martens, L. Peterfreund
Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory, volume 42 1, pages 8:1-8:18. Published 2019-08-30.
DOI: 10.46298/lmcs-18(1:21)2022 (https://doi.org/10.46298/lmcs-18(1:21)2022)

The framework of document spanners abstracts the task of information
extraction from text as a function that maps every document (a string) into a
relation over the document's spans (intervals identified by their start and end
indices). For instance, the regular spanners are the closure under the
Relational Algebra (RA) of the regular expressions with capture variables, and
the expressive power of the regular spanners is precisely captured by the class
of VSet-automata -- a restricted class of transducers that mark the endpoints
of selected spans.
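
To make the notion of a span relation concrete, here is a minimal sketch in Python that treats a regular expression with named groups as a rough stand-in for a regular expression with capture variables. The helper name `extract_spans`, the sample document, and the pattern are all hypothetical. Two assumptions differ from the formal setting: Python's `re.finditer` returns only non-overlapping leftmost matches, whereas a spanner returns every valid tuple, and spans here are 0-based half-open index pairs, which may not match the paper's indexing convention.

```python
import re

def extract_spans(pattern, document):
    """Map a document to a relation over its spans.

    Each row assigns every capture variable (named group) a span,
    given as a (start, end) pair of 0-based, half-open indices.
    """
    relation = []
    for match in re.finditer(pattern, document):
        relation.append({var: match.span(var) for var in match.groupdict()})
    return relation

# Hypothetical document and extractor: a name followed by an e-mail address.
doc = "Ann ann@x.org, Bob bob@y.org"
pattern = r"(?P<name>[A-Z][a-z]+) (?P<mail>[a-z]+@[a-z]+\.[a-z]+)"

rows = extract_spans(pattern, doc)
for r in rows:
    # Recover the substrings that the spans point to.
    print({var: doc[i:j] for var, (i, j) in r.items()})
```

The output relation stores positions rather than substrings, so two occurrences of the same text at different places in the document remain distinct tuples.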

In this work, we embark on the investigation of document spanners that can
annotate extractions with auxiliary information such as confidence, support,
and confidentiality measures. To this end, we adopt the abstraction of
provenance semirings by Green et al., where tuples of a relation are annotated
with the elements of a commutative semiring, and where the annotation
propagates through the positive RA operators via the semiring operators. Hence,
the proposed spanner extension, referred to as an annotator, maps every string
into an annotated relation over the spans. As a specific instantiation, we
explore weighted VSet-automata that, similarly to weighted automata and
transducers, attach semiring elements to transitions. We investigate key
aspects of expressiveness, such as the closure under the positive RA, and key
aspects of computational complexity, such as the enumeration of annotated
answers and their ranked enumeration in the case of ordered semirings. For a
number of these problems, fundamental properties of the underlying semiring,
such as positivity, are crucial for establishing tractability.
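
The way annotations propagate through the positive RA operators can be sketched as follows, in the style of provenance semirings: union and projection combine annotations with the semiring addition, and natural join combines them with the semiring multiplication. This is an illustrative sketch only, not the paper's construction. The names `Semiring`, `row`, `union`, `project`, and `join` are hypothetical, and the two semirings used (counting, and max-times over [0, 1]) are standard examples chosen here for illustration; the abstract does not name specific semirings.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Semiring:
    zero: Any                        # identity of add
    one: Any                         # identity of mul
    add: Callable[[Any, Any], Any]   # used by union and projection
    mul: Callable[[Any, Any], Any]   # used by natural join

Row = Tuple[Tuple[str, Any], ...]    # sorted (variable, span) pairs
KRelation = Dict[Row, Any]           # row -> semiring annotation

def row(**fields) -> Row:
    return tuple(sorted(fields.items()))

def union(K: Semiring, r: KRelation, s: KRelation) -> KRelation:
    out = dict(r)
    for t, a in s.items():
        out[t] = K.add(out.get(t, K.zero), a)
    return out

def project(K: Semiring, r: KRelation, variables) -> KRelation:
    out: KRelation = {}
    for t, a in r.items():
        key = tuple((v, x) for v, x in t if v in variables)
        out[key] = K.add(out.get(key, K.zero), a)   # merged rows are added
    return out

def join(K: Semiring, r: KRelation, s: KRelation) -> KRelation:
    out: KRelation = {}
    for t1, a1 in r.items():
        for t2, a2 in s.items():
            d1, d2 = dict(t1), dict(t2)
            # Rows are compatible if they agree on all shared variables.
            if all(d1[v] == d2[v] for v in d1.keys() & d2.keys()):
                key = tuple(sorted({**d1, **d2}.items()))
                out[key] = K.add(out.get(key, K.zero), K.mul(a1, a2))
    return out

COUNTING = Semiring(0, 1, operator.add, operator.mul)   # number of derivations
MAX_TIMES = Semiring(0.0, 1.0, max, operator.mul)       # best confidence

# Hypothetical annotated span relations.
r = {row(x=(0, 3), y=(4, 13)): 0.5, row(x=(0, 3), y=(19, 28)): 0.25}
s = {row(y=(4, 13)): 0.5}

joined = join(MAX_TIMES, r, s)
best_per_x = project(MAX_TIMES, r, {"x"})
counts = project(COUNTING, {t: n for t, n in zip(r, (2, 3))}, {"x"})
```

Swapping the `Semiring` instance changes the meaning of the annotations (confidence, support, and so on) without touching the relational operators, which is the appeal of the abstraction.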