Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.

Biomedical Informatics Insights. Pub Date: 2016-07-19; eCollection Date: 2016-01-01; DOI: 10.4137/BII.S38916
Yuan Luo, Peter Szolovits
Biomedical Informatics Insights, volume 8, pages 29-38.
Citations: 4

Abstract

In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs), and of efficiently retrieving the annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show that it is nonoptimal when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks for all of Allen's relations in logarithmic time, attaining the theoretical lower bound. Update time remains logarithmic, and the space requirement remains linear. We also discuss interval management in external memory models and in higher dimensions.
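To make the problem setting concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It assumes a hypothetical `Annotation` type that stores half-open character offsets `[start, end)` with `start < end`. It classifies a pair of spans into one of Allen's 13 relations and answers a point stabbing query ("which annotations cover this offset?") with a textbook centered interval tree. That swapped-in structure is the basic interval tree the abstract describes as nonoptimal for the full set of Allen queries. The names `Annotation`, `allen_relation`, `build_tree`, and `stab` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: half-open character span [start, end) plus a label."""
    start: int
    end: int
    label: str

    def __post_init__(self) -> None:
        # Allen's relations assume proper intervals, so reject empty spans.
        if self.start >= self.end:
            raise ValueError("stand-off spans must satisfy start < end")


def allen_relation(a: Annotation, b: Annotation) -> str:
    """Return which of Allen's 13 interval relations holds from a to b."""
    s1, e1, s2, e2 = a.start, a.end, b.start, b.end
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 > e2:
        return "after"
    if s1 == e2:
        return "met-by"
    # From here on the two spans share at least one character offset.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s1 < s2:
        return "overlaps" if e1 < e2 else "contains"
    return "during" if e1 < e2 else "overlapped-by"


class _Node:
    """One node of a textbook centered interval tree (illustrative, not the paper's structure)."""

    def __init__(self, spans: List[Annotation]) -> None:
        # Lower median of all endpoints: neither child can receive every span,
        # so the recursive construction always terminates.
        points = sorted(p for s in spans for p in (s.start, s.end))
        self.center = points[(len(points) - 1) // 2]
        mid = [s for s in spans if s.start <= self.center < s.end]
        self.by_start = sorted(mid, key=lambda s: s.start)
        self.by_end_desc = sorted(mid, key=lambda s: s.end, reverse=True)
        self.left = build_tree([s for s in spans if s.end <= self.center])
        self.right = build_tree([s for s in spans if s.start > self.center])


def build_tree(spans: List[Annotation]) -> Optional[_Node]:
    return _Node(spans) if spans else None


def stab(root: Optional[_Node], q: int) -> List[Annotation]:
    """Stabbing query: every span with start <= q < end."""
    out: List[Annotation] = []
    node = root
    while node is not None:
        if q < node.center:
            # Center spans end after q; they contain q iff they start at or before q.
            for s in node.by_start:
                if s.start > q:
                    break
                out.append(s)
            node = node.left  # right-subtree spans all start after q
        else:
            # Center spans start at or before q; they contain q iff they end after q.
            for s in node.by_end_desc:
                if s.end <= q:
                    break
                out.append(s)
            node = node.right  # left-subtree spans all end at or before q
    return out


# "Patient denies chest pain." with hand-picked character offsets.
spans = [Annotation(0, 7, "Patient"), Annotation(8, 14, "denies"),
         Annotation(15, 25, "chest pain"), Annotation(8, 25, "negation scope"),
         Annotation(15, 20, "chest")]
tree = build_tree(spans)
print(sorted(a.label for a in stab(tree, 17)))  # → ['chest', 'chest pain', 'negation scope']
print(allen_relation(spans[4], spans[2]))       # → starts
```

A single-offset stabbing query is the simplest position constraint. For this structure the usual textbook bound is about O(log n + k) per query for k reported spans. Answering every one of Allen's 13 relations against a query interval in logarithmic time needs the reformulations and augmented structures the abstract describes, which this sketch does not attempt.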
