A deep and uniform model for semantic annotation of semi structured documents based on SHIRI

M. Thiam
Published in: 2016 4th International Conference on Control Engineering & Information Technology (CEIT), 2016-12-01
DOI: 10.1109/CEIT.2016.7929020 (https://doi.org/10.1109/CEIT.2016.7929020)
Citations: 1

Abstract

In building the semantic web, researchers annotate existing web content to improve the precision with which applications handle documents. The rapid growth of the web makes it impossible to do this manually. Many annotation techniques address the first and easiest problem of information search, which is finding the documents that contain the searched data. In this work, we propose a deep annotation model for locating and extracting the precise parts of documents that answer a query. This work extends SHIRI, an ontology-based system for integrating semi-structured documents related to a specific domain. The ontology is described by a set of concepts, relations, and their properties, and it also contains a lexical part. SHIRI relies on an automatic, unsupervised, ontology-driven approach to extraction, alignment, and querying for the semantic annotation of tagged document elements. In this paper we focus on two major improvements: (1) we apply statistical techniques to purge extracted terms and named entities, and (2) we annotate document parts with a single metadata item. Experiments on real datasets show that these improvements greatly increase recall, and that the returned answers are more precise and are ranked according to their precision.
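
To make the idea of ontology-driven annotation of tagged elements more concrete, the following is a minimal sketch and not the SHIRI implementation. It assumes a toy ontology whose lexical part maps each concept to a few terms, matches those terms against the text of each tagged element, and ranks the resulting annotations by a crude score (the fraction of the element's words covered by matched terms). The names `ONTOLOGY_LEXICON`, `ElementCollector`, and `annotate`, the sample concepts, and the scoring rule are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical toy ontology: concept -> lexical entries (the "lexical part").
ONTOLOGY_LEXICON = {
    "Conference": {"conference", "workshop", "symposium"},
    "Topic": {"semantic web", "ontology", "annotation"},
}


class ElementCollector(HTMLParser):
    """Collect (tag, text) pairs for every tagged element that holds text."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open tags, innermost last
        self.elements = []  # (innermost tag, stripped text) pairs

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.elements.append((self._stack[-1], text))


def annotate(html):
    """Return concept annotations for tagged elements, best score first."""
    collector = ElementCollector()
    collector.feed(html)
    annotations = []
    for tag, text in collector.elements:
        lowered = text.lower()
        n_words = len(lowered.split())
        for concept, terms in ONTOLOGY_LEXICON.items():
            # Simple substring lookup of lexical entries (an assumption).
            hits = sorted(t for t in terms if t in lowered)
            if not hits:
                continue
            matched_words = sum(len(t.split()) for t in hits)
            # Assumed score: share of the element's words covered by matches,
            # so short, focused elements rank above long, diffuse ones.
            score = min(1.0, matched_words / n_words)
            annotations.append(
                {"tag": tag, "text": text, "concept": concept,
                 "terms": hits, "score": score}
            )
    return sorted(annotations, key=lambda a: a["score"], reverse=True)


if __name__ == "__main__":
    sample = (
        "<div><h1>Semantic Web Conference</h1>"
        "<p>The workshop covers ontology design and many other "
        "subjects this year</p></div>"
    )
    for ann in annotate(sample):
        print(ann["tag"], ann["concept"], ann["terms"], round(ann["score"], 2))
```

In this sketch the heading ranks above the paragraph because a larger share of its words matches the lexicon, which loosely mirrors the goal of returning the most precise document parts first. The statistical purging of terms and named entities mentioned in the abstract is not modeled here.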