Title: A deep and uniform model for semantic annotation of semi structured documents based on SHIRI
Authors: M. Thiam
Published in: 2016 4th International Conference on Control Engineering & Information Technology (CEIT)
Publication date: 2016-12-01
DOI: 10.1109/CEIT.2016.7929020
URL: https://doi.org/10.1109/CEIT.2016.7929020
Citations: 1
Abstract
In building the semantic web, researchers annotate existing web content to improve the precision with which applications handle documents. The rapid growth of the web makes manual annotation impossible. Many annotation techniques address the first and easiest problem of information search: finding documents that contain the searched data. In this work we propose a deep annotation model for locating and extracting the document parts that most precisely answer a query. This work extends SHIRI, an ontology-based system for integrating semi-structured documents related to a specific domain. The ontology is described by a set of concepts, relations and their properties, and also contains a lexical part. SHIRI relies on an automatic, unsupervised and ontology-driven approach to extraction, alignment and querying for the semantic annotation of tagged elements of documents. In this paper we focus on two major improvements: (1) we apply statistical techniques to filter the extracted terms and named entities and (2) we annotate document parts with a single metadata element. Experiments on real datasets show that these improvements greatly increase recall and that the returned answers are more precise and ranked according to their precision.