Information Retrieval Based on Description Logic: Application to Biomedical Documents
Kabil Boukhari, Mohamed Nazih Omri
2017 International Conference on High Performance Computing & Simulation (HPCS), published 2017-07-01
DOI: 10.1109/HPCS.2017.128 (https://doi.org/10.1109/HPCS.2017.128)
Citations: 5
Abstract
Document indexing is a sensitive phase of information retrieval. The terms that appear in a document, however, are not sufficient to represent it completely, so implicit information must be drawn from external resources to improve indexing. To this end, a new indexing model for biomedical documents, based on description logics, is proposed to generate relevant indexes. Both the documents and the external resource are represented by descriptive expressions. A first, statistical phase assigns an importance degree to each term in the document; a second, semantic phase extracts the most important concepts from the MeSH thesaurus (Medical Subject Headings). The concept extraction step uses description logics to combine the statistical and semantic approaches, and a cleaning step then selects the most important indexes to represent the document. Experiments on the OHSUMED collection showed the effectiveness of the proposed approach and the importance of using description logics in the indexing process.
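The three stages named in the abstract (statistical term weighting, thesaurus-based concept extraction, cleaning) can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give their weighting formula or their description-logic expressions, so this sketch assumes a smoothed TF-IDF weight as the "importance degree", uses plain set membership in place of description-logic reasoning, and relies on a tiny hypothetical thesaurus (`TOY_THESAURUS`) rather than real MeSH data. All function names and the `keep_ratio` parameter are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus (NOT real MeSH data): concept -> entry terms.
TOY_THESAURUS = {
    "Hypertension": {"hypertension", "blood", "pressure"},
    "Diabetes Mellitus": {"diabetes", "insulin", "glucose"},
    "Neoplasms": {"tumor", "cancer", "neoplasm"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def term_weights(doc, corpus):
    """Statistical phase (assumed): smoothed TF-IDF weight per document term."""
    tokens = tokenize(doc)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    weights = {}
    for term, count in Counter(tokens).items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (count / len(tokens)) * idf
    return weights


def score_concepts(weights, thesaurus):
    """Semantic phase (assumed): a concept's score is the summed weight of
    its entry terms that occur in the document. Set membership stands in
    for description-logic reasoning here."""
    scores = {}
    for concept, terms in thesaurus.items():
        score = sum(weights.get(t, 0.0) for t in terms)
        if score > 0:
            scores[concept] = score
    return scores


def clean(scores, keep_ratio=0.5):
    """Cleaning phase (assumed): keep concepts scoring at least
    keep_ratio times the best score, ordered from best to worst."""
    if not scores:
        return []
    top = max(scores.values())
    kept = [c for c, s in scores.items() if s >= keep_ratio * top]
    return sorted(kept, key=lambda c: -scores[c])


def index_document(doc, corpus, thesaurus=TOY_THESAURUS, keep_ratio=0.5):
    """Run the three phases and return the selected index concepts."""
    weights = term_weights(doc, corpus)
    return clean(score_concepts(weights, thesaurus), keep_ratio)


if __name__ == "__main__":
    corpus = [
        "Insulin lowers blood glucose in diabetes.",
        "High blood pressure and hypertension.",
        "Tumor growth.",
    ]
    print(index_document(corpus[0], corpus))
```

In this sketch the cleaning threshold is relative to the best-scoring concept, so a concept matched only through a common term such as "blood" is dropped when a better-supported concept exists. Lowering `keep_ratio` keeps more candidates.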