Gathering, Selecting and Preparing Unstructured Documents for Enterprise Information Extraction
Mahmoud Brahimi, Kehali Nor Elhouda
2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI), 2021-09-21
DOI: 10.1109/ICRAMI52622.2021.9585994
Abstract
A large number of unstructured documents exist on the web, containing data of paramount importance to enterprises, which can use them to synthesize the past, comprehend the present, and predict the future. However, the unstructured nature of these documents makes handling them and extracting knowledge from them a critical issue. The current contribution is three-fold. First, we collect the unstructured documents that might be useful, using a general enterprise ontology. Then, we select the most suitable ones using specific ontologies that describe partial enterprise activities. Finally, we transform the retained documents into parsable and queryable XML files that can serve as the corpus for future data extraction.
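The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ontologies are reduced to flat sets of terms, matching is plain keyword overlap, and every name here (`GENERAL_TERMS`, `SPECIFIC_TERMS`, `gather`, `select`, `to_xml`, the `min_hits` threshold, and the XML element names) is a hypothetical choice made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: each ontology is approximated by a flat set of terms.
# The ontologies used in the paper are not given in the abstract.
GENERAL_TERMS = {"enterprise", "customer", "product", "market", "supplier"}
SPECIFIC_TERMS = {"invoice", "payment", "order"}


def tokens(text):
    """Lowercase the words of a text and strip surrounding punctuation."""
    return {w.strip(".,;:!?()").lower() for w in text.split()}


def gather(docs, terms=GENERAL_TERMS):
    """Step 1: keep documents that mention at least one general term."""
    return [d for d in docs if tokens(d) & terms]


def select(docs, terms=SPECIFIC_TERMS, min_hits=2):
    """Step 2: keep documents matching at least min_hits specific terms."""
    return [d for d in docs if len(tokens(d) & terms) >= min_hits]


def to_xml(doc_id, text):
    """Step 3: wrap a document's sentences in a simple XML structure."""
    root = ET.Element("document", id=str(doc_id))
    sentences = (s.strip() for s in text.split(".") if s.strip())
    for i, sentence in enumerate(sentences):
        ET.SubElement(root, "sentence", n=str(i)).text = sentence
    return ET.tostring(root, encoding="unicode")


docs = [
    "The customer paid the invoice. The payment cleared.",
    "Weather was sunny today.",
    "Our product reached a new market.",
]
gathered = gather(docs)
selected = select(gathered)
xml_files = [to_xml(i, d) for i, d in enumerate(selected)]
```

The resulting strings can be parsed back with `ET.fromstring` and searched with `findall`, which is one way to read "parsable and queryable" in this setting. A real system would presumably replace the keyword overlap with proper ontology-based matching.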