Detecting hidden structures from Arabic electronic documents: Application to the legal field
Imen Bouaziz Mezghanni, F. Gargouri
2016 IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), 2016-06-08
DOI: 10.1109/SERA.2016.7516131 (https://doi.org/10.1109/SERA.2016.7516131)
Citations: 5
Abstract
Dealing with unstructured information is currently an active research topic, since most documents exist in unstructured form. The effective exploitation of unstructured documents, although intricate, is of paramount importance to Information Retrieval (IR). The key to using an unstructured data set is to identify the hidden structures within it. In this paper, we present an approach to recognizing the semantic structure of documents in Arabic legal data. This structure expresses several main concepts of a document, including the title and the headings of chapters, sections, and subsections. This structural information is employed to obtain a richer and more fine-grained annotation of documents, forming a useful and coherent infrastructure ready for IR. Experiments were conducted to evaluate our approach, and the initial results seem promising.