Adnan Yahya, Hala Salameh, Maram Belbeisi, Noor Shamasneh
Information Extraction from Arabic Medications Leaflets
2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), published 2022-10-12
DOI: 10.1109/AICT55583.2022.10013568 (https://doi.org/10.1109/AICT55583.2022.10013568)
Abstract
Making information in electronic documents easily accessible has been a major concern in recent years. There has been increasing interest in gleaning information from unstructured text and presenting it as structured data using information extraction (IE). Since Arabic web content, most of it unstructured text, has grown substantially, the need for IE from Arabic documents has gained importance. The volume of text to be processed far exceeds the human capacity to extract knowledge manually. The medical field is one such area: growing awareness of health issues makes automating medical informatics crucial for better access to medical knowledge, and work on extracting information from medical documents has increased rapidly. In this paper we address IE from Arabic drug leaflets. We combine rule-based, machine learning, and deep learning methods and employ a suite of tools that account for the particularities of Arabic to extract information from Arabic drug package inserts, making it available in structured form and thus more accessible to regular users and health care providers. We also developed a prototype system that uses the IE results to provide useful functionality such as alerting users to possible Adverse Drug Reactions (ADR) and finding drug alternatives.
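To make the rule-based part of such a pipeline concrete, the following is a minimal Python sketch of heading-driven section extraction from a leaflet. It is an illustration under stated assumptions, not the authors' implementation: the heading keywords, the field names, the `normalize` and `extract_sections` helpers, and the sample leaflet text are all hypothetical. The only Arabic-specific handling shown is stripping diacritics and tatweel and unifying alef variants, which is one common way to keep string matching robust.

```python
import re

# Assumption: short vowels (harakat, U+064B..U+0652) and tatweel (U+0640)
# carry no information needed for heading matching, so they are stripped.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Assumption: alef with hamza above/below and alef with madda are unified to bare alef.
_ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")


def normalize(text):
    """Return text with diacritics/tatweel removed and alef variants unified."""
    text = _DIACRITICS.sub("", text)
    return _ALEF_VARIANTS.sub("\u0627", text)


# Hypothetical heading keywords mapped to hypothetical field names.
# Keys are normalized once so that lookups compare like with like.
_RAW_HEADINGS = {
    "دواعي الاستعمال": "indications",
    "الأعراض الجانبية": "side_effects",
    "الجرعة": "dosage",
}
SECTION_HEADINGS = {normalize(k): v for k, v in _RAW_HEADINGS.items()}


def extract_sections(leaflet):
    """Split leaflet text into {field_name: section_text} using heading lines."""
    sections = {}
    current = None
    for raw_line in leaflet.splitlines():
        line = normalize(raw_line).strip()
        if not line:
            continue
        heading = line.rstrip(":").strip()
        if heading in SECTION_HEADINGS:
            # A known heading starts a new section.
            current = SECTION_HEADINGS[heading]
            sections[current] = []
        elif current is not None:
            # Any other line belongs to the most recent section.
            sections[current].append(line)
    return {name: " ".join(lines) for name, lines in sections.items()}


# Hypothetical sample leaflet text, for illustration only.
sample = """دواعي الاستعمال:
يستخدم لتخفيف الألم وخفض الحرارة.
الأعراض الجانبية:
غثيان
طفح جلدي
"""

if __name__ == "__main__":
    for field, text in extract_sections(sample).items():
        print(field, "->", text)
```

A structured record of this kind is what a downstream feature could query, for example by comparing a patient's reported symptoms against the `side_effects` field. The machine learning and deep learning components mentioned in the abstract are not represented here.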