Artificial Neural Network for Multi-label Medical Text Classification
Hafida Tiaiba, L. Sabri, A. Chibani, O. Kazar
2022 2nd International Conference on New Technologies of Information and Communication (NTIC), published 2022-12-21
DOI: 10.1109/NTIC55069.2022.10100430 (https://doi.org/10.1109/NTIC55069.2022.10100430)
Citations: 0
Abstract
Classifying medical reports is challenging because they are usually written as plain text, use a specialized technical vocabulary, and are almost always unstructured. Document classification aims to assign the most appropriate label to a given document, and one of the main difficulties in the medical setting is representing text in a numerical format. In this paper, we propose a multi-layer artificial neural network for multi-label classification. Medical documents are converted to numerical values using four encoding modes, which we compare in this study: Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), Bag-of-Words (BOW), and Document Term Matrix (DTM). We evaluate two variants of the proposed architecture, denoted Simple Neural Network (SNN) and Vocabulary Neural Network (VNN). The SNN model uses the local vocabulary of 7,400 documents, whereas the VNN model uses the global terminology (Ohsumed_20000). Both models performed well on all four representations, with SNN outperforming VNN. With SNN, TF reached an accuracy of 70.32% in time 3 with 64 epochs, BOW reached 68.16% in time 1 with 16 epochs, and DTM reached 70.65% in time 3 with 32 epochs. TF-IDF achieved the best SNN result, 71.08% in time 1 with 16 epochs.
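To make the four encoding modes concrete, here is a minimal pure-Python sketch of how one document can be turned into four different vectors over the same vocabulary. The abstract does not give the exact formulas, so the definitions below are assumptions: raw counts stand in for a document-term matrix row, binary presence for bag-of-words, relative frequency for TF, and a smoothed logarithmic IDF for TF-IDF. The function names `build_vocabulary` and `vectorize`, the mode strings, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(docs, vocab, mode="count"):
    """Encode each document as one row over `vocab`.

    Assumed modes (illustrative, not the paper's definitions):
      "count"  - raw term counts (a document-term matrix row)
      "binary" - 1.0 if the term occurs, else 0.0
      "tf"     - count divided by the number of in-vocabulary tokens
      "tfidf"  - tf multiplied by a smoothed log inverse document frequency
    """
    if mode not in {"count", "binary", "tf", "tfidf"}:
        raise ValueError(f"unknown mode: {mode}")

    # Tokens outside the vocabulary are dropped, so an external
    # (global) vocabulary can be passed in as well as a local one.
    tokenized = [[t for t in doc.lower().split() if t in vocab] for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))

    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            if mode == "count":
                value = float(count)
            elif mode == "binary":
                value = 1.0
            elif mode == "tf":
                value = count / len(toks)
            else:  # "tfidf"
                idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
                value = (count / len(toks)) * idf
            row[vocab[tok]] = value
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy documents, invented for illustration only.
    docs = ["chest pain chest xray", "pain relief", "xray normal"]
    vocab = build_vocabulary(docs)
    for mode in ("count", "binary", "tf", "tfidf"):
        print(mode, vectorize(docs, vocab, mode)[0])
```

Because `vectorize` takes the vocabulary as an argument, the same sketch covers both settings the abstract mentions: a vocabulary built from the corpus itself (local) or one supplied from an external term list (global). Each resulting row would then be the input to the classifier, with one output unit per label in a multi-label setup.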