Artificial Neural Network for Multi-label Medical Text Classification

Hafida Tiaiba, L. Sabri, A. Chibani, O. Kazar
Published in: 2022 2nd International Conference on New Technologies of Information and Communication (NTIC)
Publication date: 2022-12-21
DOI: 10.1109/NTIC55069.2022.10100430 (https://doi.org/10.1109/NTIC55069.2022.10100430)

Abstract

Classifying medical reports is a crucial challenge because they are usually presented as plain text, use a specialized technical vocabulary, and are almost always unstructured. Document classification aims to assign the most appropriate label to a given document. A significant issue in medical document classification is representing text in a numerical format. In this paper, we propose a multi-layer artificial neural network model for multi-label classification. The medical documents are transformed into numerical values using four encoding modes: Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), Bag-of-Words (BOW), and Document Term Matrix (DTM); in this study, we compare these four types of vectorization. Experiments were carried out with two models based on the proposed neural network architecture, denoted Simple Neural Network (SNN) and Vocabulary Neural Network (VNN). The SNN model uses the local vocabulary of 7,400 documents, whereas the VNN model uses the global terminology (Ohsumed_20000). Both models performed well in classifying all four representations, and the SNN results outperform those of VNN. With SNN, TF reaches an accuracy of 70.32% in time 3 with 64 epochs, BOW reaches 68.16% in time 1 with 16 epochs, and DTM reaches 70.65% in time 3 with 32 epochs. TF-IDF reaches 71.08% in time 1 with 16 epochs, which is the best result obtained by the SNN model.
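
To make the vectorization step more concrete, the following is a minimal sketch of how count-based, binary, TF, and TF-IDF document vectors can be computed over a shared vocabulary. It uses common textbook definitions, which may differ from the exact encodings used in the paper; the function names, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(doc):
    # Assumption: lowercase whitespace tokenization; real medical text
    # would usually need a more careful tokenizer.
    return doc.lower().split()


def build_vocabulary(docs):
    # Sorted list of the distinct tokens found across all documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vector(doc, vocab):
    # Raw term counts: one row of a document-term matrix.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def binary_vector(doc, vocab):
    # Presence/absence encoding, one common bag-of-words variant.
    return [1 if c > 0 else 0 for c in count_vector(doc, vocab)]


def tf_vector(doc, vocab):
    # Term frequency: counts normalized by the document length.
    counts = count_vector(doc, vocab)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def idf_weights(docs, vocab):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    weights = []
    for term in vocab:
        df = sum(1 for tokens in token_sets if term in tokens)
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1)
    return weights


def tfidf_vector(doc, vocab, idf):
    # TF-IDF: term frequency scaled by the corpus-level IDF weight.
    return [tf * w for tf, w in zip(tf_vector(doc, vocab), idf)]
```

Each function returns a fixed-length list aligned with the vocabulary, so any of the four encodings can be fed to the same network input layer. In this sketch, the vocabulary could be built either from the training documents alone or from a larger external term list, which loosely mirrors the local versus global vocabulary distinction described in the abstract.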