Semantic and Visual Enrichment Hierarchical Network for Medical Image Report Generation
Authors: Qian Tang, Yongbin Yu, Xiao Feng, Chenhui Peng
Published in: 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML), 2022-03-01
DOI: 10.1109/CACML55074.2022.00128 (https://doi.org/10.1109/CACML55074.2022.00128)
Citations: 0
Abstract
This paper presents SVEH-Net (Semantic and Visual Enrichment Hierarchical Network), a novel medical image report generator based on the encoder-decoder framework and the attention mechanism. To fuse semantic, visual, and tag features, an image feature encoding (IFE) module is introduced to provide global image features to the decoder, and a hierarchical decoder (H-Decoder) is proposed that fuses all semantic and visual features and generates two reports at once. In the experiments, our proposed models are evaluated on the Indiana University Chest X-ray radiology report dataset (IU X-ray) and the PEIR Gross dataset. On both datasets, our model outperforms the state-of-the-art method in BLEU-1/2/3/4, METEOR, and ROUGE-L scores.
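For readers unfamiliar with the BLEU-1/2/3/4 scores named in the abstract, the following is a minimal sketch of how a sentence-level BLEU-n score is commonly computed: the geometric mean of clipped n-gram precisions, scaled by a brevity penalty. It assumes a single reference report, whitespace tokenization, and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu`) and the example sentences are hypothetical and chosen for illustration; this is not the authors' evaluation code, and published scores are normally computed with a standard toolkit over a whole corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    generated = "the heart size is normal".split()
    ground_truth = "the heart is normal in size".split()
    for n in (1, 2):
        print(f"BLEU-{n}: {bleu(generated, ground_truth, max_n=n):.4f}")
```

In this toy example every generated unigram appears in the reference, so BLEU-1 is limited only by the brevity penalty, while BLEU-2 drops because only two of the four generated bigrams match. This is why the higher-order scores are the stricter test of word order in generated reports.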