A Comparative Analysis of Breast Cancer Diagnosis by Fusing Visual and Semantic Feature Descriptors

2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE). Pub Date: 2021-10-25. DOI: 10.1109/BIBE52308.2021.9635481

G. Apostolopoulos, A. Koutras, D. Anyfantis, Ioanna Christoyianni



Abstract

Computer-aided diagnosis (CAD) systems have become an important assistive tool for identifying abnormal and normal regions of interest in mammograms faster and more effectively than human readers. In this work, we propose a new approach for identifying breast cancer across all types of lesions in digital mammograms by combining low- and high-level mammogram descriptors in a compact form. The proposed method consists of two major stages. First, a feature extraction process extracts low-level visual descriptors using two-dimensional discrete transforms based on ART and Shapelets, together with textural representations based on Gabor filter banks. To further improve the method's performance, the semantic information that radiologists provide for each mammogram is encoded in a high-level feature vector consisting of a 16-bit word. All features are stored in a quaternion and fused using the L2 norm before being presented to the classification module. For the classification task, each ROS is recognized using two different classification models, AdaBoost and Random Forest. The proposed method is evaluated on regions taken from the DDSM database. The results show that AdaBoost outperforms Random Forest in terms of accuracy (99.2% $(\pm 0.527)$ against 93.78% $(\pm 1.659)$), precision, recall and F-measure. The two classifiers achieve mean accuracies that are 33% and 38% higher than when using only visual descriptors, showing that semantic information can indeed improve diagnosis when it is combined with standard visual features.
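The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact construction, so everything here is an assumption: the four descriptors (ART, Shapelets, Gabor, semantic) are taken to have equal length, each occupies one component of a quaternion, and the fused feature is the element-wise quaternion modulus (L2 norm). The bit-field layout of the 16-bit semantic word and the function names `encode_semantic_word`, `word_to_bits` and `fuse_quaternion_l2` are hypothetical, chosen only for illustration.

```python
import numpy as np


def encode_semantic_word(fields, widths):
    """Pack small integer codes into one 16-bit word.

    The field layout is hypothetical; the abstract only states that the
    radiologists' semantic information is encoded in a 16-bit word.
    """
    if len(fields) != len(widths):
        raise ValueError("fields and widths must have the same length")
    if sum(widths) > 16:
        raise ValueError("field widths exceed 16 bits")
    word = 0
    for value, width in zip(fields, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Shift previous fields left and append the new one.
        word = (word << width) | value
    return word


def word_to_bits(word, n_bits=16):
    """Unpack a word into a float vector of bits, most significant bit first."""
    return np.array(
        [(word >> i) & 1 for i in range(n_bits - 1, -1, -1)], dtype=float
    )


def fuse_quaternion_l2(a, b, c, d):
    """Treat four equal-length vectors as quaternion components and
    return the element-wise modulus sqrt(a^2 + b^2 + c^2 + d^2).

    This is one plausible reading of "stored in a quaternion and fused
    using the L2 norm", not necessarily the authors' exact method.
    """
    q = np.stack([a, b, c, d], axis=0)  # shape (4, n)
    return np.linalg.norm(q, axis=0)    # shape (n,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 16
    # Placeholder visual descriptors; real ones would come from ART,
    # Shapelets and Gabor filter-bank feature extraction.
    art, shapelets, gabor = (rng.random(n) for _ in range(3))
    # Hypothetical semantic codes with hypothetical bit widths.
    semantic = word_to_bits(encode_semantic_word([3, 1, 2], [4, 2, 3]))
    fused = fuse_quaternion_l2(art, shapelets, gabor, semantic)
    print(fused.shape)
```

The fused vector would then be passed to a classifier such as AdaBoost or Random Forest; that step is omitted here because the abstract gives no hyperparameters.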

