Oscar Ramos-Soto, Itzel Aranguren, Manuel Carrillo M, Diego Oliva, Sandra E Balderas-Mata
{"title":"Artificial intelligence in medical imaging diagnosis: are we ready for its clinical implementation?","authors":"Oscar Ramos-Soto, Itzel Aranguren, Manuel Carrillo M, Diego Oliva, Sandra E Balderas-Mata","doi":"10.1117/1.JMI.12.6.061405","DOIUrl":"10.1117/1.JMI.12.6.061405","url":null,"abstract":"<p><strong>Purpose: </strong>We examine the transformative potential of artificial intelligence (AI) in medical imaging diagnosis, focusing on improving diagnostic accuracy and efficiency through advanced algorithms. We address the significant challenges preventing immediate clinical adoption of AI, specifically from technical, ethical, and legal perspectives. The aim is to highlight the current state of AI in medical imaging and outline the necessary steps to ensure safe, effective, and ethically sound clinical implementation.</p><p><strong>Approach: </strong>We conduct a comprehensive discussion, with special emphasis on the technical requirements for robust AI models, the ethical frameworks needed for responsible deployment, and the legal implications, including data privacy and regulatory compliance. Explainable artificial intelligence (XAI) is examined as a means to increase transparency and build trust among healthcare professionals and patients.</p><p><strong>Results: </strong>The analysis reveals key challenges to AI integration in clinical settings, including the need for extensive high-quality datasets, model reliability, advanced infrastructure, and compliance with regulatory standards. The lack of explainability in AI outputs remains a barrier, with XAI identified as crucial for meeting transparency standards and enhancing trust among end users.</p><p><strong>Conclusions: </strong>Overcoming these barriers requires a collaborative, multidisciplinary approach to integrate AI into clinical practice responsibly. 
Addressing technical, ethical, and legal issues will support a smoother transition, fostering a more accurate, efficient, and patient-centered healthcare system where AI augments traditional medical practices.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 6","pages":"061405"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12177575/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144477323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Mammadov, Loïc Le Folgoc, Julien Adam, Anne Buronfosse, Gilles Hayem, Guillaume Hocquet, Pietro Gori
{"title":"Self-supervision enhances instance-based multiple instance learning methods in digital pathology: a benchmark study.","authors":"Ali Mammadov, Loïc Le Folgoc, Julien Adam, Anne Buronfosse, Gilles Hayem, Guillaume Hocquet, Pietro Gori","doi":"10.1117/1.JMI.12.6.061404","DOIUrl":"10.1117/1.JMI.12.6.061404","url":null,"abstract":"<p><strong>Purpose: </strong>Multiple instance learning (MIL) has emerged as the best solution for whole slide image (WSI) classification. It consists of dividing each slide into patches, which are treated as a bag of instances labeled with a global label. MIL includes two main approaches: instance-based and embedding-based. In the former, each patch is classified independently, and then, the patch scores are aggregated to predict the bag label. In the latter, bag classification is performed after aggregating patch embeddings. Even if instance-based methods are naturally more interpretable, embedding-based MILs have usually been preferred in the past due to their robustness to poor feature extractors. Recently, the quality of feature embeddings has drastically increased using self-supervised learning (SSL). Nevertheless, many authors continue to endorse the superiority of embedding-based MIL.</p><p><strong>Approach: </strong>We conduct 710 experiments across 4 datasets, comparing 10 MIL strategies, 6 self-supervised methods with 4 backbones, 4 foundation models, and various pathology-adapted techniques. 
Furthermore, we introduce 4 instance-based MIL methods, never used before in the pathology domain.</p><p><strong>Results: </strong>We show that with a good SSL feature extractor, simple instance-based MILs, with very few parameters, obtain similar or better performance than complex, state-of-the-art (SOTA) embedding-based MIL methods, setting new SOTA results on the BRACS and Camelyon16 datasets.</p><p><strong>Conclusion: </strong>As simple instance-based MIL methods are naturally more interpretable and explainable to clinicians, our results suggest that more effort should be put into well-adapted SSL methods for WSI rather than into complex embedding-based MIL methods.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 6","pages":"061404"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12134610/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144235588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai Yang, Craig K Abbey, Bruno Barufaldi, Xinhua Li, Theodore A Marschall, Bob Liu
{"title":"Frequency-based texture analysis of non-Gaussian properties of digital breast tomosynthesis images and comparison across two vendors.","authors":"Kai Yang, Craig K Abbey, Bruno Barufaldi, Xinhua Li, Theodore A Marschall, Bob Liu","doi":"10.1117/1.JMI.12.S2.S22004","DOIUrl":"10.1117/1.JMI.12.S2.S22004","url":null,"abstract":"<p><strong>Purpose: </strong>We aim to analyze higher-order textural components of digital breast tomosynthesis (DBT) images to quantify differences in the appearance of breast parenchyma produced by different vendors.</p><p><strong>Approach: </strong>We included consecutive women who had normal screening DBT exams in January 2018 from a GE system and in adjacent years from Hologic systems. Laplacian fractional entropy (LFE), as a measure of non-Gaussian statistical properties of breast tissue texture, was calculated from for-presentation craniocaudal (CC) view DBT slices and synthetic mammograms (SMs) through frequency-based filtering with Gabor filters, which were considered mathematical models for human visual response to image textures. The LFE values were compared within and across subjects and vendors along with secondary parameters (laterality, year-to-year, modality, and breast density) via two-way analysis of variance (ANOVA) tests using frequency as one of the two independent variables, and a <math><mrow><mi>P</mi></mrow> </math> -value <math><mrow><mo>&lt;</mo> <mn>0.05</mn></mrow> </math> was considered statistically significant.</p><p><strong>Results: </strong>A total of 8529 CC view DBT slices and SM images from 73 screening exams in 25 women were analyzed. 
Significant differences in LFE were observed for different frequencies ( <math><mrow><mi>P</mi> <mo>&lt;</mo> <mn>0.001</mn></mrow> </math> ) and across vendors (GE versus Hologic DBT: <math><mrow><mi>P</mi> <mo>&lt;</mo> <mn>0.001</mn></mrow> </math> , GE versus Hologic SM: <math><mrow><mi>P</mi> <mo>&lt;</mo> <mn>0.001</mn></mrow> </math> ).</p><p><strong>Conclusion: </strong>Significant differences in perception of breast parenchyma textures between two DBT vendors were demonstrated via higher-order non-Gaussian statistical properties. This finding extends previously observed differences in anatomical noise power spectra in DBT images and provides quantitative evidence to support caution in across-vendor comparative reading and will be beneficial to facilitate future development of vendor-neutral artificial intelligence algorithms for breast cancer screening.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22004"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11925074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143694062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pontus Timberg, Gustav Hellgren, Magnus Dustler, Anders Tingberg
{"title":"Investigating the effect of adding comparisons with prior mammograms to standalone digital breast tomosynthesis screening.","authors":"Pontus Timberg, Gustav Hellgren, Magnus Dustler, Anders Tingberg","doi":"10.1117/1.JMI.12.S2.S22003","DOIUrl":"10.1117/1.JMI.12.S2.S22003","url":null,"abstract":"<p><strong>Purpose: </strong>The purpose is to retrospectively investigate how the addition of prior and concurrent mammograms affects wide-angle digital breast tomosynthesis (DBT) screening false-positive recall rates, malignancy scoring, and recall agreement.</p><p><strong>Approach: </strong>A total of 200 cases were selected from the Malmö Breast Tomosynthesis Screening Trial. They consist of 150 recalled cases [30 true positives (TPs) and 120 false positives (FPs)] and 50 healthy, non-recalled true-negative (TN) cases. The positive cases were categorized based on being recalled by either DBT, digital mammography (DM), or both. Each case had DBT, synthetic mammography (SM), and DM (prior screening round) images. Five radiologists participated in a reading study where detection, risk of malignancy, and recall were assessed. They read each case twice, once using only DBT and once using DBT together with SM and DM priors.</p><p><strong>Results: </strong>The results showed a significant reduction in recall rates for all FP categories, as well as for the TN cases, when adding SM and prior DM to DBT. This also resulted in a significant increase in recall agreement for these categories, with more of the negative cases being recalled by few or no readers. These categories were overall rated as appearing more malignant in the DBT reading arm. 
For the TP categories, there was a significant decrease in recalls for DM-recalled cancers ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.047</mn></mrow> </math> ), but no significant difference for DBT-recalled cancers ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.063</mn></mrow> </math> ), or DBT/DM-recalled cancers ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.208</mn></mrow> </math> ).</p><p><strong>Conclusions: </strong>Similar to the documented effect of priors in DM screening, we suggest that added two-dimensional priors improve the specificity of DBT screening but may reduce the sensitivity.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22003"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11931293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143711591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renann F Brandão, Lucas E Soares, Lucas R Borges, Predrag R Bakic, Anders Tingberg, Marcelo A C Vieira
{"title":"Exploring the impact of image restoration in simulating higher dose mammography: effects on the detectability of microcalcifications across different sizes using model observer analysis.","authors":"Renann F Brandão, Lucas E Soares, Lucas R Borges, Predrag R Bakic, Anders Tingberg, Marcelo A C Vieira","doi":"10.1117/1.JMI.12.S2.S22013","DOIUrl":"10.1117/1.JMI.12.S2.S22013","url":null,"abstract":"<p><strong>Purpose: </strong>Breast cancer is one of the leading causes of cancer-related deaths among women, and digital mammography plays a key role in screening and early detection. The radiation dose on mammographic exams directly influences image quality and radiologists' performance. We evaluate the impact of an image restoration pipeline, designed to simulate higher dose acquisitions, on the detectability of microcalcifications of various sizes in mammograms acquired at different radiation doses.</p><p><strong>Approach: </strong>The restoration pipeline denoises the image using a Poisson-Gaussian noise model, combining it with the noisy image to achieve a signal-to-noise ratio comparable with an acquisition at twice the original dose. We created a database of images using a physical breast phantom at doses ranging from 50% to 200% of the standard dose. Clustered microcalcifications were computationally inserted into the phantom images. The channelized Hotelling observer was employed in a four-alternative forced-choice to evaluate the detectability of microcalcifications across different sizes and exposure levels.</p><p><strong>Results: </strong>The restoration of low-dose images acquired at <math><mrow><mo>∼</mo> <mn>75</mn> <mo>%</mo></mrow> </math> of the standard dose resulted in detectability levels comparable with those of images acquired at the standard dose. 
Moreover, images restored at the standard dose demonstrated detectability similar to those acquired at 160% of the nominal radiation dose, with no statistically significant differences.</p><p><strong>Conclusions: </strong>We demonstrate the potential of an image restoration pipeline to simulate higher quality mammography images. The results indicate that reducing noise through denoising and restoration impacts the detectability of microcalcifications. This method improves image quality without hardware modifications or additional radiation exposure.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22013"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12175087/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144334186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Su Hyun Lyu, Andrey Makeev, Dan Li, Andreu Badal, Andrew M Hernandez, John M Boone, Stephen J Glick
{"title":"Hybrid simulation of breast CT for assessing microcalcification detectability.","authors":"Su Hyun Lyu, Andrey Makeev, Dan Li, Andreu Badal, Andrew M Hernandez, John M Boone, Stephen J Glick","doi":"10.1117/1.JMI.12.S2.S22015","DOIUrl":"https://doi.org/10.1117/1.JMI.12.S2.S22015","url":null,"abstract":"<p><strong>Purpose: </strong>Virtual imaging trials (VITs) are of interest for regulatory evaluation because they enable faster and more cost-effective evaluation of new imaging technologies than patient clinical trials. Our purpose is to develop a hybrid VIT methodology for breast computed tomography (CT) applications and use it to investigate microcalcification detectability.</p><p><strong>Approach: </strong>Ray tracing was used to generate projection images of clusters of five microcalcifications which varied in diameter, chemical composition, and density. These simulated projection images were added to patient projection images acquired with the fourth-generation breast CT scanner from UC Davis (Doheny) and reconstructed using the Feldkamp filtered backprojection algorithm with varying apodization kernels. Volumes of interest and maximum intensity projections were extracted from the reconstructed volumes. Human observers (HOs) and deep learning model observers (DLMOs) were used to detect calcification clusters, and receiver operating characteristic curve analysis was used to analyze detection performance.</p><p><strong>Results: </strong>DLMO detected 0.18-mm type I calcifications with AUC = 0.80 and 0.21 mm calcifications with <math><mrow><mi>AUC</mi> <mo>=</mo> <mn>0.99</mn></mrow> </math> . HO performance was inferior to deep learning model observer performance, but both HO and DLMO detected 0.21-mm type I calcifications with <math><mrow><mi>AUC</mi> <mo>></mo> <mn>0.90</mn></mrow> </math> and 0.24-mm type I calcifications with near-perfect performance. 
Microcalcification clusters embedded in adipose tissue were more conspicuous than clusters embedded in fibroglandular tissue. There was superior detection performance for clusters located anteriorly within the breast compared with clusters located posteriorly within the breast.</p><p><strong>Conclusions: </strong>A hybrid approach for virtual imaging trials shows promise for the assessment of imaging systems across a broad range of parameters.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22015"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12225739/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144576688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victor Dahlblom, Magnus Dustler, Sophia Zackrisson, Anders Tingberg
{"title":"Workload reduction of digital breast tomosynthesis screening using artificial intelligence and synthetic mammography: a simulation study.","authors":"Victor Dahlblom, Magnus Dustler, Sophia Zackrisson, Anders Tingberg","doi":"10.1117/1.JMI.12.S2.S22005","DOIUrl":"10.1117/1.JMI.12.S2.S22005","url":null,"abstract":"<p><strong>Purpose: </strong>To achieve the high sensitivity of digital breast tomosynthesis (DBT), a time-consuming reading is necessary. However, synthetic mammography (SM) images, equivalent to digital mammography (DM), can be generated from DBT images. SM is faster to read and might be sufficient in many cases. We investigate using artificial intelligence (AI) to stratify examinations into reading of either SM or DBT to minimize workload and maximize accuracy.</p><p><strong>Approach: </strong>This is a retrospective study based on double-read paired DM and one-view DBT from the Malmö Breast Tomosynthesis Screening Trial. DBT examinations were analyzed with the cancer detection AI system ScreenPoint Transpara 1.7. For low-risk examinations, SM reading was simulated by assuming equality with DM reading. For high-risk examinations, the DBT reading results were used. Different combinations of single and double reading were studied.</p><p><strong>Results: </strong>By double-reading the DBT of 30% (4452/14,772) of the cases with the highest risk, and single-reading SM for the rest, 122 cancers would be detected with the same reading workload as DM double reading. That is, 28% (27/95) more cancers would be detected than with DM double reading, and in total, 96% (122/127) of the cancers detectable with full DBT double reading would be found.</p><p><strong>Conclusions: </strong>In a DBT-based screening program, AI could be used to select high-risk cases where the reading of DBT is valuable, whereas SM is sufficient for low-risk cases. Substantially more cancers could be detected compared with DM only, with only a limited increase in reading workload. 
Prospective studies are necessary.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22005"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042222/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144003543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Junlin Guo, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo
{"title":"Glo-In-One-v2: holistic identification of glomerular cells, tissues, and lesions in human and mouse histopathology.","authors":"Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Junlin Guo, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo","doi":"10.1117/1.JMI.12.6.061406","DOIUrl":"10.1117/1.JMI.12.6.061406","url":null,"abstract":"<p><strong>Purpose: </strong>Segmenting intraglomerular tissue and glomerular lesions traditionally depends on detailed morphological evaluations by expert nephropathologists, a labor-intensive process susceptible to interobserver variability. Our group previously developed the Glo-In-One toolkit for integrated glomerulus detection and segmentation. We extend the Glo-In-One toolkit to version 2 (Glo-In-One-v2), which adds fine-grained segmentation capabilities. We curated 14 distinct labels spanning tissue regions, cells, and lesions across 23,529 annotated glomeruli from human and mouse histopathology data. To our knowledge, this dataset is among the largest of its kind to date.</p><p><strong>Approach: </strong>We present a single dynamic-head deep learning architecture for segmenting 14 classes within partially labeled images from human and mouse kidney pathology. The model was trained on data derived from 368 annotated kidney whole-slide images with five key intraglomerular tissue types and nine glomerular lesion types.</p><p><strong>Results: </strong>The glomerulus segmentation model achieved a decent performance compared with baselines and achieved a 76.5% average Dice similarity coefficient. In addition, transfer learning from rodent to human for the glomerular lesion segmentation model has enhanced the average segmentation accuracy across different types of lesions by more than 3%, as measured by Dice scores.</p><p><strong>Conclusions: </strong>We introduce a convolutional neural network for multiclass segmentation of intraglomerular tissue and lesions. 
The Glo-In-One-v2 model and pretrained weights are publicly available at https://github.com/hrlblab/Glo-In-One_v2.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 6","pages":"061406"},"PeriodicalIF":1.7,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12303538/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144733981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gen Takagi, Saori Takeyama, Tokiya Abe, Akinori Hashiguchi, Michiie Sakamoto, Kenji Suzuki, Masahiro Yamaguchi
{"title":"Quantification-based explainable artificial intelligence for deep learning decisions: clustering and visualization of quantitative morphometric features in hepatocellular carcinoma discrimination.","authors":"Gen Takagi, Saori Takeyama, Tokiya Abe, Akinori Hashiguchi, Michiie Sakamoto, Kenji Suzuki, Masahiro Yamaguchi","doi":"10.1117/1.JMI.12.6.061407","DOIUrl":"https://doi.org/10.1117/1.JMI.12.6.061407","url":null,"abstract":"<p><strong>Purpose: </strong>Deep learning (DL) is rapidly advancing in computational pathology, offering high diagnostic accuracy but often functioning as a \"black box\" with limited interpretability. This lack of transparency hinders its clinical adoption, emphasizing the need for quantitative explainable artificial intelligence (QXAI) methods. We propose a QXAI approach to objectively and quantitatively elucidate the reasoning behind DL model decisions in hepatocellular carcinoma (HCC) pathological image analysis.</p><p><strong>Approach: </strong>The proposed method utilizes clustering in the latent space of embeddings generated by a DL model to identify regions that contribute to the model's discrimination. Each cluster is then quantitatively characterized by morphometric features obtained through nuclear segmentation using HoverNet and key feature selection with LightGBM. Statistical analysis is performed to assess the importance of selected features, ensuring an interpretable relationship between morphological characteristics and classification outcomes. This approach enables the quantitative interpretation of which regions and features are critical for the model's decision-making, without sacrificing accuracy.</p><p><strong>Results: </strong>Experiments on pathology images of hematoxylin-and-eosin-stained HCC tissue sections showed that the proposed method effectively identified key discriminatory regions and features, such as nuclear size, chromatin density, and shape irregularity. 
The clustering-based analysis provided structured insights into morphological patterns influencing classification, with explanations evaluated as clinically relevant and interpretable by a pathologist.</p><p><strong>Conclusions: </strong>Our QXAI framework enhances the interpretability of DL-based pathology analysis by linking morphological features to classification decisions. This fosters trust in DL models and facilitates their clinical integration.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 6","pages":"061407"},"PeriodicalIF":1.7,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12513858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stepan Romanov, Sacha Howell, Elaine Harkness, Dafydd Gareth Evans, Sue Astley, Martin Fergie
{"title":"Comparing percent breast density assessments of an AI-based method with expert reader estimates: inter-observer variability.","authors":"Stepan Romanov, Sacha Howell, Elaine Harkness, Dafydd Gareth Evans, Sue Astley, Martin Fergie","doi":"10.1117/1.JMI.12.S2.S22011","DOIUrl":"10.1117/1.JMI.12.S2.S22011","url":null,"abstract":"<p><strong>Purpose: </strong>Breast density estimation is an important part of breast cancer risk assessment, as mammographic density is associated with risk. However, density assessed by multiple experts can be subject to high inter-observer variability, so automated methods are increasingly used. We investigate the inter-reader variability and risk prediction for expert assessors and a deep learning approach.</p><p><strong>Approach: </strong>Screening data from a cohort of 1328 women, case-control matched, was used to compare between two expert readers and between a single reader and a deep learning model, Manchester artificial intelligence - visual analog scale (MAI-VAS). Bland-Altman analysis was used to assess the variability and matched concordance index to assess risk.</p><p><strong>Results: </strong>Although the mean differences for the two experiments were alike, the limits of agreement between MAI-VAS and a single reader are substantially lower at +SD (standard deviation) 21 (95% CI: 19.65, 21.69) -SD 22 (95% CI: <math><mrow><mo>-</mo> <mn>22.71</mn></mrow> </math> , <math><mrow><mo>-</mo> <mn>20.68</mn></mrow> </math> ) than between two expert readers +SD 31 (95% CI: 32.08, 29.23) -SD 29 (95% CI: <math><mrow><mo>-</mo> <mn>29.94</mn></mrow> </math> , <math><mrow><mo>-</mo> <mn>27.09</mn></mrow> </math> ). In addition, breast cancer risk discrimination for the deep learning method and density readings from a single expert was similar, with a matched concordance of 0.628 (95% CI: 0.598, 0.658) and 0.624 (95% CI: 0.595, 0.654), respectively. 
The automatic method had a similar inter-view agreement to experts and maintained consistency across density quartiles.</p><p><strong>Conclusions: </strong>The artificial intelligence breast density assessment tool MAI-VAS has a better inter-observer agreement with a randomly selected expert reader than that between two expert readers. Deep learning-based density methods provide consistent density scores without compromising on breast cancer risk discrimination.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 Suppl 2","pages":"S22011"},"PeriodicalIF":1.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159425/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144303313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}