Jun Tang, Xiang Yin, Jiangyuan Lai, Keyu Luo, Dongdong Wu
{"title":"融合x射线图像和临床数据的骨质疏松症多模态深度学习预测模型:算法开发和验证研究。","authors":"Jun Tang, Xiang Yin, Jiangyuan Lai, Keyu Luo, Dongdong Wu","doi":"10.2196/70738","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Osteoporosis is a bone disease characterized by reduced bone mineral density and mass, which increase the risk of fragility fractures in patients. Artificial intelligence can mine imaging features specific to different bone densities, shapes, and structures and fuse other multimodal features for synergistic diagnosis to improve prediction accuracy.</p><p><strong>Objective: </strong>This study aims to develop a multimodal model that fuses chest X-rays and clinical parameters for opportunistic screening of osteoporosis and to compare and analyze the experimental results with existing methods.</p><p><strong>Methods: </strong>We used multimodal data, including chest X-ray images and clinical data, from a total of 1780 patients at Chongqing Daping Hospital from January 2019 to August 2024. We adopted a probability fusion strategy to construct a multimodal model. In our model, we used a convolutional neural network as the backbone network for image processing and fine-tuned it using a transfer learning technique to suit the specific task of this study. In addition, we introduced a gradient-based wavelet feature extraction method. We combined it with an attention mechanism to assist in feature fusion, which enhanced the model's focus on key regions of the image and further improved its ability to extract image features.</p><p><strong>Results: </strong>The multimodal model proposed in this paper outperforms the traditional methods in the 4 evaluation metrics of area under the curve value, accuracy, sensitivity, and specificity. Compared with using only the X-ray image model, the multimodal model improved the area under the curve value significantly from 0.951 to 0.975 (P=.004), the accuracy from 89.32% to 92.36% (P=.045), the sensitivity from 89.82% to 91.23% (P=.03), and the specificity from 88.64% to 93.92% (P=.008).</p><p><strong>Conclusions: </strong>While the multimodal model that fuses chest X-ray images and clinical data demonstrated superior performance compared to unimodal models and traditional methods, this study has several limitations. The dataset size may not be sufficient to capture the full diversity of the population. The retrospective nature of the study may introduce selection bias, and the lack of external validation limits the generalizability of the findings. Future studies should address these limitations by incorporating larger, more diverse datasets and conducting rigorous external validation to further establish the model's clinical use.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e70738"},"PeriodicalIF":3.8000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445622/pdf/","citationCount":"0","resultStr":"{\"title\":\"Fusion of X-Ray Images and Clinical Data for a Multimodal Deep Learning Prediction Model of Osteoporosis: Algorithm Development and Validation Study.\",\"authors\":\"Jun Tang, Xiang Yin, Jiangyuan Lai, Keyu Luo, Dongdong Wu\",\"doi\":\"10.2196/70738\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Osteoporosis is a bone disease characterized by reduced bone mineral density and mass, which increase the risk of fragility fractures in patients. 
Artificial intelligence can mine imaging features specific to different bone densities, shapes, and structures and fuse other multimodal features for synergistic diagnosis to improve prediction accuracy.</p><p><strong>Objective: </strong>This study aims to develop a multimodal model that fuses chest X-rays and clinical parameters for opportunistic screening of osteoporosis and to compare and analyze the experimental results with existing methods.</p><p><strong>Methods: </strong>We used multimodal data, including chest X-ray images and clinical data, from a total of 1780 patients at Chongqing Daping Hospital from January 2019 to August 2024. We adopted a probability fusion strategy to construct a multimodal model. In our model, we used a convolutional neural network as the backbone network for image processing and fine-tuned it using a transfer learning technique to suit the specific task of this study. In addition, we introduced a gradient-based wavelet feature extraction method. We combined it with an attention mechanism to assist in feature fusion, which enhanced the model's focus on key regions of the image and further improved its ability to extract image features.</p><p><strong>Results: </strong>The multimodal model proposed in this paper outperforms the traditional methods in the 4 evaluation metrics of area under the curve value, accuracy, sensitivity, and specificity. Compared with using only the X-ray image model, the multimodal model improved the area under the curve value significantly from 0.951 to 0.975 (P=.004), the accuracy from 89.32% to 92.36% (P=.045), the sensitivity from 89.82% to 91.23% (P=.03), and the specificity from 88.64% to 93.92% (P=.008).</p><p><strong>Conclusions: </strong>While the multimodal model that fuses chest X-ray images and clinical data demonstrated superior performance compared to unimodal models and traditional methods, this study has several limitations. The dataset size may not be sufficient to capture the full diversity of the population. The retrospective nature of the study may introduce selection bias, and the lack of external validation limits the generalizability of the findings. Future studies should address these limitations by incorporating larger, more diverse datasets and conducting rigorous external validation to further establish the model's clinical use.</p>\",\"PeriodicalId\":56334,\"journal\":{\"name\":\"JMIR Medical Informatics\",\"volume\":\"13 \",\"pages\":\"e70738\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445622/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/70738\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/70738","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
Fusion of X-Ray Images and Clinical Data for a Multimodal Deep Learning Prediction Model of Osteoporosis: Algorithm Development and Validation Study.
Background: Osteoporosis is a bone disease characterized by reduced bone mineral density and mass, which increase patients' risk of fragility fractures. Artificial intelligence can mine imaging features specific to different bone densities, shapes, and structures and fuse them with other multimodal features for synergistic diagnosis, improving prediction accuracy.
Objective: This study aims to develop a multimodal model that fuses chest X-rays and clinical parameters for opportunistic screening of osteoporosis and to compare its experimental results with those of existing methods.
Methods: We used multimodal data, including chest X-ray images and clinical data, from a total of 1780 patients at Chongqing Daping Hospital collected between January 2019 and August 2024. We adopted a probability fusion strategy to construct the multimodal model, using a convolutional neural network as the backbone network for image processing and fine-tuning it with transfer learning to suit the specific task of this study. In addition, we introduced a gradient-based wavelet feature extraction method and combined it with an attention mechanism to assist feature fusion, which strengthened the model's focus on key regions of the image and further improved its ability to extract image features.
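The abstract does not include implementation details, but a minimal sketch of the described probability (late) fusion design, in PyTorch, might look like the following. The ResNet-18 backbone, the clinical-feature dimension, the branch architectures, and the equal fusion weight are all illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class ProbabilityFusionModel(nn.Module):
    """Hypothetical late-fusion sketch: an image branch and a clinical
    branch each output an osteoporosis probability, which are then
    combined. Backbone choice, head sizes, and fusion weight are
    illustrative assumptions, not the paper's reported design."""

    def __init__(self, n_clinical: int = 12, fusion_weight: float = 0.5):
        super().__init__()
        # Image branch: ImageNet-pretrained CNN, fine-tuned for a
        # binary task (transfer learning, as the Methods describe).
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)
        # Clinical branch: small MLP over tabular clinical parameters.
        self.clinical = nn.Sequential(
            nn.Linear(n_clinical, 32), nn.ReLU(), nn.Linear(32, 1)
        )
        self.w = fusion_weight  # weight on the image probability

    def forward(self, image: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        p_img = torch.sigmoid(self.backbone(image))     # (B, 1)
        p_cli = torch.sigmoid(self.clinical(clinical))  # (B, 1)
        # Probability fusion: weighted average of branch probabilities.
        return self.w * p_img + (1.0 - self.w) * p_cli

model = ProbabilityFusionModel()
probs = model(torch.randn(4, 3, 224, 224), torch.randn(4, 12))  # (4, 1)
```

Fusing at the probability level keeps each branch independently trainable and interpretable, which is one common motivation for choosing late fusion over feature-level fusion.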
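Likewise, the gradient-based wavelet feature extraction combined with attention could plausibly take a form like the sketch below, in which a Sobel gradient map of the X-ray is decomposed with a 2D discrete wavelet transform and the detail coefficients drive a spatial attention map over the CNN feature maps. The Haar wavelet, the Sobel operator, and the residual way the map is applied are assumptions for illustration; the paper's actual formulation may differ.

```python
import numpy as np
import pywt
import torch
import torch.nn.functional as F
from scipy import ndimage

def gradient_wavelet_attention(image: np.ndarray, features: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: build a spatial attention map from the
    wavelet decomposition of the image gradient and use it to reweight
    CNN feature maps. `image` is a 2D grayscale array; `features` is a
    (C, H, W) feature tensor from the backbone."""
    # Gradient magnitude of the X-ray (Sobel in both directions).
    gx = ndimage.sobel(image.astype(np.float32), axis=0)
    gy = ndimage.sobel(image.astype(np.float32), axis=1)
    grad = np.hypot(gx, gy)
    # Single-level 2D discrete wavelet transform of the gradient map;
    # the detail sub-bands capture edge and texture energy.
    _, (ch, cv, cd) = pywt.dwt2(grad, "haar")
    detail = np.abs(ch) + np.abs(cv) + np.abs(cd)
    # Normalize to [0, 1] and resize to the feature map's spatial size.
    detail = (detail - detail.min()) / (detail.max() - detail.min() + 1e-8)
    attn = torch.from_numpy(detail)[None, None].float()
    attn = F.interpolate(attn, size=features.shape[-2:], mode="bilinear",
                         align_corners=False)[0]
    # Residual reweighting so key bony regions contribute more.
    return features * (1.0 + attn)
```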
Results: The multimodal model proposed in this paper outperforms traditional methods on 4 evaluation metrics: area under the curve (AUC), accuracy, sensitivity, and specificity. Compared with the X-ray image model alone, the multimodal model significantly improved the AUC from 0.951 to 0.975 (P=.004), accuracy from 89.32% to 92.36% (P=.045), sensitivity from 89.82% to 91.23% (P=.03), and specificity from 88.64% to 93.92% (P=.008).
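For reference, these 4 metrics can be computed from held-out labels and fused probabilities as in the short sketch below; the 0.5 decision threshold is an assumption, not necessarily the paper's operating point.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    """AUC, accuracy, sensitivity, and specificity from binary labels
    and predicted probabilities. The 0.5 threshold is illustrative."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```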
Conclusions: While the multimodal model that fuses chest X-ray images and clinical data demonstrated superior performance compared to unimodal models and traditional methods, this study has several limitations. The dataset size may not be sufficient to capture the full diversity of the population. The retrospective nature of the study may introduce selection bias, and the lack of external validation limits the generalizability of the findings. Future studies should address these limitations by incorporating larger, more diverse datasets and conducting rigorous external validation to further establish the model's clinical utility.
Journal Introduction:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (2016 impact factor: 5.175), JMIR Med Inform has a slightly different scope: it emphasizes applications for clinicians and health professionals rather than consumers/citizens (the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.