Vitria Wuri Handayani, Mieke Sylvia Margareth Amiatun Ruth, Riries Rulaningtyas, Muhammad Rasyad Caesarardhi, Bayu Azra Yudhantorro, Ahmad Yudianto
Development and evaluation of a convolutional neural network model for sex prediction using cephalometric radiographs and cranial photographs

BMC Medical Imaging 25(1):348, published 2025-08-25. DOI: 10.1186/s12880-025-01892-x
Abstract
Background: Accurately determining sex using features like facial bone profiles and teeth is crucial for identifying unknown victims. Lateral cephalometric radiographs effectively depict the lateral cranial structure, aiding the development of computational identification models.
Objective: This study develops and evaluates a sex prediction model using cephalometric radiographs with several convolutional neural network (CNN) architectures. The primary goal is to evaluate the model's performance on standardized radiographic data and real-world cranial photographs to simulate forensic applications.
Methods: Six CNN architectures (VGG16, VGG19, MobileNetV2, ResNet50V2, InceptionV3, and InceptionResNetV2) were trained and validated on 340 cephalometric images of Indonesian individuals aged 18 to 40 years. The data were divided into training (70%), validation (15%), and testing (15%) subsets, and data augmentation was applied to mitigate class imbalance. An additional set of 40 cranial photographs of anatomical specimens was used to evaluate the models' generalizability. Performance metrics included accuracy, precision, recall, and F1-score.
Results: CNN models were trained and evaluated on 340 cephalometric images (255 females and 85 males). On cephalometric data, VGG19 and ResNet50V2 achieved high F1-scores of 95% (females) and 83% (males), respectively, highlighting their strong class-specific performance. Although overall accuracy exceeded 90%, the F1-score better reflected model performance on this imbalanced dataset. In contrast, performance decreased markedly on cranial photographs, particularly for female samples: even InceptionResNetV2, which achieved the highest F1-score on photographs (62%), still misclassified a substantial share of females. Confusion matrices and per-class metrics further revealed persistent issues with class imbalance and generalization across imaging modalities.
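The gap the authors note between overall accuracy and per-class F1 on an imbalanced test set can be illustrated with a small hypothetical confusion matrix (the counts below are invented for illustration and are not taken from the paper; they merely use a 39:12 female-to-male test ratio consistent with the abstract's class imbalance):

```python
def f1_per_class(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical binary confusion matrix (39 female, 12 male test samples):
#                 predicted female   predicted male
# true female           38                 1
# true male              3                 9
tp_f, fn_f = 38, 1       # females: 1 missed
tp_m, fn_m = 9, 3        # males: 3 missed
fp_f, fp_m = fn_m, fn_f  # each class's misses are the other's false positives

accuracy = (tp_f + tp_m) / (tp_f + fn_f + tp_m + fn_m)
print(f"accuracy:  {accuracy:.1%}")                        # → 92.2%
print(f"F1 female: {f1_per_class(tp_f, fp_f, fn_f):.2f}")  # → 0.95
print(f"F1 male:   {f1_per_class(tp_m, fp_m, fn_m):.2f}")  # → 0.82
```

Accuracy exceeds 90% because the majority class dominates the count of correct predictions, while the minority-class F1 exposes the weaker male-class performance, which is why per-class F1 is the more informative metric here.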
Conclusions: Basic CNN models perform well on standardized cephalometric images but less effectively on photographic cranial images, indicating a domain shift between image types that limits generalizability. Improving real-world forensic performance will require further optimization and more diverse training data.
Clinical trial number: Not applicable.
About the journal:
BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.