Yu-Xin Guo, Jun-Long Lan, Yu-Xuan Song, Wen-Qin Bu, Yu Tang, Zi-Xuan Wu, Hao-Tian Meng, Di Wu, Hui Yang, Yu-Cheng Guo
Different machine learning methods based on maxillary sinus in sex estimation for northwestern Chinese Han population
International Journal of Legal Medicine (impact factor 2.2; JCR Q1, Medicine, Legal). Published 2024-09-01; Epub 2024-05-18. DOI: https://doi.org/10.1007/s00414-024-03255-7
Citations: 0
Abstract
Background & objective: Sex estimation is a critical aspect of forensic expertise. Some anatomical structures, such as the maxillary sinus, can remain intact under harsh environmental conditions and may serve as a basis for sex estimation. Because sex estimation is complex, several studies have used different machine learning algorithms to improve the accuracy of sex prediction from anatomical measurements.
Material & methods: In this study, linear measurements of the maxillary sinus were collected by Cone-Beam Computed Tomography (CBCT) from a population in northwest China and used to develop logistic regression, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and random forest (RF) models for sex estimation in R 4.3.1. CBCT images from 477 individuals of the Han population (75 males and 81 females, aged 5-17 years; 162 males and 159 females, aged 18-72 years) were used to establish and verify the models. The length (MSL), width (MSW) and height (MSH) of both the left and right maxillary sinuses, and the distance between the lateral walls of the two maxillary sinuses (distance), were measured. 80% of the data were randomly selected as the training set and the remaining 20% served as the testing set. In addition, the samples were grouped by age bracket and models were fitted to each group on an exploratory basis.
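The split-and-fit procedure described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' R code: the feature names, the synthetic values, the choice of k = 5 and the helper names are all assumptions made for the example, and only the KNN model is shown.

```python
import math
import random

# Seven measurements named in the abstract: length, width and height of the
# left and right sinus, plus the distance between the lateral walls.
# The column names themselves are hypothetical.
FEATURES = ["MSL_L", "MSW_L", "MSH_L", "MSL_R", "MSW_R", "MSH_R", "distance"]


def make_synthetic_samples(n_per_sex, rng):
    """Generate PLACEHOLDER data; the means and spread are invented, not study values."""
    samples = []
    for sex, shift in (("M", 1.0), ("F", -1.0)):
        for _ in range(n_per_sex):
            row = [rng.gauss(35.0 + shift, 3.0) for _ in FEATURES]
            samples.append((row, sex))
    return samples


def train_test_split(samples, train_fraction, rng):
    """Shuffle a copy of the samples and cut it at the requested fraction."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, row, k=5):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], row))[:k]
    votes = [sex for _, sex in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, test, k=5):
    """Fraction of test rows whose predicted sex matches the recorded sex."""
    hits = sum(knn_predict(train, row, k) == sex for row, sex in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the random split is reproducible
data = make_synthetic_samples(160, rng)  # 320 fake rows, not the study data
train, test = train_test_split(data, 0.8, rng)
print(len(train), len(test), round(accuracy(train, test), 3))
```

An odd k avoids tied votes when there are only two classes. With real measurements, the features would normally be standardized before computing distances, since KNN is sensitive to scale.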
Results: Overall, the accuracy of sex estimation on the testing set for individuals aged 18 and over was 77.78%, with a slightly higher accuracy for males (78.12%) than for females (77.42%). However, sex estimation for individuals under 18 remained challenging. RF achieved higher accuracy than logistic regression, KNN and SVM. Moreover, incorporating age as a variable improved the accuracy of sex estimation, particularly in the 18-27 age group, where accuracy increased to 88.46%. In addition, all variables showed a linear correlation with age.
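The relationship between the overall and per-sex accuracies reported above can be illustrated with a short calculation. The abstract does not state the composition of the testing set, so the counts below (25 of 32 males and 24 of 31 females classified correctly) are a hypothetical assumption, chosen only because they reproduce the reported percentages. The function name is likewise illustrative.

```python
def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and accuracy within each true class."""
    totals, hits = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == pred)
    by_class = {label: hits[label] / totals[label] for label in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, by_class


# HYPOTHETICAL counts: the abstract gives percentages only, not the number
# of males and females in the testing set.
y_true = ["M"] * 32 + ["F"] * 31
y_pred = ["M"] * 25 + ["F"] * 7 + ["F"] * 24 + ["M"] * 7

overall, by_class = per_class_accuracy(y_true, y_pred)
print(f"overall {overall:.2%}, male {by_class['M']:.2%}, female {by_class['F']:.2%}")
```

Overall accuracy is the class-size-weighted average of the per-sex accuracies, which is why it falls between the male and female figures.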
Conclusion: The linear measurements of the maxillary sinus could be a valuable tool for sex estimation in individuals aged 18 and over. A robust RF model has been developed for sex estimation within the Han population residing in the northwestern region of China. The accuracy of sex estimation could be higher when age is used as a predictive variable.
About the journal:
The International Journal of Legal Medicine aims to improve the scientific resources used in the elucidation of crime and related forensic applications at a high level of evidential proof. The journal offers review articles tracing development in specific areas, with up-to-date analysis; original articles discussing significant recent research results; case reports describing interesting and exceptional examples; population data; letters to the editors; and technical notes, which appear in a section originally created for rapid publication of data in the dynamic field of DNA analysis.