Enhancing late postmortem interval prediction: a pilot study integrating proteomics and machine learning to distinguish human bone remains over 15 years.
Camila Garcés-Parra, Pablo Saldivia, Mauricio Hernández, Elena Uribe, Juan Román, Marcela Torrejón, José L Gutiérrez, Guillermo Cabrera-Vives, María de Los Ángeles García-Robles, William Aguilar, Miguel Soto, Estefanía Tarifeño-Saldivia
{"title":"Enhancing late postmortem interval prediction: a pilot study integrating proteomics and machine learning to distinguish human bone remains over 15 years.","authors":"Camila Garcés-Parra, Pablo Saldivia, Mauricio Hernández, Elena Uribe, Juan Román, Marcela Torrejón, José L Gutiérrez, Guillermo Cabrera-Vives, María de Los Ángeles García-Robles, William Aguilar, Miguel Soto, Estefanía Tarifeño-Saldivia","doi":"10.1186/s40659-024-00552-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Determining the postmortem interval (PMI) accurately remains a significant challenge in forensic sciences, especially for intervals greater than 5 years (late PMI). Traditional methods often fail due to the extensive degradation of soft tissues, necessitating reliance on bone material examinations. The precision in estimating PMIs diminishes with time, particularly for intervals between 1 and 5 years, dropping to about 50% accuracy. This study aims to address this issue by identifying key protein biomarkers through proteomics and machine learning, ultimately enhancing the accuracy of PMI estimation for intervals exceeding 15 years.</p><p><strong>Methods: </strong>Proteomic analysis was conducted using LC-MS/MS on skeletal remains, specifically focusing on the tibia and ribs. Protein identification was performed using two strategies: a tryptic-specific search and a semitryptic search, the latter being particularly beneficial in cases of natural protein degradation. The Random Forest algorithm was used to model protein abundance data, enabling the prediction of PMI. A thorough screening process, combining importance scores and SHAP values, was employed to identify the most informative proteins for model's training and accuracy.</p><p><strong>Results: </strong>A minimal set of three biomarkers-K1C13, PGS1, and CO3A1-was identified, significantly improving the prediction accuracy between PMIs of 15 and 20 years. The model, based on protein abundance data from semitryptic peptides in tibia samples, achieved sustained 100% accuracy across 100 iterations. In contrast, non-supervised methods like PCA and MCA did not yield comparable results. Additionally, the use of semitryptic peptides outperformed tryptic peptides, particularly in tibia proteomes, suggesting their potential reliability in late PMI prediction.</p><p><strong>Conclusions: </strong>Despite limitations such as sample size and PMI range, this study demonstrates the feasibility of combining proteomics and machine learning for accurate late PMI predictions. Future research should focus on broader PMI ranges and various bone types to further refine and standardize forensic proteomic methodologies for PMI estimation.</p>","PeriodicalId":9084,"journal":{"name":"Biological Research","volume":"57 1","pages":"75"},"PeriodicalIF":4.3000,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11515459/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biological Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s40659-024-00552-8","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Determining the postmortem interval (PMI) accurately remains a significant challenge in forensic sciences, especially for intervals greater than 5 years (late PMI). Traditional methods often fail due to the extensive degradation of soft tissues, necessitating reliance on bone material examinations. The precision in estimating PMIs diminishes with time, particularly for intervals between 1 and 5 years, dropping to about 50% accuracy. This study aims to address this issue by identifying key protein biomarkers through proteomics and machine learning, ultimately enhancing the accuracy of PMI estimation for intervals exceeding 15 years.
Methods: Proteomic analysis was conducted using LC-MS/MS on skeletal remains, specifically focusing on the tibia and ribs. Protein identification was performed using two strategies: a tryptic-specific search and a semitryptic search, the latter being particularly beneficial in cases of natural protein degradation. The Random Forest algorithm was used to model protein abundance data, enabling the prediction of PMI. A thorough screening process, combining importance scores and SHAP values, was employed to identify the most informative proteins for model's training and accuracy.
Results: A minimal set of three biomarkers-K1C13, PGS1, and CO3A1-was identified, significantly improving the prediction accuracy between PMIs of 15 and 20 years. The model, based on protein abundance data from semitryptic peptides in tibia samples, achieved sustained 100% accuracy across 100 iterations. In contrast, non-supervised methods like PCA and MCA did not yield comparable results. Additionally, the use of semitryptic peptides outperformed tryptic peptides, particularly in tibia proteomes, suggesting their potential reliability in late PMI prediction.
Conclusions: Despite limitations such as sample size and PMI range, this study demonstrates the feasibility of combining proteomics and machine learning for accurate late PMI predictions. Future research should focus on broader PMI ranges and various bone types to further refine and standardize forensic proteomic methodologies for PMI estimation.
期刊介绍:
Biological Research is an open access, peer-reviewed journal that encompasses diverse fields of experimental biology, such as biochemistry, bioinformatics, biotechnology, cell biology, cancer, chemical biology, developmental biology, evolutionary biology, genetics, genomics, immunology, marine biology, microbiology, molecular biology, neuroscience, plant biology, physiology, stem cell research, structural biology and systems biology.