Victor de la Oliva, Alberto Esteban-Medina, Laura Alejos, Dolores Munoyerro-Muniz, Roman Villegas, Joaquin Dopazo, Carlos Loucera
medRxiv - Oncology. Published 2024-07-27. DOI: https://doi.org/10.1101/2024.07.26.24310994
Early prediction of ovarian cancer risk based on real world data
This study presents an early prediction model for high-grade serous ovarian cancer (HGSOC) built on real-world data from the Andalusian Health Population Database (BPS), which contains electronic health records (EHR) of over 15 million patients. Drawing on this breadth of data, the model aims to identify individuals at high risk of HGSOC without requiring specific tumor markers or prior stratification into risk groups. The model uses an Explainable Boosting Machine (EBM) algorithm and incorporates diverse clinical variables, including demographics, chronic diseases, symptoms, blood test results, and healthcare utilization patterns. It was trained and validated on 3,088 HGSOC patients diagnosed between 2018 and 2022, together with 114,942 controls of similar characteristics chosen to emulate the prevalence of the disease, and achieved a sensitivity of 0.65 and a specificity of 0.85. The study underscores the importance of using patient data from the general population and demonstrates that effective early detection models can be developed from routinely collected healthcare data. The approach addresses limitations of traditional screening methods by providing a cost-effective and broadly applicable tool for early cancer detection, which could improve patient outcomes through timely interventions. Because the model is interpretable, it also offers insights into the most significant predictors of cancer risk, further enhancing its utility in clinical settings.
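To make the reported operating point more concrete, the following minimal sketch converts sensitivity and specificity into predictive values using Bayes' rule. It is an illustration only: the abstract does not report positive or negative predictive values, the function name `predictive_values` is hypothetical, and the calculation assumes that the reported sensitivity (0.65) and specificity (0.85) hold at the case mix of the full cohort (3,088 cases and 114,942 controls).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All arguments are proportions in [0, 1]. Hypothetical helper,
    not taken from the study.
    """
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Cohort sizes and operating point as stated in the abstract.
cases, controls = 3088, 114942
prevalence = cases / (cases + controls)

# Assumption: the reported metrics apply at this cohort prevalence.
ppv, npv = predictive_values(0.65, 0.85, prevalence)
print(f"prevalence={prevalence:.4f} PPV={ppv:.3f} NPV={npv:.3f}")
```

Under these assumptions the script prints `prevalence=0.0262 PPV=0.104 NPV=0.989`. In other words, at a case mix of about 2.6%, roughly one in ten flagged individuals would be a true case, which is the usual consequence of applying a moderately specific test to a low-prevalence condition.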