Evaluating Imputation Techniques for Survival Data Utilizing Kaplan-Meier Curves

Nina Cassandra Wiegers, Sebastian Germer, Christiane Rudolph, Natalie Rath, Katharina Rausch, Alexander Katalinic, Heinz Handels

Studies in Health Technology and Informatics, vol. 332, pp. 17-21, published 2 October 2025. DOI: 10.3233/SHTI251487
Cancer registries collect data about cancer patients, such as tumor histology and disease progression, but some variables tend to be incomplete, which complicates further analyses such as estimating survival probabilities. Imputation can benefit these analyses. Most imputation methods aim to learn the underlying distribution of the available data, yet their evaluation is often limited to feature-wise errors. This paper presents a new approach to evaluating the learned data distribution in the context of survival analysis, applied to two state-of-the-art imputation methods. Kaplan-Meier (KM) curves are calculated for survival cohorts to estimate the survival probability after a cancer diagnosis. By stratifying the data by UICC tumor stage, we evaluate imputation quality through a comparison of survival probabilities. For each UICC stage, two KM curves are calculated: one based on the survival times of patients with the known UICC stage and the other on the survival times of patients whose UICC stage was imputed. Differences between the KM curves are tested with a log-rank test, a modified Manhattan distance, and the maximum absolute distance. The best result for all evaluation metrics is achieved for UICC stage II imputed with MissForest, which agrees well with the qualitative assessment of the plotted KM curves. For survival analysis in particular, the proposed metrics can help epidemiological researchers choose an imputation method that preserves the trend of the survival probabilities.
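The comparison of a "known stage" KM curve against an "imputed stage" KM curve can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Survival data are assumed to be right-censored `(time, event)` pairs with `event = 1` for an observed death. The Kaplan-Meier estimator and the two-sample log-rank test follow their standard textbook forms. The abstract does not define the paper's *modified* Manhattan distance, so `curve_distances` computes a plain, unweighted Manhattan distance over the pooled jump times instead. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a step function.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns (jump_times, survival) where survival[k] holds from jump_times[k] on.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    jump_times, survival = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            jump_times.append(t)
            survival.append(surv)
        at_risk -= removed  # deaths and censored patients leave the risk set after t
    return jump_times, survival


def survival_at(curve, t):
    """Evaluate a right-continuous KM step function at time t."""
    jump_times, survival = curve
    k = bisect_right(jump_times, t)
    return 1.0 if k == 0 else survival[k - 1]


def curve_distances(curve_a, curve_b):
    """Maximum absolute distance and plain Manhattan distance between two KM curves.

    Both curves change only at their jump times, so the pooled jump times suffice
    to find the largest gap. The Manhattan distance is an unweighted sum of gaps
    over those times (assumption: NOT the paper's modified form).
    """
    grid = sorted(set(curve_a[0]) | set(curve_b[0]))
    gaps = [abs(survival_at(curve_a, t) - survival_at(curve_b, t)) for t in grid]
    return max(gaps, default=0.0), sum(gaps)


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n_a = sum(1 for s, _, g in pooled if g == 0 and s >= t)  # group A at risk
        n = sum(1 for s, _, _ in pooled if s >= t)  # both groups at risk
        d_a = sum(1 for s, e, g in pooled if g == 0 and s == t and e)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Chi-square with 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

In the setting described in the abstract, these functions would be called once per UICC stage. The first group would hold the survival times of patients with that known stage, and the second group would hold patients imputed into that stage. The per-stage loop and the imputers themselves are left out. The p-value uses the identity between a 1-degree-of-freedom chi-square tail and the complementary error function, so the sketch needs nothing beyond the standard library.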