Evaluating Imputation Techniques for Survival Data Utilizing Kaplan-Meier Curves.

Studies in health technology and informatics Pub Date : 2025-10-02 DOI:10.3233/SHTI251487

Nina Cassandra Wiegers, Sebastian Germer, Christiane Rudolph, Natalie Rath, Katharina Rausch, Alexander Katalinic, Heinz Handels

Volume : 332 Pages : 17-21 Publication type : Journal Article

Citations: 0

Abstract



Cancer registries collect data on cancer patients, such as tumor histology and disease progression, but some variables tend to be incomplete, which complicates downstream analyses such as the estimation of survival probabilities. Imputation can benefit these analyses. Most imputation methods aim to learn the underlying distribution of the available data, yet their evaluation is often limited to feature-wise errors. This paper presents a new approach for evaluating the learned data distribution in the context of survival analysis, applied to two state-of-the-art imputation methods. Kaplan-Meier (KM) curves are calculated for survival cohorts to estimate the survival probability after a cancer diagnosis. The data are stratified by UICC tumor stage, and imputation quality is evaluated by comparing survival probabilities over time. Two KM curves are calculated for each UICC stage: one based on the survival times of patients with a known UICC stage, and one based on the survival times of patients whose UICC stage was imputed. Differences between the KM curves are assessed with a log-rank test, a modified Manhattan distance, and the maximum absolute distance. The best result across all evaluation metrics is achieved for UICC stage II imputed with MissForest, which agrees well with the qualitative impression from the plotted KM curves. For survival analysis in particular, the proposed metrics can help epidemiological researchers choose an imputation method that preserves the trend of the survival probabilities.
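The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`km_curve`, `km_at`, `curve_distances`, `logrank_p`) are hypothetical, the data are synthetic, and the abstract does not define the "modified Manhattan distance", so the mean absolute difference on a shared time grid is used here as an assumed stand-in. The log-rank p-value uses the standard two-group statistic with one degree of freedom.

```python
import math
import numpy as np


def km_curve(times, events):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        deaths = np.sum((times == t) & events)   # events exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def km_at(grid, event_times, surv):
    """Evaluate the KM step function on a grid (S = 1 before the first event)."""
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], surv))[idx]


def curve_distances(t_a, e_a, t_b, e_b, n_grid=200):
    """Maximum and mean absolute difference of two KM curves.

    The mean absolute difference is an ASSUMED stand-in for the paper's
    "modified Manhattan distance", which the abstract does not define.
    """
    horizon = min(np.max(t_a), np.max(t_b))      # compare on the shared range
    grid = np.linspace(0.0, horizon, n_grid)
    s_a = km_at(grid, *km_curve(t_a, e_a))
    s_b = km_at(grid, *km_curve(t_b, e_b))
    diff = np.abs(s_a - s_b)
    return diff.max(), diff.mean()


def logrank_p(t_a, e_a, t_b, e_b):
    """Two-group log-rank test p-value (chi-square, 1 degree of freedom)."""
    t_a, t_b = np.asarray(t_a, dtype=float), np.asarray(t_b, dtype=float)
    e_a, e_b = np.asarray(e_a, dtype=bool), np.asarray(e_b, dtype=bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(np.concatenate((t_a[e_a], t_b[e_b]))):
        n_a, n_b = np.sum(t_a >= t), np.sum(t_b >= t)
        d_a = np.sum((t_a == t) & e_a)
        d = d_a + np.sum((t_b == t) & e_b)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n       # observed minus expected in A
        if n > 1:                                # hypergeometric variance term
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))      # P(chi2 with 1 df > chi2)


# Synthetic example for one stage: "known" vs "imputed" survival times.
rng = np.random.default_rng(0)
known_t, known_e = rng.exponential(60.0, 300), rng.random(300) < 0.7
imputed_t, imputed_e = rng.exponential(55.0, 120), rng.random(120) < 0.7
max_dist, mean_dist = curve_distances(known_t, known_e, imputed_t, imputed_e)
p_value = logrank_p(known_t, known_e, imputed_t, imputed_e)
print(f"max |dS| = {max_dist:.3f}, mean |dS| = {mean_dist:.3f}, p = {p_value:.3f}")
```

In this reading, small distances and a non-significant log-rank result for a stage would suggest that the imputed cases follow a survival pattern similar to the cases with a known stage. Restricting the grid to the shared follow-up range is a design choice of the sketch, made so that neither curve is extrapolated beyond its last observation.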

