Evaluating Imputation Techniques for Survival Data Utilizing Kaplan-Meier Curves.

Nina Cassandra Wiegers, Sebastian Germer, Christiane Rudolph, Natalie Rath, Katharina Rausch, Alexander Katalinic, Heinz Handels
Studies in Health Technology and Informatics, vol. 332, pp. 17-21. Published 2025-10-02. Journal Article.
DOI: 10.3233/SHTI251487 (https://doi.org/10.3233/SHTI251487)

Abstract

Cancer registries collect data about cancer patients, such as tumor histology and disease progression, but some variables tend to be incomplete, which complicates downstream analyses such as the estimation of survival probabilities. Imputation can benefit these analyses. Most imputation methods aim to learn the underlying distribution of the available data, but often only feature-wise errors are evaluated. This paper presents a new approach for evaluating the learned data distribution in the context of survival analysis, applied to two state-of-the-art imputation methods. Kaplan-Meier (KM) curves are used to estimate the survival probability after a cancer diagnosis and are calculated for survival cohorts. Stratifying the data by UICC tumor stage, we evaluate imputation quality by comparing survival probabilities over time. Two KM curves are calculated for each UICC stage: one based on the survival times of patients with a known UICC stage, the other on the survival times of patients with an imputed UICC stage. Differences between the KM curves are assessed with a log-rank test, a modified Manhattan distance, and the maximum absolute distance. The best result across all evaluation metrics is achieved for UICC stage II imputed with MissForest, which aligns well with the qualitative impression from the plotted KM curves. For survival analysis in particular, the proposed metrics can help epidemiological researchers choose an imputation method that preserves the trend of the survival probabilities.
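
The curve comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`kaplan_meier`, `survival_at`, `curve_distances`) and the toy data are hypothetical. The abstract does not define the "modified" Manhattan distance, so the sketch assumes one plausible reading: the mean absolute difference between the two curves on a common time grid. The log-rank test is left out.

```python
from bisect import bisect_right


def kaplan_meier(times, events):
    """Return (event_times, survival) for the Kaplan-Meier estimator.

    times  -- observed follow-up times
    events -- 1 if death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    event_times, survival = [], []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: S(t) *= 1 - d_t / n_t
            surv *= 1.0 - deaths / at_risk
            event_times.append(t)
            survival.append(surv)
        at_risk -= leaving
    return event_times, survival


def survival_at(curve, t):
    """Evaluate the KM step function at time t (1.0 before the first event)."""
    event_times, survival = curve
    idx = bisect_right(event_times, t)
    return 1.0 if idx == 0 else survival[idx - 1]


def curve_distances(curve_a, curve_b, grid):
    """Mean and maximum absolute difference of two KM curves on a time grid.

    The mean absolute difference stands in for the paper's modified
    Manhattan distance; this is an assumption, not the published definition.
    """
    diffs = [abs(survival_at(curve_a, t) - survival_at(curve_b, t)) for t in grid]
    return sum(diffs) / len(diffs), max(diffs)


# Toy data invented for illustration only: (months of follow-up, event flag).
known = kaplan_meier([5, 8, 12, 20, 24], [1, 0, 1, 1, 0])
imputed = kaplan_meier([4, 8, 15, 20, 30], [1, 1, 0, 1, 0])
mean_abs, max_abs = curve_distances(known, imputed, grid=range(0, 31))
print(f"mean abs. distance = {mean_abs:.3f}, max abs. distance = {max_abs:.3f}")
```

In this reading, `known` would hold the cohort with a recorded UICC stage and `imputed` the cohort whose stage was filled in by the imputer. Smaller distances indicate that the imputed cohort reproduces the survival trend of the known cohort more closely.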
