{"title":"信息学和预测分析的科学","authors":"L. Lenert","doi":"10.1093/jamia/ocz202","DOIUrl":null,"url":null,"abstract":"As an interdisciplinary technologically driven field, the science of informatics is rapidly evolving. In this issue of Journal of the American Medical Informatics Association, we bring together a series of articles and commentaries that describe various aspects of the science of predictive modeling. These articles describe work to ensure that models are useful and valid on release and, perhaps more importantly, continue to be so as clinical processes and patient populations evolve over time. The upshot of the collection is to point out a new direction for informatics research and policy advocacy in the development of models for predictive analytics. Rather than focus on the mechanics of model building and validation, scientists should now be focused on how to document the model, when it is likely to yield benefits, what the model life cycle is, how to maintain models in a sustainable way, and even which types of health care offer the optimal predictive performance. What accounts for this change in context? In the past, bringing the resources, data, and analytical methods together to develop a predictive model was viewed as an innovative and valuable contribution to the science of informatics. However, times have changed. The presence of ubiquitous electronic health record (EHR) systems makes data for modeling commonplace. Standardized clinical data models have been developed, such as the Observational Health Data Sciences and Informatics model, to support low-effort replication of methodologies across studies. Data warehousing methods also have evolved, from the mere storage of data in applications such as Informatics for Integrating Biology and the Bedside (i2b2), to the linkage of data to analytic tools to the Health Insurance Portability and Accountability Act–compliant storage in the cloud (eg, Google Health, Azure, Amazon), lowering most barriers to model development. In addition, methods for unsupervised machine learning (ML) have also evolved and become more user-friendly, bringing together algorithms for data compression, bootstrap dataset regeneration, and analytics into standardized packages. There is widespread agreement on basic statistical measures of performance such as the C-statistic and growing agreement on the importance of measures of calibration such as the Brier score—which is the primary metric in Davis et al’s article on model maintenance—as a supplement to measures of diagnostic accuracy. EHRs and clinical data warehouses ensure that there are sufficient data available in most circumstances for split-sample validation methods further ruggedized by the bootstrap resampling when necessary. As a result, unsupervised ML methods can often produce models with acceptable clinical accuracy (receiver-operating characteristic curves >0.7 or 0.8) in many circumstances; though, as Liu et al suggest, threshold performance for clinical use depends on a wide range of factors. Propensity score methods are widely recognized as important in predictions that can compensate for confounding variables and there is growing confidence in the ability of neural networks to deal with the complex problems caused by missing not-atrandom data. In sum, developers have a full toolbox of data systems and methods. So, if model development for predictive analytics using existing methods of ML is no longer “informatics science,” what is the science now? This issue offers a view. 
First and foremost, in van Van Calster et al’s commentary, “Predictive analytics in health care: How can we know what works?” calls for transparency in models as the foundation for the new science of clinical usefulness. There is no place for black-box algorithms in our new endeavor. Research must look at the relative performance of any given method, particularly innovations, and characterize the context for the model’s use. Liu et al propose a metric for assessing the usefulness of a model in a given clinical context, called number needed to benefit. This approach borrows from the literature on evaluation of diagnostic testing to create a metric for the number of patients that need to be screened with a model to capture benefit. This metric sums up in a single number, the decision analytically derived number, much of the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework for informatics program evaluation. Lenert et al (yes, there is some relation) proposes the concept of a life cycle of predictive models. Namely, in addition to development, there is a maintenance phase, in which a model may need to be recalibrated, and eventual obsolescence. These authors argue that if widely applied, models might become victims of their own success, changing the rates of observed events and negating correlations producing the model. Davis et al go on to propose criteria for the assessment of the sensitivity of a model to changes in key attributes of the clinical context of application: changes in event rate, case mix,","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"394 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The science of informatics and predictive analytics\",\"authors\":\"L. Lenert\",\"doi\":\"10.1093/jamia/ocz202\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As an interdisciplinary technologically driven field, the science of informatics is rapidly evolving. In this issue of Journal of the American Medical Informatics Association, we bring together a series of articles and commentaries that describe various aspects of the science of predictive modeling. These articles describe work to ensure that models are useful and valid on release and, perhaps more importantly, continue to be so as clinical processes and patient populations evolve over time. The upshot of the collection is to point out a new direction for informatics research and policy advocacy in the development of models for predictive analytics. Rather than focus on the mechanics of model building and validation, scientists should now be focused on how to document the model, when it is likely to yield benefits, what the model life cycle is, how to maintain models in a sustainable way, and even which types of health care offer the optimal predictive performance. What accounts for this change in context? In the past, bringing the resources, data, and analytical methods together to develop a predictive model was viewed as an innovative and valuable contribution to the science of informatics. However, times have changed. The presence of ubiquitous electronic health record (EHR) systems makes data for modeling commonplace. 
Standardized clinical data models have been developed, such as the Observational Health Data Sciences and Informatics model, to support low-effort replication of methodologies across studies. Data warehousing methods also have evolved, from the mere storage of data in applications such as Informatics for Integrating Biology and the Bedside (i2b2), to the linkage of data to analytic tools to the Health Insurance Portability and Accountability Act–compliant storage in the cloud (eg, Google Health, Azure, Amazon), lowering most barriers to model development. In addition, methods for unsupervised machine learning (ML) have also evolved and become more user-friendly, bringing together algorithms for data compression, bootstrap dataset regeneration, and analytics into standardized packages. There is widespread agreement on basic statistical measures of performance such as the C-statistic and growing agreement on the importance of measures of calibration such as the Brier score—which is the primary metric in Davis et al’s article on model maintenance—as a supplement to measures of diagnostic accuracy. EHRs and clinical data warehouses ensure that there are sufficient data available in most circumstances for split-sample validation methods further ruggedized by the bootstrap resampling when necessary. As a result, unsupervised ML methods can often produce models with acceptable clinical accuracy (receiver-operating characteristic curves >0.7 or 0.8) in many circumstances; though, as Liu et al suggest, threshold performance for clinical use depends on a wide range of factors. Propensity score methods are widely recognized as important in predictions that can compensate for confounding variables and there is growing confidence in the ability of neural networks to deal with the complex problems caused by missing not-atrandom data. In sum, developers have a full toolbox of data systems and methods. So, if model development for predictive analytics using existing methods of ML is no longer “informatics science,” what is the science now? This issue offers a view. First and foremost, in van Van Calster et al’s commentary, “Predictive analytics in health care: How can we know what works?” calls for transparency in models as the foundation for the new science of clinical usefulness. There is no place for black-box algorithms in our new endeavor. Research must look at the relative performance of any given method, particularly innovations, and characterize the context for the model’s use. Liu et al propose a metric for assessing the usefulness of a model in a given clinical context, called number needed to benefit. This approach borrows from the literature on evaluation of diagnostic testing to create a metric for the number of patients that need to be screened with a model to capture benefit. This metric sums up in a single number, the decision analytically derived number, much of the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework for informatics program evaluation. Lenert et al (yes, there is some relation) proposes the concept of a life cycle of predictive models. Namely, in addition to development, there is a maintenance phase, in which a model may need to be recalibrated, and eventual obsolescence. These authors argue that if widely applied, models might become victims of their own success, changing the rates of observed events and negating correlations producing the model. 
Davis et al go on to propose criteria for the assessment of the sensitivity of a model to changes in key attributes of the clinical context of application: changes in event rate, case mix,\",\"PeriodicalId\":236137,\"journal\":{\"name\":\"Journal of the American Medical Informatics Association : JAMIA\",\"volume\":\"394 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the American Medical Informatics Association : JAMIA\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamia/ocz202\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association : JAMIA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamia/ocz202","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
摘要
作为一个跨学科的技术驱动领域,信息学正在迅速发展。在本期的《美国医学信息学协会杂志》中,我们汇集了一系列描述预测建模科学各个方面的文章和评论。这些文章描述了确保模型在发布时有用和有效的工作,也许更重要的是,随着临床过程和患者群体的发展,模型将继续有效。该集合的结果是为信息学研究和预测分析模型发展的政策倡导指出了一个新的方向。科学家们现在应该关注的不是模型构建和验证的机制,而是如何记录模型,何时可能产生效益,模型生命周期是什么,如何以可持续的方式维护模型,甚至哪种类型的医疗保健提供最佳的预测性能。是什么导致了这种变化?在过去,将资源、数据和分析方法结合起来开发预测模型被视为对信息学科学的创新和有价值的贡献。然而,时代变了。无处不在的电子健康记录(EHR)系统的存在使得用于建模的数据变得司空见惯。已经开发了标准化的临床数据模型,例如观察性卫生数据科学和信息学模型,以支持跨研究方法的低成本复制。数据仓库方法也发生了变化,从仅仅在应用程序中存储数据,例如集成生物学和床边的信息学(i2b2),到将数据与分析工具联系起来,再到符合《健康保险可移植性和责任法案》的云存储(例如谷歌Health、Azure、Amazon),降低了模型开发的大多数障碍。此外,无监督机器学习(ML)的方法也在不断发展,变得更加用户友好,将数据压缩、自引导数据集再生和分析算法整合到标准化软件包中。在性能的基本统计度量(如c -统计)和校准度量(如Brier分数)的重要性(这是Davis等人关于模型维护的文章中的主要度量)作为诊断准确性度量的补充方面,存在着广泛的共识。电子病历和临床数据仓库确保在大多数情况下有足够的数据可用于分离样本验证方法,在必要时通过自举重新采样进一步加固。因此,在许多情况下,无监督ML方法通常可以产生具有可接受的临床准确性的模型(接受者操作特征曲线>0.7或0.8);然而,正如Liu等人所建议的,临床使用的阈值性能取决于广泛的因素。倾向评分方法在预测中被广泛认为是重要的,它可以补偿混杂变量,人们对神经网络处理由缺失非随机数据引起的复杂问题的能力越来越有信心。总之,开发人员拥有完整的数据系统和方法工具箱。因此,如果使用现有的机器学习方法开发预测分析的模型不再是“信息学科学”,那么现在的科学是什么?本期提供了一种观点。首先,在van van Calster等人的评论中,“医疗保健中的预测分析:我们如何知道什么是有效的?”呼吁将模型的透明度作为临床有用性新科学的基础。在我们的新努力中没有黑箱算法的容身之地。研究必须着眼于任何给定方法的相对性能,特别是创新,并描述模型使用的背景。Liu等人提出了一种评估模型在特定临床环境下有效性的指标,称为获益所需数量。这种方法借鉴了有关诊断测试评估的文献,为需要通过模型进行筛查以获取益处的患者数量创建了一个度量标准。这个度量标准将信息学项目评估的范围、有效性、采用、实施和维护(RE-AIM)框架的大部分内容总结为一个数字,即决策分析导出的数字。Lenert等人(是的,有一些关系)提出了预测模型生命周期的概念。也就是说,除了开发之外,还有一个维护阶段,在这个阶段中,模型可能需要重新校准,并最终被淘汰。这些作者认为,如果被广泛应用,模型可能会成为自身成功的受害者,改变观察到的事件的比率,并否定产生模型的相关性。Davis等人继续提出了评估模型对临床应用环境关键属性变化敏感性的标准:事件发生率、病例组合、
The science of informatics and predictive analytics

L. Lenert
Journal of the American Medical Informatics Association (JAMIA)
DOI: 10.1093/jamia/ocz202 | Published: 2019-11-15 | Citations: 2

Abstract
As an interdisciplinary, technologically driven field, the science of informatics is rapidly evolving. In this issue of the Journal of the American Medical Informatics Association, we bring together a series of articles and commentaries that describe various aspects of the science of predictive modeling. These articles describe work to ensure that models are useful and valid on release and, perhaps more importantly, continue to be so as clinical processes and patient populations evolve over time. The upshot of the collection is to point out a new direction for informatics research and policy advocacy in the development of models for predictive analytics. Rather than focus on the mechanics of model building and validation, scientists should now focus on how to document a model, when it is likely to yield benefits, what its life cycle is, how to maintain models in a sustainable way, and even which types of health care offer the optimal predictive performance.

What accounts for this change in context? In the past, bringing the resources, data, and analytical methods together to develop a predictive model was viewed as an innovative and valuable contribution to the science of informatics. However, times have changed. The presence of ubiquitous electronic health record (EHR) systems makes data for modeling commonplace. Standardized clinical data models, such as the Observational Health Data Sciences and Informatics model, have been developed to support low-effort replication of methodologies across studies. Data warehousing methods have also evolved, from the mere storage of data in applications such as Informatics for Integrating Biology and the Bedside (i2b2), to the linkage of data to analytic tools, to Health Insurance Portability and Accountability Act (HIPAA)-compliant storage in the cloud (eg, Google Health, Azure, Amazon), lowering most barriers to model development. In addition, methods for unsupervised machine learning (ML) have evolved and become more user-friendly, bringing together algorithms for data compression, bootstrap dataset regeneration, and analytics in standardized packages. There is widespread agreement on basic statistical measures of performance, such as the C-statistic, and growing agreement on the importance of measures of calibration, such as the Brier score (the primary metric in Davis et al’s article on model maintenance), as a supplement to measures of diagnostic accuracy. EHRs and clinical data warehouses ensure that sufficient data are available in most circumstances for split-sample validation methods, further ruggedized by bootstrap resampling when necessary. As a result, unsupervised ML methods can often produce models with acceptable clinical accuracy (areas under the receiver-operating characteristic curve >0.7 or 0.8), although, as Liu et al suggest, the performance threshold for clinical use depends on a wide range of factors. Propensity score methods are widely recognized as important because they can compensate for confounding variables in predictions, and there is growing confidence in the ability of neural networks to deal with the complex problems caused by missing-not-at-random data. In sum, developers have a full toolbox of data systems and methods.

So, if model development for predictive analytics using existing ML methods is no longer “informatics science,” what is the science now? This issue offers a view.
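Before turning to the individual articles, a minimal sketch may help ground the evaluation toolbox described above: split-sample validation, the C-statistic (area under the receiver-operating characteristic curve), the Brier score as a calibration-oriented supplement, and bootstrap resampling of the held-out set. The dataset, model, split proportions, and number of bootstrap replicates below are illustrative assumptions and are not drawn from the editorial or the articles it introduces.

```python
# Illustrative only: split-sample evaluation of discrimination (C-statistic) and
# calibration (Brier score), with bootstrap resampling of the held-out set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic, imbalanced cohort standing in for EHR-derived data.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)

# Split-sample validation: fit on one portion, evaluate on the held-out portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_test = model.predict_proba(X_test)[:, 1]

print(f"C-statistic (AUROC): {roc_auc_score(y_test, p_test):.3f}")
print(f"Brier score:         {brier_score_loss(y_test, p_test):.3f}")

# Bootstrap resampling of the held-out set to "ruggedize" the estimates.
aucs, briers = [], []
for _ in range(1000):
    idx = rng.integers(0, len(y_test), len(y_test))
    if len(np.unique(y_test[idx])) < 2:  # AUROC is undefined when only one class is drawn
        continue
    aucs.append(roc_auc_score(y_test[idx], p_test[idx]))
    briers.append(brier_score_loss(y_test[idx], p_test[idx]))

print("AUROC 95% CI:", np.percentile(aucs, [2.5, 97.5]).round(3))
print("Brier 95% CI:", np.percentile(briers, [2.5, 97.5]).round(3))
```

scikit-learn is used here only because it packages these metrics conveniently; any statistical environment would serve the same purpose.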
First and foremost, Van Calster et al’s commentary, “Predictive analytics in health care: How can we know what works?,” calls for transparency in models as the foundation for the new science of clinical usefulness. There is no place for black-box algorithms in our new endeavor. Research must look at the relative performance of any given method, particularly innovations, and characterize the context for the model’s use. Liu et al propose a metric for assessing the usefulness of a model in a given clinical context, called the number needed to benefit. This approach borrows from the literature on the evaluation of diagnostic testing to create a metric for the number of patients that need to be screened with a model to capture a benefit. It sums up, in a single decision-analytically derived number, much of the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework for informatics program evaluation. Lenert et al (yes, there is some relation) propose the concept of a life cycle for predictive models: in addition to development, there is a maintenance phase, in which a model may need to be recalibrated, and an eventual obsolescence. These authors argue that, if widely applied, models might become victims of their own success, changing the rates of observed events and negating the correlations that produced them. Davis et al go on to propose criteria for assessing the sensitivity of a model to changes in key attributes of the clinical context of application: changes in event rate, case mix,
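To make the maintenance phase that Lenert et al and Davis et al discuss more concrete, the sketch below simulates a fall in the event rate after deployment and applies one common repair, logistic recalibration of the deployed model’s linear predictor. The synthetic cohorts, the drift scenario, and the choice of recalibration technique are assumptions for illustration only; the editorial and the cited articles do not prescribe this particular method.

```python
# Illustrative only: event-rate drift after deployment, followed by logistic recalibration.
import numpy as np
from scipy.special import logit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)

def simulate(n, intercept):
    """Synthetic cohort: two predictors plus a baseline term that sets the event rate."""
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(intercept + 1.0 * X[:, 0] - 0.5 * X[:, 1])))
    y = rng.binomial(1, p)
    return X, y

# Development cohort, and a later cohort in which the baseline event rate has fallen.
X_dev, y_dev = simulate(5000, intercept=-1.0)
X_new, y_new = simulate(5000, intercept=-2.0)

model = LogisticRegression().fit(X_dev, y_dev)
p_new = model.predict_proba(X_new)[:, 1]

# Logistic recalibration: refit only an intercept and slope on the deployed model's logit,
# using half of the new cohort, and evaluate calibration on the other half.
lp = logit(np.clip(p_new, 1e-6, 1 - 1e-6)).reshape(-1, 1)
half = len(y_new) // 2
recal = LogisticRegression().fit(lp[:half], y_new[:half])
p_recal = recal.predict_proba(lp[half:])[:, 1]

print("Brier score before recalibration:", round(brier_score_loss(y_new[half:], p_new[half:]), 4))
print("Brier score after recalibration: ", round(brier_score_loss(y_new[half:], p_recal), 4))
```

Refitting only the intercept and slope, rather than rebuilding the whole model, is one low-effort maintenance option when monitoring suggests that discrimination is intact but calibration has drifted.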