{"title":"Protecting patient privacy in survival analyses","authors":"Luca Bonomi, Xiaoqian Jiang, L. Ohno-Machado","doi":"10.1093/jamia/ocz195","DOIUrl":"https://doi.org/10.1093/jamia/ocz195","url":null,"abstract":"OBJECTIVE\u0000Survival analysis is the cornerstone of many healthcare applications in which the \"survival\" probability (eg, time free from a certain disease, time to death) of a group of patients is computed to guide clinical decisions. It is widely used in biomedical research and healthcare applications. However, frequent sharing of exact survival curves may reveal information about the individual patients, as an adversary may infer the presence of a person of interest as a participant of a study or of a particular group. Therefore, it is imperative to develop methods to protect patient privacy in survival analysis.\u0000\u0000\u0000MATERIALS AND METHODS\u0000We develop a framework based on the formal model of differential privacy, which provides provable privacy protection against a knowledgeable adversary. We show the performance of privacy-protecting solutions for the widely used Kaplan-Meier nonparametric survival model.\u0000\u0000\u0000RESULTS\u0000We empirically evaluated the usefulness of our privacy-protecting framework and the reduced privacy risk for a popular epidemiology dataset and a synthetic dataset. Results show that our methods significantly reduce the privacy risk when compared with their nonprivate counterparts, while retaining the utility of the survival curves.\u0000\u0000\u0000DISCUSSION\u0000The proposed framework demonstrates the feasibility of conducting privacy-protecting survival analyses. 
We discuss future research directions to further enhance the usefulness of our proposed solutions in biomedical research applications.\u0000\u0000\u0000CONCLUSION\u0000The results suggest that our proposed privacy-protection methods provide strong privacy protections while preserving the usefulness of survival analyses.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133842836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The science of informatics and predictive analytics","authors":"L. Lenert","doi":"10.1093/jamia/ocz202","DOIUrl":"https://doi.org/10.1093/jamia/ocz202","url":null,"abstract":"As an interdisciplinary technologically driven field, the science of informatics is rapidly evolving. In this issue of Journal of the American Medical Informatics Association, we bring together a series of articles and commentaries that describe various aspects of the science of predictive modeling. These articles describe work to ensure that models are useful and valid on release and, perhaps more importantly, continue to be so as clinical processes and patient populations evolve over time. The upshot of the collection is to point out a new direction for informatics research and policy advocacy in the development of models for predictive analytics. Rather than focus on the mechanics of model building and validation, scientists should now be focused on how to document the model, when it is likely to yield benefits, what the model life cycle is, how to maintain models in a sustainable way, and even which types of health care offer the optimal predictive performance. What accounts for this change in context? In the past, bringing the resources, data, and analytical methods together to develop a predictive model was viewed as an innovative and valuable contribution to the science of informatics. However, times have changed. The presence of ubiquitous electronic health record (EHR) systems makes data for modeling commonplace. Standardized clinical data models have been developed, such as the Observational Health Data Sciences and Informatics model, to support low-effort replication of methodologies across studies. 
Data warehousing methods also have evolved, from the mere storage of data in applications such as Informatics for Integrating Biology and the Bedside (i2b2), to the linkage of data to analytic tools, to Health Insurance Portability and Accountability Act–compliant storage in the cloud (eg, Google Health, Azure, Amazon), lowering most barriers to model development. In addition, methods for unsupervised machine learning (ML) have evolved and become more user-friendly, bringing together algorithms for data compression, bootstrap dataset regeneration, and analytics into standardized packages. There is widespread agreement on basic statistical measures of performance such as the C-statistic and growing agreement on the importance of measures of calibration such as the Brier score—which is the primary metric in Davis et al’s article on model maintenance—as a supplement to measures of diagnostic accuracy. EHRs and clinical data warehouses ensure that there are sufficient data available in most circumstances for split-sample validation methods, further ruggedized by bootstrap resampling when necessary. As a result, unsupervised ML methods can often produce models with acceptable clinical accuracy (areas under the receiver-operating characteristic curve >0.7 or 0.8) in many circumstances; though, as Liu et al suggest, threshold performance for clinical use depends on a wide range of factors. 
Propensity score methods are widely recognized as important in","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"394 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134005013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JAMIA reviewer thank you","authors":"","doi":"10.1093/jamia/ocz187","DOIUrl":"https://doi.org/10.1093/jamia/ocz187","url":null,"abstract":"","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122081856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit causal reasoning is preferred, but not necessary for pragmatic value","authors":"Matthew C. Lenert, M. Matheny, Colin G. Walsh","doi":"10.1093/jamia/ocz198","DOIUrl":"https://doi.org/10.1093/jamia/ocz198","url":null,"abstract":"Researchers Jenkins, Martin, and Peek discuss some of the benefits of applying causal inference frameworks (CIFs) to predict treatment-naïve risk in the domain of risk modeling. We agree that causality-based models using diagrams are a powerful tool and that these models can avoid the pitfalls of model-mediated changes to the outcome process.1 CIFs have also demonstrated robustness to unobserved confounders.2 There are many reasons why explicitly considering causality and estimating baseline risk in the absence of treatments are important when deploying and maintaining prognostic models in clinical operations. While these models have many desirable properties, they are not without their challenges, as Sperrin et al note. CIFs demonstrate a firm understanding of the processes one wishes to improve. Getting to the requisite level of insight to build such a diagram is a long and arduous scientific process. This is not to say many processes cannot be diagrammed using current knowledge. We feel that incorporating causality where it is well understood is useful, but there are circumstances in which CIFs are likely to be incorrect and have the potential to cause error. Furthermore, causal models require data elements that reflect how a process works. 
Current bulwark data streams (revenue cycle-focused electronic health records) are","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115377185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit causal reasoning is needed to prevent prognostic models being victims of their own success","authors":"M. Sperrin, David A. Jenkins, G. Martin, N. Peek","doi":"10.1093/jamia/ocz197","DOIUrl":"https://doi.org/10.1093/jamia/ocz197","url":null,"abstract":"The recent perspective by Lenert et al1 provides an accessible and informative overview of the full life cycle of prognostic models, comprising development, deployment, maintenance, and surveillance. The perspective focuses particularly on the fundamental issue that deployment of a prognostic model into clinical practice will lead to changes in decision making or interventions, and hence, changes in clinical outcomes. This has received little attention in the prognostic modeling literature but is important because this changes predictor-outcome associations, meaning that the performance of the model degrades over time; therefore, prognostic models become “victims of their own success.” More seriously, a prediction from such a model is challenging to interpret, as it implicitly reflects both the risk factors and the interventions that similar patients received, in the historical data used to develop the prognostic model. The authors rightly point out that “holistically modeling the outcome and interventions(s)” and “incorporat[ing] the intervention space” are required to overcome this concern.1 However, the proposed solution of directly modeling interventions, or their surrogates, is not sufficient. An explicit causal inference framework is required. When the intended use of a prognostic model is to support decisions concerning intervention(s), the counterfactual causal framework provides a natural and powerful way to ensure that predictions issued by the prognostic model are useful, interpretable, and less vulnerable to degradation over time. The framework allows predictions to be used to answer “what if” questions; for an introduction, see Hernán and Robins. 
2 However, challenging than pure","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133314025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using REDCap and Apple ResearchKit to integrate patient questionnaires and clinical decision support into the electronic health record to improve sexually transmitted infection testing in the emergency department","authors":"Fahd A. Ahmad, Philip R. O. Payne, Ian Lackey, Rachel Komeshak, Kenneth Kenney, Brianna Magnusen, Christopher L. Metts, T. Bailey","doi":"10.1093/jamia/ocz182","DOIUrl":"https://doi.org/10.1093/jamia/ocz182","url":null,"abstract":"OBJECTIVE\u0000Audio-enhanced computer-assisted self-interviews (ACASIs) are useful adjuncts for clinical care but are rarely integrated into the electronic health record (EHR). We created a flexible framework for integrating an ACASI with clinical decision support (CDS) into the EHR. We used this program to identify adolescents at risk for sexually transmitted infections (STIs) in the emergency department (ED). We provide an overview of the software platform and qualitative user acceptance.\u0000\u0000\u0000MATERIALS AND METHODS\u0000We created an ACASI with a CDS algorithm to identify adolescents in need of STI testing. We offered it to 15- to 21-year-old patients in our ED, regardless of ED complaint. We collected user feedback via the ACASI. These were programmed into REDCap (Research Electronic Data Capture), and an iOS application utilizing Apple ResearchKit generated a tablet-compatible representation of the ACASI for patients. A custom software program created an HL7 (Health Level Seven) message containing a summary of responses, CDS recommendations, and STI test orders, which were transmitted to the EHR.\u0000\u0000\u0000RESULTS\u0000In the first year, 1788 of 6227 (28.7%) eligible adolescents completed the survey. Technical issues led to decreased use for several months. 
Patients rated the system favorably, with 1583 of 1787 (88.9%) indicating that they were \"somewhat\" or \"very comfortable\" answering questions electronically and 1291 of 1787 (72.2%) preferring this format over face-to-face interviews or paper questionnaires.\u0000\u0000\u0000CONCLUSIONS\u0000We present a novel use for REDCap to combine patient-answered questionnaires and CDS to improve care for adolescents at risk for STIs. Our program was well received and the platform can be used across disparate patients, topics, and information technology infrastructures.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133814336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A maximum likelihood approach to electronic health record phenotyping using positive and unlabeled patients","authors":"Lingjiao Zhang, Xiruo Ding, Yanyuan Ma, Naveen Muthu, I. Ajmal, J. Moore, D. Herman, Jinbo Chen","doi":"10.1093/jamia/ocz170","DOIUrl":"https://doi.org/10.1093/jamia/ocz170","url":null,"abstract":"OBJECTIVE\u0000Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and therefore is labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls.\u0000\u0000\u0000MATERIALS AND METHODS\u0000Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and sensitivity independent of predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and compared model performance with existing algorithms.\u0000\u0000\u0000RESULTS\u0000Our method outperformed the existing algorithms on predictive accuracy in Monte Carlo simulation studies, in application to identify hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and in application to identify primary aldosteronism patients using real-world cases and anchor variables. 
Our method additionally generated consistent estimates of 2 important parameters, phenotype prevalence and the proportion of true cases that are labeled.\u0000\u0000\u0000DISCUSSION\u0000Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models.\u0000\u0000\u0000CONCLUSIONS\u0000Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130503984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks","authors":"M. Alawad, Shang Gao, John X. Qiu, Hong-Jun Yoon, J. B. Christian, Lynne Penberthy, B. Mumphrey, Xiao-Cheng Wu, Linda Coyle, G. Tourassi","doi":"10.1093/jamia/ocz153","DOIUrl":"https://doi.org/10.1093/jamia/ocz153","url":null,"abstract":"Abstract Objective We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency. Materials and Methods Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC). Results MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. 
The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN. Conclusions The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task–specific model.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123295554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to: Real world evidence in cardiovascular medicine: assuring data validity in electronic health record-based studies","authors":"T. Hernandez-Boussard","doi":"10.1093/jamia/ocz184","DOIUrl":"https://doi.org/10.1093/jamia/ocz184","url":null,"abstract":"","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130686986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A governance model for the application of AI in health care","authors":"S. Reddy, S. Allan, S. Coghlan, P. Cooper","doi":"10.1093/jamia/ocz192","DOIUrl":"https://doi.org/10.1093/jamia/ocz192","url":null,"abstract":"As the efficacy of artificial intelligence (AI) in improving aspects of healthcare delivery becomes increasingly evident, it is likely that AI will be incorporated into routine clinical care in the near future. This promise has led to growing focus and investment in AI medical applications from both governmental organizations and technology companies. However, concern has been expressed about the ethical and regulatory aspects of the application of AI in health care. These concerns include the possibility of biases, lack of transparency with certain AI algorithms, privacy concerns with the data used for training AI models, and safety and liability issues with AI application in clinical environments. While there has been extensive discussion about the ethics of AI in health care, there has been little dialogue, and there are few recommendations, as to how to practically address these concerns. 
In this article, we propose a governance model that aims to not only address the ethical and regulatory issues that arise out of the application of AI in health care, but also stimulate further discussion about governance of AI in health care.","PeriodicalId":236137,"journal":{"name":"Journal of the American Medical Informatics Association : JAMIA","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122642175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}