V K Cody Bumgardner, Sam Armstrong, Alexandr Virodov, Caylin Hickey
{"title":"Automated Curation and AI Workflow Management System for Digital Pathology.","authors":"V K Cody Bumgardner, Sam Armstrong, Alexandr Virodov, Caylin Hickey","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Digital pathology applications present several challenges, including the processing, storage, and distribution of gigapixel images across distributed computational resources and viewing stations. Individual slides must be available for interactive review, and large repositories must be programmatically accessible for dataset and model building. We present a platform to manage and process multi-modal pathology data (images and case information) across multiple locations. Using an agent-based system coupled with open-source automated machine learning and review tools allows not only dynamic load-balancing and cross-network operation but also the development of research and clinical AI models using the data managed by the platform. The platform presented covers end-to-end AI workflow from data acquisition and curation through model training and evaluation allowing for sharing and review. We conclude with a case study of colon and prostate cancer model development utilizing the presented system.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283146/pdf/2214.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10089113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serena Jinchen Xie, Flavia P Kapos, Stephen J Mooney, Sean Mooney, Kari A Stephens, Cynthia Chen, Andrea L Hartzler, Abhishek Pratap
{"title":"Geospatial divide in real-world EHR data: Analytical workflow to assess regional biases and potential impact on health equity.","authors":"Serena Jinchen Xie, Flavia P Kapos, Stephen J Mooney, Sean Mooney, Kari A Stephens, Cynthia Chen, Andrea L Hartzler, Abhishek Pratap","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Real-world data (RWD) like electronic health records (EHR) has great potential for secondary use by health systems and researchers. However, collected primarily for efficient health care, EHR data may not equitably represent local regions and populations, impacting the generalizability of insights learned from it. We assessed the geospatial representativeness of regions in a large health system EHR data using a spatial analysis workflow, which provides a data-driven way to quantify geospatial representation and identify adequately represented regions. We applied the workflow to investigate geospatial patterns of overweight/obesity and depression patients to find regional \"hotspots\" for potential targeted interventions. Our findings show the presence of geospatial bias in EHR and demonstrate the workflow to identify spatial clusters after adjusting for bias due to the geospatial representativeness. This work highlights the importance of evaluating geospatial representativeness in RWD to guide targeted deployment of limited healthcare resources and generate equitable real-world evidence.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283143/pdf/2310.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9703645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Hidden Patient Connections: Predicting Hormonal Therapy Medication Discontinuation Using Hypergraph Neural Network on Clinical Communications.","authors":"Qingyuan Song, Yunfei Hu, Congning Ni, Zhijun Yin","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Hormonal therapy is an important adjuvant treatment for breast cancer patients, but medication discontinuation of such therapy is not uncommon. The goal of this paper is to conduct research on the modeling of clinic communications, which have shown value in understanding medication discontinuation, to predict the discontinuation of hormonal therapy medications. Notably, we leveraged the Hypergraph Neural Network to capture the hidden connections of patients that were inferred from clinical communications. Combining the content of clinical communications as well as the demographics, insurance, and cancer stage information, our model achieved an AUC of 67.9%, which significantly outperformed other baselines such as Graph Convolutional Network (65.3%), Random Forest (62.7%), and Support Vector Machine (62.8%). Our study suggested that incorporating the hidden patient connections encoded in clinical communications into prediction models could boost their performance. Future research would consider combining structured medical records and clinical communications to better predict medication discontinuation.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283142/pdf/2435.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9711833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pankhuri Singhal, Lindsay Guare, Colleen Morse, Anastasia Lucas, Marta Byrska-Bishop, Marie A Guerraty, Dokyoon Kim, Marylyn D Ritchie, Anurag Verma
{"title":"DETECT: Feature extraction method for disease trajectory modeling in electronic health records.","authors":"Pankhuri Singhal, Lindsay Guare, Colleen Morse, Anastasia Lucas, Marta Byrska-Bishop, Marie A Guerraty, Dokyoon Kim, Marylyn D Ritchie, Anurag Verma","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Modeling with longitudinal electronic health record (EHR) data proves challenging given the high dimensionality, redundancy, and noise captured in EHR. In order to improve precision medicine strategies and identify predictors of disease risk in advance, evaluating meaningful patient disease trajectories is essential. In this study, we develop the algorithm <b>D</b>iseas<b>E T</b>rajectory f<b>E</b>ature extra<b>CT</b>ion (<b>DETECT)</b> for feature extraction and trajectory generation in high-throughput temporal EHR data. This algorithm can 1) simulate longitudinal individual-level EHR data, specified to user parameters of scale, complexity, and noise and 2) use a convergent relative risk framework to test intermediate codes occurring between specified index code(s) and outcome code(s) to determine if they are predictive features of the outcome. Temporal range can be specified to investigate predictors occurring during a specific period of time prior to onset of the outcome. We benchmarked our method on simulated data and generated real-world disease trajectories using DETECT in a cohort of 145,575 individuals diagnosed with hypertension in Penn Medicine EHR for severe cardiometabolic outcomes.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283148/pdf/2354.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9715631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shiyi Jiang, Rungang Han, Krishnendu Chakrabarty, David Page, William W Stead, Anru R Zhang
{"title":"Timeline Registration for Electronic Health Records.","authors":"Shiyi Jiang, Rungang Han, Krishnendu Chakrabarty, David Page, William W Stead, Anru R Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Electronic Health Record (EHR) data are captured over time as patients receive care. Accordingly, variations among patients, such as when a patient presents for care during the course of a disease, introduce bias into standard longitudinal EHR data analysis methods. We, therefore, aim to provide an alignment method that reduces this bias. We structure this task as a registration problem. While limited prior research on longitudinal EHR data considered registration, we propose a robust registration method to provide better data alignment by estimating the optimum time shift at each time point. We validate the proposed method for mortality prediction. We utilize a Recurrent Neural Network (RNN), time-varying Cox regression model, and Logistic Regression (LR) for mortality prediction. Results suggest our proposed registration method enhances mortality prediction with at least a 1-2% increase in major evaluation metrics utilized.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283114/pdf/2036.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9711836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erika Rasnick, Patrick Ryan, Jeff Blossom, Heike Luttmann-Gibson, Nathan Lothrop, Rima Habre, Diane R Gold, Andrew Vancil, Joel Schwartz, James E Gern, Cole Brokamp
{"title":"High Resolution and Spatiotemporal Place-Based Computable Exposures at Scale.","authors":"Erika Rasnick, Patrick Ryan, Jeff Blossom, Heike Luttmann-Gibson, Nathan Lothrop, Rima Habre, Diane R Gold, Andrew Vancil, Joel Schwartz, James E Gern, Cole Brokamp","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Place-based exposures, termed \"geomarkers\", are powerful determinants of health but are often understudied because of a lack of open data and integration tools. Existing DeGAUSS (Decentralized Geomarker Assessment for Multisite Studies) software has been successfully implemented in multi-site studies, ensuring reproducibility and protection of health information. However, DeGAUSS relies on transporting geomarker data, which is not feasible for high-resolution spatiotemporal data too large to store locally or download over the internet. We expanded the DeGAUSS framework for high-resolution spatiotemporal geomarkers. Our approach stores data subsets based on coarsened location and year in an online repository, and appropriate subsets are downloaded to complete exposure assessment locally using exact date and location. We created and validated two free and open-source DeGAUSS containers for estimation of high-resolution, daily ambient air pollutant exposures, transforming published exposure assessment models into computable exposures for geomarker assessment at scale.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283107/pdf/2349.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9712649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sumita Garai, Frederick Xu, Duy Anh Duong-Tran, Yize Zhao, Li Shen
{"title":"Mining Correlation between Fluid Intelligence and Whole-brain Large Scale Structural Connectivity.","authors":"Sumita Garai, Frederick Xu, Duy Anh Duong-Tran, Yize Zhao, Li Shen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Exploring the neural basis of intelligence and the corresponding associations with brain network has been an active area of research in network neuroscience. Up to now, the majority of explorations mining human intelligence in brain connectomics leverages whole-brain functional connectivity patterns. In this study, structural connectivity patterns are instead used to explore relationships between brain connectivity and different behavioral/cognitive measures such as fluid intelligence. Specifically, we conduct a study using the 397 unrelated subjects from Human Connectome Project (Young Adults) dataset to estimate individual level structural connectivity matrices. We show that topological features, as quantified by our proposed measurements: Average Persistence (AP) and Persistent Entropy (PE), has statistically significant associations with different behavioral/cognitive measures. We also perform a parallel study using traditional graph-theoretical measures, provided by Brain Connectivity Toolbox, as benchmarks for our study. Our findings indicate that individual's structural connectivity indeed offers reliable predictive power of different behavioral/cognitive measures, including but not limited to fluid intelligence. Our results suggest that structural connectomes provide complementary insights (compared to using functional connectomes) in predicting human intelligence and warrants future studies on human intelligence and/or other behavioral/cognitive measures involving multi-modal approach.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283120/pdf/2239.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9712653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypergraph Transformers for EHR-based Clinical Predictions.","authors":"Ran Xu, Mohammed K Ali, Joyce C Ho, Carl Yang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Electronic health records (EHR) data contain rich information about patients' health conditions including diagnosis, procedures, medications and etc., which have been widely used to facilitate digital medicine. Despite its importance, it is often non-trivial to learn useful representations for patients' visits that support downstream clinical predictions, as each visit contains massive and diverse medical codes. As a result, the complex interactions among medical codes are often not captured, which leads to substandard predictions. To better model these complex relations, we leverage hypergraphs, which go beyond pairwise relations to jointly learn the representations for visits and medical codes. We also propose to use the self-attention mechanism to automatically identify the most relevant medical codes for each visit based on the downstream clinical predictions with better generalization power. Experiments on two EHR datasets show that our proposed method not only yields superior performance, but also provides reasonable insights towards the target tasks.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283128/pdf/2220.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9866076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shiqiang Tao, Rashmie Abeysinghe, Blanca Talavera De La Esperanza, Samden Lhatoo, Guo-Qiang Zhang, Licong Cui
{"title":"Extracting Temporal Expressions of First Seizure Onset from Epilepsy Patient Discharge Summaries.","authors":"Shiqiang Tao, Rashmie Abeysinghe, Blanca Talavera De La Esperanza, Samden Lhatoo, Guo-Qiang Zhang, Licong Cui","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Early onset of seizure is a potential risk factor for Sudden Unexpected Death in Epilepsy (SUDEP). However, the first seizure onset information is often documented as clinical narratives in epilepsy monitoring unit (EMU) discharge summaries. Manually extracting first seizure onset time from discharge summaries is time consuming and labor-intensive. In this work, we developed a rule-based natural language processing pipeline for automatically extracting the temporal information of patients' first seizure onset from EMU discharge summaries. We use the Epilepsy and Seizure Ontology (EpSO) as the core knowledge resource and construct 4 extraction rules based on 300 randomly selected EMU discharge summaries. To evaluate the effectiveness of the extraction pipeline, we apply the constructed rules on another 200 unseen discharge summaries and compare the results against the manual evaluation of a domain expert. Overall, our extraction pipeline achieved a precision of 0.75, recall of 0.651, and F1-score of 0.697. This is an encouraging initial result which will allow us to gain insights into potentially better-performing approaches.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283149/pdf/2272.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9859585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Intimate Partner Violence Victims from Social Media for Proactive Delivery of Support.","authors":"Yuting Guo, Sangmi Kim, Elise Warren, Yuan-Chi Yang, Sahithi Lakamana, Abeed Sarker","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Social media platforms are increasingly being used by intimate partner violence (IPV) victims to share experiences and seek support. If such information is automatically curated, it may be possible to conduct social media based surveillance and even design interventions over such platforms. In this paper, we describe the development of a supervised classification system that automatically characterizes IPV-related posts on the social network Reddit. We collected data from four IPV-related subreddits and manually annotated the data to indicate whether a post is a self-report of IPV or not. Using the annotated data (N=289), we trained, evaluated, and compared supervised machine learning systems. A transformer-based classifier, RoBERTa, obtained the best classification performance with overall accuracy of 78% and IPV-self-report class 𝐹<sub>1</sub> -score of 0.67. Post-classification error analyses revealed that misclassifications often occur for posts that are very long or are non-first-person reports of IPV. Despite the relatively small annotated data, our classification methods obtained promising results, indicating that it may be possible to detect and, hence, provide support to IPV victims over Reddit.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283132/pdf/2018.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9767214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}