ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine: Latest Articles

Causality-based Subject and Task Fingerprints using fMRI Time-series Data.
Dachuan Song, Li Shen, Duy Duong-Tran, Xuan Wang
{"title":"Causality-based Subject and Task Fingerprints using fMRI Time-series Data.","authors":"Dachuan Song, Li Shen, Duy Duong-Tran, Xuan Wang","doi":"10.1145/3698587.3701342","DOIUrl":"10.1145/3698587.3701342","url":null,"abstract":"<p><p>Recently, there has been a revived interest in system neuroscience causation models due to their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, our goal is to verify the feasibility and effectiveness of using a causality-based approach for fMRI fingerprinting. Specifically, we propose an innovative method that utilizes the causal dynamics activities of the brain to identify the unique cognitive patterns of individuals (e.g., subject fingerprint) and fMRI tasks (e.g., task fingerprint). The key novelty of our approach stems from the development of a two-timescale linear state-space model to extract 'spatio-temporal' (aka causal) signatures from an individual's fMRI time series data. To the best of our knowledge, we pioneer and subsequently quantify, in this paper, the concept of 'causal fingerprint.' Our method is well-separated from other fingerprint studies as we quantify fingerprints from a cause-and-effect perspective, which are then incorporated with a modal decomposition and projection method to perform subject identification and a GNN-based (Graph Neural Network) model to perform task identification. Finally, we show that the experimental results and comparisons with non-causality-based methods demonstrate the effectiveness of the proposed methods. We visualize the obtained causal signatures and discuss their biological relevance in light of the existing understanding of brain functionalities. Collectively, our work paves the way for further studies on causal fingerprints with potential applications in both healthy controls and neurodegenerative diseases.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11786950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CAPTURE: A Clustered Adaptive Patchwork Technique for Unified Registration Enhancement in Biological Imaging.
Sahand Hamzehei, Gianna Raimondi, Mostafa Karami, Linnaea Ostroff, Sheida Nabavi
{"title":"CAPTURE: A Clustered Adaptive Patchwork Technique for Unified Registration Enhancement in Biological Imaging.","authors":"Sahand Hamzehei, Gianna Raimondi, Mostafa Karami, Linnaea Ostroff, Sheida Nabavi","doi":"10.1145/3698587.3701369","DOIUrl":"10.1145/3698587.3701369","url":null,"abstract":"<p><p>Image registration is important in biological image analysis; however, it is often challenged by distortions and non-linear transformations. In this paper, we present a novel patch-wise image registration method to address the mentioned issues. Our method begins with global registration to correct linear transformations, followed by a detailed examination of geometrical distortions. After that, each image is adaptively divided into patches to isolate and correct non-linear distortions, followed by reconstruction and combining patches using Otsu thresholding. We evaluated our method against state-of-the-art techniques using mutual information (MI), phase congruency-based (PCB), and gradient-based metrics (GBM) across four real biology datasets. Our results demonstrate superior feature alignment and image coherence, especially in serial-stack registrations. While the proposed method has longer processing times compared to linear registration methods, its enhanced accuracy and reliability to handle non-uniform distortion makes it beneficial for precision-demanding applications. We have created a public GitHub repository containing the code used in our research, available at https://github.com/NabaviLab/CAPTURE.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2024 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123223/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144200953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Group Tensor Canonical Correlation Analysis.
Zhuoping Zhou, Boning Tong, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Qi Long, Li Shen
{"title":"Multi-Group Tensor Canonical Correlation Analysis.","authors":"Zhuoping Zhou, Boning Tong, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Qi Long, Li Shen","doi":"10.1145/3584371.3612962","DOIUrl":"10.1145/3584371.3612962","url":null,"abstract":"<p><p>Tensor Canonical Correlation Analysis (TCCA) is a commonly employed statistical method utilized to examine linear associations between two sets of tensor datasets. However, the existing TCCA models fail to adequately address the heterogeneity present in real-world tensor data, such as brain imaging data collected from diverse groups characterized by factors like sex and race. Consequently, these models may yield biased outcomes. In order to surmount this constraint, we propose a novel approach called Multi-Group TCCA (MG-TCCA), which enables the joint analysis of multiple subgroups. By incorporating a dual sparsity structure and a block coordinate ascent algorithm, our MG-TCCA method effectively addresses heterogeneity and leverages information across different groups to identify consistent signals. This novel approach facilitates the quantification of shared and individual structures, reduces data dimensionality, and enables visual exploration. To empirically validate our approach, we conduct a study focused on investigating correlations between two brain positron emission tomography (PET) modalities (AV-45 and FDG) within an Alzheimer's disease (AD) cohort. Our results demonstrate that MG-TCCA surpasses traditional TCCA in identifying sex-specific cross-modality imaging correlations. This heightened performance of MG-TCCA provides valuable insights for the characterization of multimodal imaging biomarkers in AD.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593155/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50159453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.
Tingyi Wanyan, Mingquan Lin, Eyal Klang, Kartikeya M Menon, Faris F Gulamali, Ariful Azad, Yiye Zhang, Ying Ding, Zhangyang Wang, Fei Wang, Benjamin Glicksberg, Yifan Peng
{"title":"Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.","authors":"Tingyi Wanyan, Mingquan Lin, Eyal Klang, Kartikeya M Menon, Faris F Gulamali, Ariful Azad, Yiye Zhang, Ying Ding, Zhangyang Wang, Fei Wang, Benjamin Glicksberg, Yifan Peng","doi":"10.1145/3535508.3545541","DOIUrl":"https://doi.org/10.1145/3535508.3545541","url":null,"abstract":"<p><p>Clinical EHR data is naturally heterogeneous, containing abundant sub-phenotypes. Such diversity creates challenges for outcome prediction using a machine learning model since it leads to high intra-class variance. To address this issue, we propose a supervised pre-training model with a unique embedded k-nearest-neighbor positive sampling strategy. We demonstrate the enhanced performance value of this framework theoretically and show that it yields highly competitive experimental results in predicting patient mortality in real-world COVID-19 EHR data with a total of over 7,000 patients admitted to a large, urban health system. Our method achieves a better AUROC prediction score of 0.872, which outperforms the alternative pre-training models and traditional machine learning methods. Additionally, our method performs much better when the training data size is small (345 training instances).</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2022 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9365529/pdf/nihms-1827823.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40609301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Segmenting Thoracic Cavities with Neoplastic Lesions: A Head-to-head Benchmark with Fully Convolutional Neural Networks.
Zhao Li, Rongbin Li, Kendall J Kiser, Luca Giancardo, W Jim Zheng
{"title":"Segmenting Thoracic Cavities with Neoplastic Lesions: A Head-to-head Benchmark with Fully Convolutional Neural Networks.","authors":"Zhao Li, Rongbin Li, Kendall J Kiser, Luca Giancardo, W Jim Zheng","doi":"10.1145/3459930.3469564","DOIUrl":"https://doi.org/10.1145/3459930.3469564","url":null,"abstract":"<p><p>Automatic segmentation of thoracic cavity structures in computed tomography (CT) is a key step for applications ranging from radiotherapy planning to imaging biomarker discovery with radiomics approaches. State-of-the-art segmentation can be provided by fully convolutional neural networks such as the U-Net or V-Net. However, there is a very limited body of work on a comparative analysis of the performance of these architectures for chest CTs with significant neoplastic disease. In this work, we compared four different types of fully convolutional architectures using the same pre-processing and post-processing pipelines. These methods were evaluated using a dataset of CT images and thoracic cavity segmentations from 402 cancer patients. We found that these methods achieved very high segmentation performance by benchmarks of three evaluation criteria, i.e. Dice coefficient, average symmetric surface distance and 95% Hausdorff distance. Overall, the two-stage 3D U-Net model performed slightly better than other models, with Dice coefficients for left and right lung reaching 0.947 and 0.952, respectively. However, the 3D U-Net model achieved the best performance under the evaluation of HD95 for right lung and ASSD for both left and right lung. These results demonstrate that the current state-of-the-art deep learning models can work very well for segmenting not only healthy lungs but also lungs containing different stages of cancerous lesions. 
The comprehensive types of lung masks from these evaluated methods enabled the creation of imaging-based biomarkers representing both healthy lung parenchyma and neoplastic lesions, allowing us to utilize these segmented areas for the downstream analysis, e.g. treatment planning, prognosis and survival prediction.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3459930.3469564","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40323973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
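The segmentation benchmark above reports Dice coefficients for lung masks. As a reminder of what that score measures, here is a minimal sketch in plain Python. It assumes binary masks flattened to sequences of 0/1 values; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are illustrative choices, not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Foreground voxel counts of each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 foreground voxels in each mask, 2 of them shared.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground voxels, which is why values near 0.95 indicate close agreement with the reference segmentation.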
Assigning ICD-O-3 Codes to Pathology Reports using Neural Multi-Task Training with Hierarchical Regularization.
Anthony Rios, Eric B Durbin, Isaac Hands, Ramakanth Kavuluru
{"title":"Assigning ICD-O-3 Codes to Pathology Reports using Neural Multi-Task Training with Hierarchical Regularization.","authors":"Anthony Rios, Eric B Durbin, Isaac Hands, Ramakanth Kavuluru","doi":"10.1145/3459930.3469541","DOIUrl":"10.1145/3459930.3469541","url":null,"abstract":"<p><p>Tracking population-level cancer information is essential for researchers, clinicians, policymakers, and the public. Unfortunately, much of the information is stored as unstructured data in pathology reports. Thus, to process the information, we require either automated extraction techniques or manual curation. Moreover, many of the cancer-related concepts appear infrequently in real-world training datasets. Automated extraction is difficult because of the limited data. This study introduces a novel technique that incorporates structured expert knowledge to improve histology and topography code classification models. Using pathology reports collected from the Kentucky Cancer Registry, we introduce a novel multi-task training approach with hierarchical regularization that incorporates structured information about the International Classification of Diseases for Oncology, 3rd Edition classes to improve predictive performance. Overall, we find that our method improves both micro and macro F1. For macro F1, we achieve up to a 6% absolute improvement for topography codes and up to 4% absolute improvement for histology codes.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2021 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3459930.3469541","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39453028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Fast and memory-efficient scRNA-seq k-means clustering with various distances.
Daniel N Baker, Nathan Dyjack, Vladimir Braverman, Stephanie C Hicks, Ben Langmead
{"title":"Fast and memory-efficient scRNA-seq <i>k</i>-means clustering with various distances.","authors":"Daniel N Baker, Nathan Dyjack, Vladimir Braverman, Stephanie C Hicks, Ben Langmead","doi":"10.1145/3459930.3469523","DOIUrl":"10.1145/3459930.3469523","url":null,"abstract":"Single-cell RNA-sequencing (scRNA-seq) analyses typically begin by clustering a gene-by-cell expression matrix to empirically define groups of cells with similar expression profiles. We describe new methods and a new open source library, minicore, for efficient k-means++ center finding and k-means clustering of scRNA-seq data. Minicore works with sparse count data, as it emerges from typical scRNA-seq experiments, as well as with dense data from after dimensionality reduction. Minicore's novel vectorized weighted reservoir sampling algorithm allows it to find initial k-means++ centers for a 4-million cell dataset in 1.5 minutes using 20 threads. Minicore can cluster using Euclidean distance, but also supports a wider class of measures like Jensen-Shannon Divergence, Kullback-Leibler Divergence, and the Bhattacharyya distance, which can be directly applied to count data and probability distributions. Further, minicore produces lower-cost centerings more efficiently than scikit-learn for scRNA-seq datasets with millions of cells. With careful handling of priors, minicore implements these distance measures with only minor (<2-fold) speed differences among all distances. We show that a minicore pipeline consisting of k-means++, localsearch++ and mini-batch k-means can cluster a 4-million cell dataset in minutes, using less than 10GiB of RAM. This memory-efficiency enables atlas-scale clustering on laptops and other commodity hardware. Finally, we report findings on which distance measures give clusterings that are most consistent with known cell type labels. Availability: The open source library is at https://github.com/dnbaker/minicore. 
Code used for experiments is at https://github.com/dnbaker/minicore-experiments.","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2021 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8586878/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39733090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
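The minicore abstract above centers on k-means++ center finding. For orientation, the textbook k-means++ seeding rule (pick each new center with probability proportional to its squared distance from the nearest center already chosen) can be sketched in a few lines of plain Python. This is a generic illustration with squared Euclidean distance only; it is not minicore's vectorized weighted reservoir sampling, and the names `sq_dist` and `kmeanspp_centers` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_centers(points, k, seed=0):
    """Textbook k-means++ seeding (D^2 sampling); a sketch, not minicore's algorithm."""
    rng = random.Random(seed)
    # First center: uniform at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Refresh nearest-center distances against the new center only.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


# Toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
centers = kmeanspp_centers(points, k=2, seed=0)
```

Because an already chosen point has weight zero, the second center is always a different point, and on data like the toy example it is overwhelmingly likely to come from the other group.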
Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction.
Mehak Gupta, H Timothy Bunnell, Thao-Ly T Phan, Rahmatollah Beheshti
{"title":"Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction.","authors":"Mehak Gupta, H Timothy Bunnell, Thao-Ly T Phan, Rahmatollah Beheshti","doi":"10.1145/3459930.3469512","DOIUrl":"10.1145/3459930.3469512","url":null,"abstract":"<p><p>Working with electronic health records (EHRs) is known to be challenging for several reasons. These reasons include not having: 1) similar lengths (per visit), 2) the same number of observations (per patient), and 3) complete entries in the available records. These issues hinder the performance of the predictive models created using EHRs. In this paper, we approach these issues by presenting a model for the combined task of imputing and predicting values for the irregularly observed and varying length EHR data with missing entries. Our proposed model (dubbed Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. In this architecture, the generator is a bidirectional recurrent network that receives the EHR data and imputes the existing missing values. The discriminator attempts to discriminate between the actual and the imputed values generated by the generator. Using the input data in its entirety, Bi-GAN learns how to impute missing elements in-between (imputation) or outside of the input time steps (prediction). Our method has three advantages over the state-of-the-art methods in the field: (a) one single model performs both the imputation and prediction tasks; (b) the model can perform predictions using time-series of varying length with missing data; (c) it does not require knowing the observation and prediction time windows during training and can be used for predictions with different observation and prediction window lengths, for short- and long-term predictions. 
We evaluate our model on two large EHR datasets to impute and predict body mass index (BMI) values and show its superior performance in both settings.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2021 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482531/pdf/nihms-1740754.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39483618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Joint Learning for Biomedical NER and Entity Normalization: Encoding Schemes, Counterfactual Examples, and Zero-Shot Evaluation.
Jiho Noh, Ramakanth Kavuluru
{"title":"Joint Learning for Biomedical NER and Entity Normalization: Encoding Schemes, Counterfactual Examples, and Zero-Shot Evaluation.","authors":"Jiho Noh, Ramakanth Kavuluru","doi":"10.1145/3459930.3469533","DOIUrl":"https://doi.org/10.1145/3459930.3469533","url":null,"abstract":"<p><p>Named entity recognition (NER) and normalization (EN) form an indispensable first step to many biomedical natural language processing applications. In biomedical information science, recognizing entities (e.g., genes, diseases, or drugs) and normalizing them to concepts in standard terminologies or thesauri (e.g., Entrez, ICD-10, or RxNorm) is crucial for identifying more informative relations among them that drive disease etiology, progression, and treatment. In this effort we pursue two high level strategies to improve biomedical ER and EN. The first is to decouple standard entity encoding tags (e.g., \"B-Drug\" for the beginning of a drug) into type tags (e.g., \"Drug\") and positional tags (e.g., \"B\"). A second strategy is to use additional counterfactual training examples to handle the issue of models learning spurious correlations between surrounding context and normalized concepts in training data. We conduct elaborate experiments using the MedMentions dataset, the largest dataset of its kind for ER and EN in biomedicine. We find that our first strategy performs better in entity normalization when compared with the standard coding scheme. The second data augmentation strategy uniformly improves performance in span detection, typing, and normalization. The gains from counterfactual examples are more prominent when evaluating in zero-shot settings, for concepts that have never been encountered during training.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 
ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2021 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3459930.3469533","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39402820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
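The joint NER and normalization abstract above describes decoupling a combined tag such as "B-Drug" into a positional tag ("B") and a type tag ("Drug"). A minimal sketch of that split on a label sequence follows. It only illustrates the tag bookkeeping; how the paper's model consumes the two tag streams is not stated in the abstract, and the helper name `decouple_tag` and the use of `None` as the type of the outside tag "O" are assumptions.

```python
def decouple_tag(tag):
    """Split a combined tag like 'B-Drug' into (position, type).

    Assumption: the outside tag 'O' carries no entity type, so its type is None.
    """
    if tag == "O":
        return ("O", None)
    # partition splits on the first hyphen only, so types containing hyphens survive.
    position, _, entity_type = tag.partition("-")
    return (position, entity_type)


# A toy label sequence in the standard combined scheme.
tags = ["B-Drug", "I-Drug", "O", "B-Disease"]
positions, types = zip(*(decouple_tag(t) for t in tags))
# positions -> ('B', 'I', 'O', 'B')
# types     -> ('Drug', 'Drug', None, 'Disease')
```

With the streams separated, the number of positional labels stays fixed no matter how many entity types exist, which is one plausible motivation for the decoupling.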
Transformer-Based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria.
Shubo Tian, Arslan Erdengasileng, Xi Yang, Yi Guo, Yonghui Wu, Jinfeng Zhang, Jiang Bian, Zhe He
{"title":"Transformer-Based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria.","authors":"Shubo Tian, Arslan Erdengasileng, Xi Yang, Yi Guo, Yonghui Wu, Jinfeng Zhang, Jiang Bian, Zhe He","doi":"10.1145/3459930.3469560","DOIUrl":"https://doi.org/10.1145/3459930.3469560","url":null,"abstract":"<p><p>The rapid adoption of electronic health records (EHRs) systems has made clinical data available in electronic format for research and for many downstream applications. Electronic screening of potentially eligible patients using these clinical databases for clinical trials is a critical need to improve trial recruitment efficiency. Nevertheless, manually translating free-text eligibility criteria into database queries is labor intensive and inefficient. To facilitate automated screening, free-text eligibility criteria must be structured and coded into a computable format using controlled vocabularies. Named entity recognition (NER) is thus an important first step. In this study, we evaluate four state-of-the-art transformer-based NER models on two publicly available annotated corpora of eligibility criteria released by Columbia University (i.e., the Chia data) and Facebook Research (i.e., the FRD data). Four transformer-based models (i.e., BERT, ALBERT, RoBERTa, and ELECTRA) pretrained with general English domain corpora vs. those pretrained with PubMed citations, clinical notes from the MIMIC-III dataset and eligibility criteria extracted from all the clinical trials on ClinicalTrials.gov were compared. Experimental results show that RoBERTa pretrained with MIMIC-III clinical notes and eligibility criteria yielded the highest strict and relaxed F-scores in both the Chia data (i.e., 0.658/0.798) and the FRD data (i.e., 0.785/0.916). 
With promising NER results, further investigations on building a reliable natural language processing (NLP)-assisted pipeline for automated electronic screening are needed.</p>","PeriodicalId":72044,"journal":{"name":"ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine","volume":"2021 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3459930.3469560","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39328500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11