Bioinformatics (Oxford, England): Latest Articles

AltGosling: Automatic Generation of Text Descriptions for Accessible Genomics Data Visualization.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae670
Thomas C Smits, Sehi L'Yi, Andrew P Mar, Nils Gehlenborg
Motivation: Biomedical visualizations are key to accessing biomedical knowledge and detecting new patterns in large datasets. Interactive visualizations are essential for biomedical data scientists and are omnipresent in data analysis software and data portals. Without appropriate descriptions, these visualizations are not accessible to all people with blindness and low vision, who often rely on screen reader accessibility technologies to access visual information on digital devices. Screen readers require descriptions to convey image content. However, many images lack informative descriptions due to unawareness and difficulty writing such descriptions. Describing complex and interactive visualizations, like genomics data visualizations, is even more challenging. Automatic generation of descriptions could be beneficial, yet current alt text generating models are limited to basic visualizations and cannot be used for genomics.
Results: We present AltGosling, an automated description generation tool focused on interactive data visualizations of genome-mapped data, created with the grammar-based genomics toolkit Gosling. The logic-based algorithm of AltGosling creates various descriptions including a tree-structured navigable panel. We co-designed AltGosling with a blind screen reader user (co-author). We show that AltGosling outperforms state-of-the-art large language models and common image-based neural networks for alt text generation of genomics data visualizations. As a first of its kind in genomic research, we lay the groundwork to increase accessibility in the field.
Availability and implementation: The source code, examples, and interactive demo are accessible under the MIT License at https://github.com/gosling-lang/altgosling. The package is available at https://www.npmjs.com/package/altgosling.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae680
Wenkai Xiang, Zhaoping Xiong, Huan Chen, Jiacheng Xiong, Wei Zhang, Zunyun Fu, Mingyue Zheng, Bing Liu, Qian Shi
Motivation: Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and "tail labels" with few known examples. Previous methods mainly focused on protein sequence features, overlooking the semantic meaning of protein labels.
Results: We introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Predicting the subcellular location of prokaryotic proteins with DeepLocPro.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae677
Jaime Moreno, Henrik Nielsen, Ole Winther, Felix Teufel
Motivation: Protein subcellular location prediction is a widely explored task in bioinformatics because of its importance in proteomics research. We propose DeepLocPro, an extension to the popular method DeepLoc, tailored specifically to archaeal and bacterial organisms.
Results: DeepLocPro is a multiclass subcellular location prediction tool for prokaryotic proteins, trained on experimentally verified data curated from UniProt and PSORTdb. DeepLocPro compares favorably to the PSORTb 3.0 ensemble method, surpassing its performance across multiple metrics in our benchmark experiment.
Availability: The DeepLocPro prediction tool is available online at https://ku.biolib.com/deeplocpro and https://services.healthtech.dtu.dk/services/DeepLocPro-1.0/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
DeepRSMA: a cross-fusion based deep learning method for RNA-small molecule binding affinity prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae678
Zhijian Huang, Yucheng Wang, Song Chen, Yaw Sing Tan, Lei Deng, Min Wu
Motivation: RNA is implicated in numerous aberrant cellular functions and disease progressions, highlighting the crucial importance of RNA-targeted drugs. To accelerate the discovery of such drugs, it is essential to develop an effective computational method for predicting RNA-small molecule affinity (RSMA). Recently, deep learning based computational methods have been promising due to their powerful nonlinear modeling ability. However, leveraging advanced deep learning methods to mine the diverse information of RNAs, small molecules and their interactions remains a great challenge.
Results: In this study, we present DeepRSMA, an innovative cross-attention-based deep learning method for RSMA prediction. To effectively capture fine-grained features from RNA and small molecules, we developed nucleotide-level and atomic-level feature extraction modules for RNA and small molecules, respectively. Additionally, we incorporated both sequence and graph views into these modules to capture features from multiple perspectives. Moreover, a Transformer-based cross-fusion module is introduced to learn the general patterns of interactions between RNAs and small molecules. To achieve effective RSMA prediction, we integrated the RNA and small molecule representations from the feature extraction and cross-fusion modules. Our results show that DeepRSMA outperforms baseline methods in multiple test settings. The interpretability analysis and the case study on spinal muscular atrophy (SMA) demonstrate that DeepRSMA has the potential to guide RNA-targeted drug design.
Availability: The code and data are publicly available at https://github.com/Hhhzj-7/DeepRSMA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
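
The DeepRSMA abstract describes a cross-attention-based fusion of nucleotide-level and atom-level features. As a rough illustration of what cross-attention computes, the sketch below applies plain scaled dot-product attention with RNA features as queries and small-molecule features as keys and values. It is not the DeepRSMA implementation: the function names, the toy feature vectors, and the absence of learned projection matrices are all simplifying assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: one feature vector per nucleotide (hypothetical toy data)
    keys, values: one feature vector per atom (hypothetical toy data)
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this nucleotide to every atom, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of atom value vectors
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused


# Toy example: one nucleotide, two atoms
rna_features = [[0.0, 0.0]]
atom_keys = [[1.0, 0.0], [0.0, 1.0]]
atom_values = [[2.0, 4.0], [4.0, 8.0]]
fused = cross_attention(rna_features, atom_keys, atom_values)
```

With an all-zero query every atom receives the same weight, so the fused vector is the mean of the atom value vectors. A trained model would instead learn projections that make the weights depend on the RNA and molecule content.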
FEHAT: Efficient, Large scale and Automated Heartbeat Detection in Medaka Fish Embryos.
Bioinformatics (Oxford, England) Pub Date : 2024-11-07 DOI: 10.1093/bioinformatics/btae664
Marcio Soares Ferreira, Sebastian Stricker, Tomas Fitzgerald, Jack Monahan, Fanny Defranoux, Philip Watson, Bettina Welz, Omar Hammouda, Joachim Wittbrodt, Ewan Birney
High resolution imaging of model organisms allows the quantification of important physiological measurements. In the case of fish with transparent embryos, these videos can visualise key physiological processes, such as heartbeat. High throughput systems can provide enough measurements for the robust investigation of developmental processes as well as the impact of system perturbations on physiological state. However, few analytical schemes have been designed to handle thousands of high-resolution videos without the need for some level of human intervention. We developed a software package, named FEHAT, to provide a fully automated solution for the analytics of large numbers of heart rate imaging datasets obtained from developing Medaka fish embryos in 96 well plate format imaged on an Acquifer machine. FEHAT uses image segmentation to define regions of the embryo showing changes in pixel intensity over time, followed by the classification of the most likely position of the heart and Fourier Transformations to estimate the heart rate. Here we describe some important features of the FEHAT software, showcase its performance across a large set of medaka fish embryos and compare its performance to established, less automated solutions. FEHAT provides reliable heart rate estimates across a range of temperature-based perturbations and can be applied to tens of thousands of embryos without the need for any human intervention.
Availability: Data used in this manuscript will be made available on request.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
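
The FEHAT abstract mentions Fourier transformations for estimating heart rate from pixel-intensity changes over time. The sketch below shows the general idea on a synthetic intensity trace: remove the mean, compute a discrete Fourier transform, and read off the strongest frequency. It is an illustration only, not FEHAT's code. The function name, the frame rate, and the synthetic trace are assumptions, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dominant_frequency_hz(signal, sampling_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (DC) offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Bin k corresponds to k * sampling_rate / n hertz
    return best_k * sampling_rate_hz / n


# Synthetic trace: a 2 Hz oscillation sampled at 20 frames per second for 5 s
fps = 20.0
trace = [100 + 10 * math.sin(2 * math.pi * 2.0 * t / fps) for t in range(100)]
bpm = 60 * dominant_frequency_hz(trace, fps)
print(round(bpm, 1))  # 120.0
```

Real recordings are noisy and rarely contain a whole number of beats, so a practical pipeline would typically add windowing, band-limiting to plausible heart rates, and peak interpolation.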
Ranking Antibody Binding Epitopes and Proteins Across Samples from Whole Proteome Tiled Linear Peptides.
Bioinformatics (Oxford, England) Pub Date : 2024-11-05 DOI: 10.1093/bioinformatics/btae637
Sean J McIlwain, Anna Hoefges, Amy K Erbe, Paul M Sondel, Irene M Ong
Introduction: Ultradense peptide binding arrays that can probe millions of linear peptides comprising the entire proteomes of human or mouse, or hundreds of thousands of microbes, are powerful tools for studying the antibody repertoire in serum samples to understand adaptive immune responses.
Motivation: There are few tools for exploring high-dimensional, significant and reproducible antibody targets for ultradense peptide binding arrays at the linear peptide, epitope (grouping of adjacent peptides), and protein level across multiple samples/subjects (i.e. epitope spread or immunogenic regions of proteins) for understanding the heterogeneity of immune responses.
Results: We developed HERON (Hierarchical antibody binding Epitopes and pROteins from liNear peptides), an R package, which identifies immunogenic epitopes, using meta-analyses and spatial clustering techniques to explore antibody targets at various resolution and confidence levels, that can be found consistently across a specified number of samples through the entire proteome to study antibody responses for diagnostics or treatment. Our approach estimates significance values at the linear peptide (probe), epitope, and protein level to identify top candidates for validation. We test the performance of predictions on all three levels using correlation between technical replicates and comparison of epitope calls on two datasets, which shows HERON's competitiveness in estimating false discovery rates and finding general and sample-level regions of interest for antibody binding.
Availability: The HERON R package is available at Bioconductor https://bioconductor.org/packages/release/bioc/html/HERON.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Afpdb - an efficient structure manipulation package for AI protein design.
Bioinformatics (Oxford, England) Pub Date : 2024-11-05 DOI: 10.1093/bioinformatics/btae654
Yingyao Zhou, Jiayi Cox, Bin Zhou, Steven Zhu, Yang Zhong, Glen Spraggon
Motivation: The advent of AlphaFold and other protein Artificial Intelligence (AI) models has transformed protein design, necessitating efficient handling of large-scale data and complex workflows. Using existing programming packages that predate recent AI advancements often leads to inefficiencies in human coding and slow code execution. To address this gap, we developed the Afpdb package.
Results: Afpdb, built on AlphaFold's NumPy architecture, offers a high-performance core. It uses RFDiffusion's contig syntax to streamline residue and atom selection, making coding simpler and more readable. Integrating PyMOL's visualization capabilities, Afpdb allows automatic visual quality control. With over 180 methods commonly used in protein AI design, which are otherwise hard to find, Afpdb enhances productivity in structural biology by supporting the development of concise, high-performance code.
Availability: Code and documentation are available on GitHub (https://github.com/data2code/afpdb) and PyPI (https://pypi.org/project/afpdb). An interactive tutorial is accessible through Google Colab.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
findGSEP: estimating genome size of polyploid species using k-mer frequencies.
Bioinformatics (Oxford, England) Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae647
Laiyi Fu, Yanxin Xie, Shunkang Ling, Ying Wang, Binzhong Wang, Hejun Du, Qinke Peng, Hequan Sun
Summary: Estimating genome size using k-mer frequencies, which plays a fundamental role in designing genome sequencing and analysis projects, has remained challenging for polyploid species, i.e., ploidy p > 2. To address this, we introduce "findGSEP," which is designed based on iterative curve fitting of k-mer frequencies. Precisely, it first disentangles up to p normal distributions by analyzing k-mer frequencies in whole genome sequencing of the focal species. Second, it computes the sizes of genomic regions related to 1∼p (homologous) chromosome(s) using each respective curve fitting, from which it infers the full polyploid and average haploid genome size. "findGSEP" can handle any level of ploidy p, and infer more accurate genome size than other well-known tools, as shown by tests using simulated and real genomic sequencing data of various species including octoploids.
Availability and implementation: "findGSEP" was implemented as a web server, which is freely available at http://146.56.237.198:3838/findGSEP/. Also, "findGSEP" was implemented as an R package for parallel processing of multiple samples. Source code and a tutorial on its installation and usage are available at https://github.com/sperfu/findGSEP.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552620/pdf/
Citations: 0
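
For context on the findGSEP summary, the sketch below shows the classic single-peak k-mer estimate that such tools build on: genome size is approximated as the total number of k-mer occurrences divided by the depth at the main histogram peak. This is not findGSEP's algorithm, which fits up to p normal distributions. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size from a k-mer frequency histogram.

    histogram: dict mapping k-mer multiplicity (depth) to the number of
               distinct k-mers observed at that depth.
    min_depth: depths below this are treated as sequencing errors and ignored
               (an assumed, simplistic cutoff).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    # Depth with the most distinct k-mers approximates the sequencing coverage
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * count
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error spike at depth 1, a main peak at depth 10,
# and a small repeat-derived bump at depth 20
toy_histogram = {1: 50000, 8: 100, 9: 400, 10: 1000, 11: 400, 12: 100, 20: 50}
size = estimate_genome_size(toy_histogram)
print(size)  # 2100.0
```

In a polyploid genome the histogram contains several overlapping peaks, one per copy-number class, so a single-peak estimate like this one becomes unreliable. That is the situation the curve-fitting approach described in the summary is meant to handle.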
ADAPT: Analysis of Microbiome Differential Abundance by Pooling Tobit Models.
Bioinformatics (Oxford, England) Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae661
Mukai Wang, Simon Fontaine, Hui Jiang, Gen Li
Motivation: Microbiome differential abundance analysis (DAA) remains a challenging problem despite multiple methods proposed in the literature. The excessive zeros and compositionality of metagenomics data are two main challenges for DAA.
Results: We propose a novel method called "Analysis of Microbiome Differential Abundance by Pooling Tobit Models" (ADAPT) to overcome these two challenges. ADAPT interprets zero counts as left-censored observations to avoid unfounded assumptions and complex models. ADAPT also encompasses a theoretically justified way of selecting non-differentially abundant microbiome taxa as a reference to reveal differentially abundant taxa while avoiding false discoveries. We generate synthetic data using independent simulation frameworks to show that ADAPT has more consistent false discovery rate control and higher statistical power than competitors. We use ADAPT to analyze 16S rRNA sequencing of saliva samples and shotgun metagenomics sequencing of plaque samples collected from infants in the COHRA2 study. The results provide novel insights into the association between the oral microbiome and early childhood dental caries.
Availability and implementation: The R package ADAPT can be installed from Bioconductor at https://bioconductor.org/packages/release/bioc/html/ADAPT.html or from GitHub at https://github.com/mkbwang/ADAPT. The source code for simulation studies and real data analysis is available at https://github.com/mkbwang/ADAPT_example.
Citations: 0
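
The ADAPT abstract treats zero counts as left-censored observations, which is the defining feature of a Tobit model. The sketch below writes out the log-likelihood of a generic left-censored normal model: observed values contribute a normal density term, and censored values contribute the probability of falling at or below the censoring limit. It is not ADAPT's implementation. The function names, the choice of what quantity is modeled, and the example numbers are assumptions, since the abstract does not give those details.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def tobit_loglik(values, censored, mu, sigma):
    """Log-likelihood of a left-censored normal (Tobit) model.

    values:   observed value, or the censoring limit for censored entries
    censored: True where the true value is only known to be <= the limit
    mu, sigma: mean and standard deviation of the latent normal variable
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        z = (y - mu) / sigma
        if is_censored:
            # P(latent value <= limit)
            total += math.log(normal_cdf(z))
        else:
            # Density of the observed value
            total += normal_logpdf(z) - math.log(sigma)
    return total


# Hypothetical log-scale abundances; the first two sit at a detection limit of 0.0
log_abundance = [0.0, 0.0, 1.2, 0.7]
is_censored = [True, True, False, False]
ll = tobit_loglik(log_abundance, is_censored, mu=0.5, sigma=1.0)
```

Maximizing this function over `mu` and `sigma` (and, in a regression setting, over covariate effects on `mu`) gives the Tobit estimates without having to replace zeros by pseudo-counts.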
PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.
Bioinformatics (Oxford, England) Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae672
Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu
Summary: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges for precise literature retrieval for author name queries, a common behavior in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and subsequently our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is incrementally updated on a weekly basis. More importantly, we make the dataset publicly available for the community such that it can be utilized in a wide variety of potential applications beyond assisting PubMed's author name queries. Finally, we propose a set of guidelines for best practices of authors pertaining to use of their names.
Availability and implementation: The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588201/pdf/
Citations: 0