Bioinformatics Advances: latest articles

Investigating alignment-free machine learning methods for HIV-1 subtype classification.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-29 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae108
Kaitlyn E Wade, Lianghong Chen, Chutong Deng, Gen Zhou, Pingzhao Hu
Motivation: Many viruses are organized into taxonomies of subtypes based on their genetic similarities. For human immunodeficiency virus 1 (HIV-1), subtype classification plays a crucial role in infection management. Sequence alignment-based methods for subtype classification are impractical for large datasets because they are costly and time-consuming. Alignment-free methods instead create numerical representations of genetic sequences and apply statistical or machine learning methods to them. Despite their high overall accuracy, existing models perform poorly on less common subtypes. Furthermore, little work has investigated the impact of sequence vectorization methods, in particular natural language-inspired embedding methods, on HIV-1 subtype classification.
Results: We present a comprehensive analysis of sequence vectorization methods across multiple machine learning methods. We report a k-mer-based XGBoost model with a balanced accuracy of 0.84, indicating good overall performance for both common and uncommon HIV-1 subtypes. We also report a Word2Vec-based support vector machine that achieves promising precision and balanced accuracy. Our study sheds light on the effect of sequence vectorization methods on HIV-1 subtype classification and suggests that natural language-inspired encoding methods show promise. Our results could help in developing improved HIV-1 subtype classification methods, leading to better individual patient outcomes and to the development of subtype-specific treatments.
Availability and implementation: Source code is available at https://www.github.com/kwade4/HIV_Subtypes.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371153/pdf/
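The abstract rests on two ideas: alignment-free k-mer vectorization and balanced accuracy. Both can be illustrated with a short Python sketch. This is not the authors' code, which is in the linked repository. The function names `kmer_vector` and `balanced_accuracy`, the default k, and the frequency normalization are illustrative assumptions. Balanced accuracy is the mean of per-class recall. That is why a high value implies good performance on rare subtypes as well as common ones.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Map a nucleotide sequence to a fixed-length k-mer frequency vector.

    Positions follow the lexicographic order of all len(alphabet)**k k-mers,
    so sequences of different lengths become comparable without alignment.
    Windows containing symbols outside `alphabet` (e.g. N, gaps) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if all(ch in alphabet for ch in seq[i:i + k])
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    # Relative frequencies; an all-zero vector if no valid k-mer was found.
    return [counts[km] / total if total else 0.0 for km in all_kmers]


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Mean per-class recall, so each subtype counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # First four entries correspond to the 2-mers AA, AC, AG, AT.
    print(kmer_vector("ACGTA", k=2)[:4])  # [0.0, 0.25, 0.0, 0.0]
    # Three common-subtype samples correct, the single rare-subtype sample wrong:
    print(balanced_accuracy(["B", "B", "B", "C"], ["B", "B", "B", "B"]))  # 0.5
```

In the toy call, plain accuracy would be 0.75. Balanced accuracy is 0.5 because the only rare-class sample is misclassified. A vector like the one `kmer_vector` returns is the kind of fixed-length input a classifier such as XGBoost can consume.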
Citations: 0
survivalContour: visualizing predicted survival via colored contour plots.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-25 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae105
Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert R Jenq, Christine B Peterson
Summary: Advances in survival analysis have enabled unprecedented flexibility in data modeling, yet there remains a lack of tools for illustrating the influence of continuous covariates on predicted survival outcomes. We propose using a colored contour plot to depict predicted survival probabilities over time as a function of a continuous covariate. Our approach supports conventional models, including the Cox and Fine-Gray models, but it is most useful when coupled with modern machine learning models such as random survival forests and deep neural networks.
Availability and implementation: We provide a Shiny app at https://biostatistics.mdanderson.org/shinyapps/survivalContour/ and an R package at https://github.com/YushuShi/survivalContour as implementations of this tool.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11290613/pdf/
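For a Cox proportional-hazards model, the contour plot colors the predicted survival S(t | x) = S0(t)^exp(βx). This is evaluated on a grid of time points t and values of a continuous covariate x. The Python sketch below computes that grid. It is not part of survivalContour, which is an R package and Shiny app. The Weibull baseline, the coefficient β = 0.05 and the age-like covariate range are hypothetical values chosen only for illustration. In practice S0(t) and β would be estimated from data. For random survival forests or neural networks, the grid would instead come from the model's own survival predictions.

```python
import math


def weibull_baseline(t: float, scale: float = 10.0, shape: float = 1.5) -> float:
    """Hypothetical baseline survival S0(t); real analyses estimate it from data."""
    return math.exp(-((t / scale) ** shape))


def cox_survival_grid(times, covariates, beta, baseline_survival):
    """Predicted survival S(t | x) = S0(t) ** exp(beta * x) under proportional hazards.

    Rows correspond to covariate values and columns to time points, which is the
    (Y, X) layout a filled-contour routine expects for its Z argument.
    """
    return [
        [baseline_survival(t) ** math.exp(beta * x) for t in times]
        for x in covariates
    ]


times = [0.5 * i for i in range(1, 41)]   # 0.5, 1.0, ..., 20.0 (hypothetical time units)
ages = [40 + 2 * j for j in range(16)]    # hypothetical continuous covariate, 40..70
centred = [a - 60 for a in ages]          # covariate centred at 60
grid = cox_survival_grid(times, centred, beta=0.05, baseline_survival=weibull_baseline)

# With matplotlib installed, a colored contour plot could then be drawn with:
#   import matplotlib.pyplot as plt
#   plt.contourf(times, ages, grid, levels=20); plt.colorbar(); plt.show()
```

Because β > 0 here, each column decreases as the covariate increases, and each row decreases over time. The contour bands therefore slope across the plot and show at a glance how the covariate shifts predicted survival.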
Citations: 0
Genotype imputation in F2 crosses of inbred lines.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-23 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae107
Saul Pierotti, Bettina Welz, Mireia Osuna-López, Tomas Fitzgerald, Joachim Wittbrodt, Ewan Birney
Motivation: Crosses among inbred lines are a fundamental tool for discovering genetic loci associated with phenotypes of interest. In organisms for which large reference panels or SNP chips are not available, imputation from low-pass whole-genome sequencing is an effective way to obtain genotype data for a large number of individuals. To date, no structured analysis of the conditions required for optimal genotype imputation has been performed.
Results: We report a systematic exploration of the effect of several design variables on imputation performance in F2 crosses of inbred medaka lines, using the imputation software STITCH. We found that imputation performance plateaus as per-sample sequencing coverage increases, in a manner that depends on the number of samples. We also systematically explored the trade-offs between cost, imputation accuracy, and sample numbers. We developed a computational pipeline to streamline the process, enabling other researchers to perform a similar cost-benefit analysis on their population of interest.
Availability and implementation: The source code for the pipeline is available at https://github.com/birneylab/stitchimpute. Although the pipeline was developed and tested on an F2 population, the software can also be used to analyse populations with a different structure.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11286293/pdf/
Citations: 0
Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-22 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae106
Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang
Motivation: Drug-target interactions (DTIs) play a pivotal role in drug discovery, which aims to identify potential drug targets and elucidate drugs' mechanisms of action. In recent years, natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain. It has the potential to mine vast amounts of text and so extract DTIs efficiently from the literature.
Results: In this article, we frame DTI extraction as an entity-relationship extraction problem and apply different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach combining gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, the top-ranked performance among all models submitted to the official evaluation. Furthermore, we compare gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insight into their impact on performance. Our findings highlight the potential of NLP-based text mining with gene and chemical descriptions to improve drug-target extraction tasks.
Availability and implementation: Datasets used in this study are accessible at https://dtis.drugtargetcommons.org/.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11293871/pdf/
Citations: 0
Correction to: Quantitative transcriptomic and epigenomic data analysis: a primer.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-18 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae091
[This corrects the article DOI: 10.1093/bioadv/vbae019.]
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11257713/pdf/
Citations: 0
ScyNet: Visualizing interactions in community metabolic models.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-17 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae104
Michael Predl, Kilian Gandolf, Michael Hofer, Thomas Rattei
Motivation: Genome-scale community metabolic models are used to gain mechanistic insights into interactions between community members. However, existing tools for visualizing metabolic models cater only to single-organism models.
Results: ScyNet is a Cytoscape app for visualizing community metabolic models. It generates networks of reduced complexity by focusing on interactions between community members. ScyNet can incorporate the state of a metabolic model via fluxes or flux ranges, as demonstrated on a previously published, simplified cystic fibrosis airway community model.
Availability and implementation: ScyNet is freely available under an MIT licence and can be retrieved via the Cytoscape App Store (apps.cytoscape.org/apps/scynet). The source code is available on GitHub (github.com/univieCUBE/ScyNet).
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11315608/pdf/
Citations: 0
TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-13 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae103
Chiara Rodella, Symela Lazaridi, Thomas Lemmin
Motivation: Understanding protein thermostability is essential for numerous biotechnological applications, but traditional experimental methods are time-consuming, expensive, and error-prone. Recently, deep learning (DL) techniques from natural language processing (NLP) have been extended to biology, since the primary sequence of a protein can be viewed as a string of amino acids that follows a physicochemical grammar.
Results: In this study, we developed TemBERTure, a DL framework that predicts thermostability class and melting temperature from protein sequences. Our findings emphasize the importance of data diversity for training robust models, especially the inclusion of sequences from a wider range of organisms. Additionally, we suggest using attention scores from DL models to gain deeper insight into protein thermostability. Analyzing these scores together with the 3D protein structure can improve understanding of the complex interactions among amino acid properties, their positions, and the surrounding microenvironment. By addressing the limitations of current prediction methods and opening new avenues of exploration, this research paves the way for more accurate and informative protein thermostability predictions, ultimately accelerating advances in protein engineering.
Availability and implementation: The TemBERTure model and data are available at https://github.com/ibmm-unibe-ch/TemBERTure.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11262459/pdf/
Citations: 0
loco-pipe: an automated pipeline for population genomics with low-coverage whole-genome sequencing.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-11 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae098
Zehua T Zhou, Gregory L Owens, Wesley A Larson, Runyang Nicolas Lou, Peter H Sudmant
Summary: We developed loco-pipe, a Snakemake pipeline that streamlines a set of essential population genomic analyses for low-coverage whole-genome sequencing (lcWGS) data. loco-pipe is highly automated, easily customizable, and massively parallelized, which makes it a valuable tool for both new and experienced users of lcWGS.
Availability and implementation: loco-pipe is published under the GPLv3. It is freely available on GitHub (github.com/sudmantlab/loco-pipe) and archived on Zenodo (doi.org/10.5281/zenodo.10425920).
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11246161/pdf/
Citations: 0
INCAWrapper: a Python wrapper for INCA for seamless data import, -export, and -processing.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-07-04 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae100
Matthias Mattanovich, Viktor Hesselberg-Thomsen, Annette Lien, Dovydas Vaitkus, Victoria Sara Saad, Douglas McCloskey
Motivation: INCA is a powerful tool for metabolic flux analysis; however, importing and exporting data and results can be tedious, which limits the use of INCA in automated workflows.
Results: The INCAWrapper enables INCA to be used purely through Python, so that it can be integrated into common data science workflows.
Availability and implementation: The INCAWrapper is implemented in Python and can be found at https://github.com/biosustain/incawrapper. It is freely available under an MIT License. To run INCA, users need their own MATLAB and INCA licenses. INCA is freely available for noncommercial use at mfa.vueinnovations.com.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11245311/pdf/
Citations: 0
A customizable secure DIY web application for accessing, sharing, and browsing aggregate experimental results and metadata.
IF 2.4
Bioinformatics Advances · Pub Date: 2024-06-28 · eCollection Date: 2024-01-01 · DOI: 10.1093/bioadv/vbae087
Jaewoo Lee, Mehita Achuthan, Lucas Chen, Paulina Carmona-Mora
Summary: A problem common to many research fields is that processed data and research results are often scattered, which makes data access, analysis, extraction, and team sharing more challenging. We have developed a platform that lets researchers easily manage tabular data, with features such as browsing, bookmarking, and linking to external open knowledge bases. The source code was originally designed for genomics research but can be customized for other fields or data, providing a no- to low-cost DIY system for research teams.
Availability and implementation: The source code of our DIY app is available at https://github.com/Carmona-MoraUCD/Human-Genomics-Browser. Anyone with a web browser, Python 3, and Node.js installed can download and run it. The web application is licensed under the MIT license.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11257709/pdf/
Citations: 0