Journal of Integrative Bioinformatics: Latest Articles

Data literacy in genome research.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-12-05 | eCollection Date: 2023-12-01 | DOI: 10.1515/jib-2023-0033
Katharina Wolff, Ronja Friedhoff, Friderieke Schwarzer, Boas Pucker
Abstract: With an ever-increasing amount of research data available, data literacy skills are becoming ever more important for benefiting from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students planned the experiment and performed DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to the predicted genes. Students learned how to communicate science by writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings at an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777367/pdf/
Citations: 0
Accurate noise-robust classification of Bacillus species from MALDI-TOF MS spectra using a denoising autoencoder.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-11-20 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2023-0017
Yulia E Uvarova, Pavel S Demenkov, Irina N Kuzmicheva, Artur S Venzel, Elena L Mischenko, Timofey V Ivanisenko, Vadim M Efimov, Svetlana V Bannikova, Asya R Vasilieva, Vladimir A Ivanisenko, Sergey E Peltek
Abstract: Bacillus strains are ubiquitous in the environment and are widely used in the microbiological industry as valuable enzyme sources, as well as in agriculture to stimulate plant growth. The Bacillus genus comprises several closely related groups of species, and their rapid classification remains challenging with existing methods. Techniques based on MALDI-TOF MS data analysis hold significant promise for fast and precise classification of microbial strains at both the genus and species levels. In previous work, we proposed a geometric approach to Bacillus strain classification based on mass spectra analysis via the centroid method (CM). One limitation of such methods is the noise in MS spectra. In this study, we used a denoising autoencoder (DAE) to improve bacterial classification accuracy under noisy MS spectra conditions. We employed the DAE to convert noisy MS spectra into latent variables representing molecular patterns in the original MS data, and the Random Forest (RF) method to classify bacterial strains by these latent variables. Comparison of DAE-RF with the CM method on artificially noised test samples showed that DAE-RF offers higher noise robustness. Hence, the DAE-RF method could be used for noise-robust, fast, and precise classification of Bacillus species from MALDI-TOF MS data.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757077/pdf/
Citations: 0
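The core idea in the abstract above is to train an autoencoder on corrupted spectra so that it learns latent variables that survive noise, and then to classify on those latent variables. Below is a minimal, dependency-free Python sketch of that denoising principle only, under several assumptions that are not from the paper: the autoencoder is a purely linear two-layer map trained by plain stochastic gradient descent, the corruption is additive Gaussian noise, the 8-bin "spectra" are made-up toy vectors, and the Random Forest stage is omitted (the output of the hypothetical `encode` helper is what would be handed to it). It is not the authors' architecture, preprocessing, or noise model.

```python
import random


def train_linear_dae(data, latent_dim=2, noise_sd=0.3, lr=0.01, epochs=300, seed=0):
    """Train a tiny *linear* denoising autoencoder with plain SGD (illustrative only).

    Each clean vector x is corrupted with Gaussian noise, encoded to
    h = W @ x_noisy, decoded to y = V @ h, and both weight matrices are updated
    to minimise 0.5 * ||y - x||^2, i.e. to reconstruct the CLEAN input.
    """
    rng = random.Random(seed)
    d = len(data[0])
    W = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(latent_dim)]  # encoder, k x d
    V = [[rng.gauss(0, 0.1) for _ in range(latent_dim)] for _ in range(d)]  # decoder, d x k
    for _ in range(epochs):
        for x in data:
            xn = [xi + rng.gauss(0, noise_sd) for xi in x]                   # corrupt the input
            h = [sum(W[j][i] * xn[i] for i in range(d)) for j in range(latent_dim)]
            y = [sum(V[i][j] * h[j] for j in range(latent_dim)) for i in range(d)]
            e = [y[i] - x[i] for i in range(d)]                               # error vs. clean x
            gh = [sum(V[i][j] * e[i] for i in range(d)) for j in range(latent_dim)]  # V^T e
            for i in range(d):
                for j in range(latent_dim):
                    V[i][j] -= lr * e[i] * h[j]                               # dL/dV = e h^T
            for j in range(latent_dim):
                for i in range(d):
                    W[j][i] -= lr * gh[j] * xn[i]                             # dL/dW = (V^T e) xn^T
    return W, V


def encode(W, x):
    """Latent representation of a spectrum; a classifier would be trained on these."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def reconstruction_error(W, V, data, noise_sd=0.3, seed=1):
    """Mean squared error between decoded noisy inputs and the clean inputs."""
    rng = random.Random(seed)
    total = 0.0
    for x in data:
        xn = [xi + rng.gauss(0, noise_sd) for xi in x]
        h = encode(W, xn)
        y = [sum(v * hj for v, hj in zip(row, h)) for row in V]
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


if __name__ == "__main__":
    # Hypothetical 8-bin "spectra" for two classes (toy values, not real MALDI-TOF data).
    a = [1.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0]
    b = [0.0, 0.0, 0.0, 1.0, 0.9, 0.0, 0.0, 0.3]
    data = [a] * 5 + [b] * 5
    W0, V0 = train_linear_dae(data, epochs=0)  # untrained weights for comparison
    W, V = train_linear_dae(data)
    print(reconstruction_error(W0, V0, data), reconstruction_error(W, V, data))
    print(encode(W, a), encode(W, b))
```

A real spectral pipeline would use a deep-learning framework and a nonlinear encoder; the linear version is used here only because its gradients fit in a few readable lines.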
Reconstruction of the regulatory hypermethylation network controlling hepatocellular carcinoma development during hepatitis C viral infection.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-11-20 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2023-0013
Evgeniya A Antropova, Tamara M Khlebodarova, Pavel S Demenkov, Anastasiia R Volianskaia, Artur S Venzel, Nikita V Ivanisenko, Alexandr D Gavrilenko, Timofey V Ivanisenko, Anna V Adamovskaya, Polina M Revva, Nikolay A Kolchanov, Inna N Lavrik, Vladimir A Ivanisenko
Abstract: Hepatocellular carcinoma (HCC) has been associated with hepatitis C viral (HCV) infection as a potential risk factor. Nonetheless, the precise genetic regulatory mechanisms triggered by the virus that lead to virus-induced hepatocarcinogenesis remain unclear. We hypothesized that HCV proteins might modulate the activity of aberrantly methylated HCC genes through regulatory pathways. Virus-host regulatory pathways, covering protein-protein interactions and the regulation of gene expression, transport, and stability, were reconstructed using ANDSystem. Gene expression regulation was statistically significant. Gene network analysis identified four out of 70 HCC marker genes whose expression regulation by viral proteins may be associated with HCC: DNA-binding protein inhibitor ID-1 (ID1), flap endonuclease 1 (FEN1), cyclin-dependent kinase inhibitor 2A (CDKN2A), and telomerase reverse transcriptase (TERT). The analysis suggested the following viral protein effects in HCV/human protein heterocomplexes: HCV NS3(p70) protein activates human STAT3 and NOTC1; NS2-3(p23), NS5B(p68), NS1(E2), and core(p21) activate SETD2; NS5A inhibits SMYD3; and NS3 inhibits CCN2. Interestingly, NS3 and E1(gp32) activate c-Jun when it positively regulates CDKN2A and inhibit it when it represses TERT. The discovered regulatory mechanisms might be key areas of focus for developing medications and preventive therapies to decrease the likelihood of HCC development during HCV infection.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757076/pdf/
Citations: 0
BGRS: bioinformatics of genome regulation and data integration.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-11-16 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2023-0032
Yuriy L Orlov, Ming Chen, Nikolay A Kolchanov, Ralf Hofestädt
No abstract available.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757072/pdf/
Citations: 0
STARGATE-X: a Python package for statistical analysis on the REACTOME network.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-09-21 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2022-0029
Andrea Marino, Blerina Sinaimeri, Enrico Tronci, Tiziana Calamoneri
Abstract: Many important aspects of biological knowledge at the molecular level can be represented by pathways. Through their analysis, we gain mechanistic insights and interpret lists of interesting genes from experiments (usually omics and functional genomic experiments). As a result, pathways play a central role in the development of bioinformatics methods and tools for computing predictions from known molecular-level mechanisms. Qualitative as well as quantitative knowledge about pathways can be effectively represented through biochemical networks linking the biochemical reactions and the compounds (e.g., proteins) occurring in the considered pathways. Repositories providing biochemical networks for known pathways therefore play a central role in bioinformatics and in systems biology. Here we focus on Reactome, a free, comprehensive, and widely used repository for biochemical networks and pathways. In this paper, we (1) introduce StARGate-X (STatistical Analysis of the Reactome multi-GrAph Through nEtworkX), a tool that carries out an automated analysis of the connectivity properties of the Reactome biochemical reaction network and of its biological hierarchy (i.e., cell compartments, namely the closed parts within the cytosol, usually surrounded by a membrane); the code is freely available at https://github.com/marinoandrea/stargate-x; and (2) show the effectiveness of our tool by providing an analysis of the Reactome network in terms of centrality measures with respect to in- and out-degree. As an example of usage of StARGate-X, we provide a detailed automated analysis of the Reactome network in terms of centrality measures. We focus both on the subgraphs induced by single compartments and on the graph whose nodes are the strongly connected components. To the best of our knowledge, this is the first freely available tool that enables automatic analysis of the large biochemical network within Reactome through easy-to-use APIs (Application Programming Interfaces).
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757075/pdf/
Citations: 0
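StARGate-X is built on NetworkX (the name stands for STatistical Analysis of the Reactome multi-GrAph Through nEtworkX), where the analyses named in the abstract correspond to functions such as `nx.in_degree_centrality`, `nx.out_degree_centrality`, `nx.strongly_connected_components`, and `nx.condensation`. To show what these analyses compute without assuming anything about StARGate-X's own API, here is a dependency-free Python sketch: per-node in/out-degree, strongly connected components via Kosaraju's algorithm, and the condensation graph whose nodes are those components. The toy edge list is hypothetical, and how Reactome reactions and compounds map to nodes is an assumption, not taken from the package.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return {n: (indeg[n], outdeg[n]) for n in set(indeg) | set(outdeg)}


def strongly_connected_components(edges):
    """Kosaraju's algorithm with explicit stacks (no recursion limit on large graphs).

    Returns (components, comp_of), where comp_of maps each node to its component index.
    """
    graph, rev, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))
    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, finished = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:  # all children explored: the node is finished
                stack.pop()
                finished.append(node)
    # Pass 2: DFS on the reversed graph in decreasing finishing time; each tree is one SCC.
    comp_of, components = {}, []
    for start in reversed(finished):
        if start in comp_of:
            continue
        cid, members, stack = len(components), [], [start]
        comp_of[start] = cid
        while stack:
            node = stack.pop()
            members.append(node)
            for pred in rev[node]:
                if pred not in comp_of:
                    comp_of[pred] = cid
                    stack.append(pred)
        components.append(members)
    return components, comp_of


def condensation(edges):
    """Graph whose nodes are SCC indices; it is always acyclic."""
    components, comp_of = strongly_connected_components(edges)
    dag = {(comp_of[u], comp_of[v]) for u, v in edges if comp_of[u] != comp_of[v]}
    return components, dag


if __name__ == "__main__":
    # Hypothetical reaction/compound edges, not real Reactome identifiers.
    edges = [("r1", "c1"), ("c1", "r2"), ("r2", "r1"), ("r2", "c2"), ("c2", "r3"), ("r3", "c2")]
    print(degrees(edges))
    print(condensation(edges))
```

Dividing each degree by (number of nodes − 1) gives the normalised degree centrality that NetworkX reports.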
RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-08-25 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2022-0046
John Anders, Peter F Stadler
Abstract: Distinguishing regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious, and sometimes difficult, to construct. We therefore introduce a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires only a single target nucleotide sequence as input. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments needed as input for RNAcode. By automating the entire pre- and postprocessing, the service makes searching specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757073/pdf/
Citations: 0
SnakeLines: integrated set of computational pipelines for sequencing reads.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-08-21 | eCollection Date: 2023-09-01 | DOI: 10.1515/jib-2022-0059
Jaroslav Budiš, Werner Krampl, Marcel Kucharík, Rastislav Hekel, Adrián Goga, Jozef Sitarčík, Michal Lichvár, Dávid Smol'ak, Miroslav Böhmer, Andrej Baláž, František Ďuriš, Juraj Gazdarica, Katarína Šoltys, Ján Turňa, Ján Radvánszky, Tomáš Szemes
Abstract: With the rapid growth of massively parallel sequencing technologies, more and more laboratories are using sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757078/pdf/
Citations: 2
Concentration of inverted repeats along human DNA.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-07-25 | eCollection Date: 2023-06-01 | DOI: 10.1515/jib-2022-0052
Carlos A C Bastos, Vera Afreixo, João M O S Rodrigues, Armando J Pinho
Abstract: This work aims to describe the observed enrichment of inverted repeats in the human genome and to identify and describe, with detailed length profiles, the regions with significant and relevant enrichment of inverted repeats. The enrichment is assessed and tested with a recently proposed measure (a z-score-based measure). We simulate a genome using an order-7 Markov model trained with data from the real genome. The simulated genome is used to establish the critical values that serve as decision thresholds for identifying regions with significantly enriched concentrations. Several human genome regions are highly enriched in inverted repeats, and this is observed in all human chromosomes. The distribution of inverted repeat lengths varies along the genome. Most regions with severely exaggerated enrichment contain mainly short inverted repeats. There are also regions with regular peaks along the inverted repeat length distribution (periodic regularities) and other, less frequent, regions with exaggerated enrichment for long lengths. However, adjacent regions tend to have similar distributions.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561070/pdf/
Citations: 0
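The abstract above tests observed inverted-repeat content against a genome simulated from an order-7 Markov model trained on the real sequence, and uses the simulation to set critical values. A minimal Python sketch of that simulation-based thresholding follows, with loudly stated assumptions: the statistic `inverted_repeat_kmers` is a simple stand-in (occurrences of k-mers whose reverse complement also appears in the window), not the paper's z-score-based measure; the 95th-percentile cut-off and all function names are hypothetical choices for illustration.

```python
import random
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return s.translate(COMPLEMENT)[::-1]


def train_markov(seq, order):
    """Count next-nucleotide frequencies for every observed context of length `order` (>= 1)."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def simulate(counts, order, length, seed=0):
    """Sample `length` nucleotides from the fitted order-k Markov chain."""
    rng = random.Random(seed)
    contexts = sorted(counts)
    out = list(rng.choice(contexts))  # start from an observed context
    while len(out) < length:
        nxt = counts.get("".join(out[-order:]))
        if not nxt:  # dead-end context (seen only at the end of training data): jump
            nxt = counts[rng.choice(contexts)]
        symbols, weights = zip(*nxt.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out[:length])


def inverted_repeat_kmers(seq, k):
    """Crude proxy: occurrences of k-mers whose reverse complement also occurs in `seq`.

    Reverse-complement palindromes (possible only for even k) match themselves.
    """
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(n for w, n in kmers.items() if revcomp(w) in kmers)


def critical_value(counts, order, window, k, n_sim=200, q=0.95, seed=0):
    """Empirical q-quantile of the proxy statistic over `n_sim` simulated windows."""
    sims = sorted(
        inverted_repeat_kmers(simulate(counts, order, window, seed=seed + s), k)
        for s in range(n_sim)
    )
    return sims[min(int(q * n_sim), n_sim - 1)]


if __name__ == "__main__":
    real = "ACGTTGCAACGGTATTAGCCGATCCA" * 40       # stand-in for a real genome window
    model = train_markov(real, order=7)            # order 7, as in the abstract
    threshold = critical_value(model, 7, window=len(real), k=6, n_sim=50)
    print(inverted_repeat_kmers(real, 6), threshold)  # observed value vs. simulated cut-off
```

A window whose observed statistic exceeds the simulated critical value would be flagged as enriched; the paper's measure additionally profiles enrichment by inverted-repeat length.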
Integrating omics databases for enhanced crop breeding.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-07-25 | eCollection Date: 2023-12-01 | DOI: 10.1515/jib-2023-0012
Haoyu Chao, Shilong Zhang, Yueming Hu, Qingyang Ni, Saige Xin, Liang Zhao, Vladimir A Ivanisenko, Yuriy L Orlov, Ming Chen
Abstract: Crop plant breeding involves selecting and developing new plant varieties with desirable traits such as increased yield, improved disease resistance, and enhanced nutritional value. With the development of high-throughput technologies, such as genomics, transcriptomics, and metabolomics, crop breeding has entered a new era. However, to effectively use these technologies, integration of multi-omics data from different databases is required. Integration of omics data provides a comprehensive understanding of the biological processes underlying plant traits and their interactions. This review highlights the importance of integrating omics databases in crop plant breeding, discusses available omics data and databases, describes integration challenges, and highlights recent developments and potential benefits. Taken together, the integration of omics databases is a critical step towards enhancing crop plant breeding and improving global food security.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777369/pdf/
Citations: 0
Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.
IF 1.9
Journal of Integrative Bioinformatics | Pub Date: 2023-06-01 | DOI: 10.1515/jib-2022-0055
Hamed Ghazikhani, Gregory Butler
Abstract: Transmembrane transport proteins (transporters) play a crucial role in the fundamental cellular processes of all organisms by facilitating the transport of hydrophilic substrates across hydrophobic membranes. Despite the availability of numerous membrane protein sequences, their structures and functions remain largely elusive. Recently, natural language processing (NLP) techniques have shown promise in the analysis of protein sequences. Bidirectional Encoder Representations from Transformers (BERT) is an NLP technique adapted for proteins to learn contextual embeddings of individual amino acids within a protein sequence. Our previous strategy, TooT-BERT-T, differentiated transporters from non-transporters by employing a logistic regression classifier with fine-tuned representations from ProtBERT-BFD. In this study, we expand upon this approach by using representations from ProtBERT, ProtBERT-BFD, and MembraneBERT in combination with classical classifiers. Additionally, we introduce TooT-BERT-CNN-T, a novel method that fine-tunes ProtBERT-BFD and discriminates transporters using a Convolutional Neural Network (CNN). Our experimental results reveal that the CNN surpasses traditional classifiers in discriminating transporters from non-transporters, achieving an MCC of 0.89 and an accuracy of 95.1% on the independent test set. This represents an improvement of 0.03 and 1.11 percentage points, respectively, compared to TooT-BERT-T.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10389051/pdf/
Citations: 0
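The abstract above reports results as the Matthews correlation coefficient (MCC) and accuracy; by subtraction, the stated gains imply that TooT-BERT-T scored roughly MCC 0.86 and about 94.0% accuracy on the same test set. A short Python sketch of how both metrics follow from a binary confusion matrix is given below; the counts in the usage block are hypothetical and not from the paper (scikit-learn users would typically call `sklearn.metrics.matthews_corrcoef` instead).

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified proteins."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect);
    by convention returns 0.0 when any row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (transporters = positive class).
    tp, tn, fp, fn = 90, 300, 10, 10
    print(f"accuracy={accuracy(tp, tn, fp, fn):.3f}  MCC={mcc(tp, tn, fp, fn):.3f}")
```

With these made-up counts the script prints `accuracy=0.951  MCC=0.868`; the gap illustrates why MCC is usually reported alongside accuracy when the classes are imbalanced.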