{"title":"Integration of hybrid and self-correction method improves the quality of long-read sequencing data.","authors":"Tao Tang, Yiping Liu, Binshuang Zheng, Rong Li, Xiaocai Zhang, Yuansheng Liu","doi":"10.1093/bfgp/elad026","DOIUrl":"10.1093/bfgp/elad026","url":null,"abstract":"Third-generation sequencing (TGS) technologies have revolutionized genome science in the past decade. However, the long-read data produced by TGS platforms suffer from a much higher error rate than that of previous technologies, which complicates downstream analysis. Several error correction tools for long-read data have been developed; these tools can be categorized into hybrid and self-correction tools. So far, these two types of tools have been investigated separately, and their interplay remains understudied. Here, we integrate hybrid and self-correction methods for high-quality error correction. Our procedure leverages the similarity among long reads and the high accuracy of short reads. We compare the performance of our method and state-of-the-art error correction tools on Escherichia coli and Arabidopsis thaliana datasets. The results show that the integration approach outperformed the existing error correction methods and holds promise for improving the quality of downstream analyses in genomic research.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9669190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of strand-specific and cell-type-specific G-quadruplexes based on high-resolution CUT&Tag data.","authors":"Yizhi Cui, Hongzhi Liu, Yutong Ming, Zheng Zhang, Li Liu, Ruijun Liu","doi":"10.1093/bfgp/elad024","DOIUrl":"10.1093/bfgp/elad024","url":null,"abstract":"G-quadruplex (G4), a non-classical deoxyribonucleic acid structure, is widely distributed in the genome and involved in various biological processes. In vivo high-throughput sequencing has indicated that G4s are significantly enriched at functional regions in a cell-type-specific manner. Computational prediction of G4s is therefore needed as an alternative to time-consuming and laborious experimental methods. Recently, G4 CUT&Tag has been developed to generate higher-resolution sequencing data than ChIP-seq, which provides more accurate training samples for model construction. In this paper, we present a new dataset construction method based on G4 CUT&Tag sequencing data and a prediction model based on XGBoost, a gradient-boosting machine learning method. The results show that our model performs well within and across cell types. Furthermore, sequence analysis indicates that G4 formation is strongly affected by the flanking sequences, and that the GC content of G4 flanking sequences is higher than that of non-G4 sequences. Moreover, we identified G4 motifs in the high-resolution dataset, among which we found several motifs for known transcription factors (TFs), such as SP2 and BPC. These TFs may directly or indirectly affect the formation of the G4 structure.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9683854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular insights on the origin and development of waxy genotypes in major crop plants.","authors":"Vikram S Gaur, Salej Sood, Carlos Guzmán, Kenneth M Olsen","doi":"10.1093/bfgp/elad035","DOIUrl":"10.1093/bfgp/elad035","url":null,"abstract":"Starch is a major component of the seed endosperm and is commercially important in food and industry. Crop varieties with glutinous (waxy) grain characteristics, i.e. starch with high amylopectin and low amylose, hold longstanding cultural importance in some world regions and have unique properties for industrial manufacture. The waxy character in many crop species is regulated by a single gene known as GBSSI (or waxy), which encodes the enzyme Granule-Bound Starch Synthase I; waxy alleles encode an enzyme with null or reduced activity. Several allelic variants of the waxy gene that contribute to varying levels of amylose content have been reported in different crop plants. Phylogenetic analysis of the protein sequences and genomic DNA encoding GBSSI in major cereals and recently sequenced millets and pseudo-cereals has shown that GBSSI orthologs form distinct clusters, each representing a separate crop lineage. With the rapidly increasing demand for waxy starch in food and non-food applications, conventional crop breeding techniques and modern crop improvement technologies such as gene silencing and genome editing have been deployed to develop new waxy crop cultivars. Advances in research on waxy alleles across different crops have unveiled new possibilities for modifying the synthesis of amylose and amylopectin starch, potentially enabling the creation of customized crops in the future. This article presents molecular evidence on the emergence of waxy genes in various crops, including their genesis and evolution, molecular structure, comparative analysis and breeding innovations.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47913967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of drug-protein interaction based on dual channel neural networks with attention mechanism.","authors":"Dayu Tan, Haijun Jiang, Haitao Li, Ying Xie, Yansen Su","doi":"10.1093/bfgp/elad037","DOIUrl":"10.1093/bfgp/elad037","url":null,"abstract":"The precise identification of drug-protein interactions (DPIs) can significantly speed up the drug discovery process. Screening every drug-protein pair with bioassays is time-consuming and expensive. Machine-learning-based methods cannot accurately predict a large number of DPIs. Compared with traditional computing methods, deep learning methods need less domain knowledge and have strong data learning ability. In this study, we construct a DPI prediction model based on dual channel neural networks with an efficient path attention mechanism, called DCA-DPI. The drug molecular graph and protein sequence are used as the model input, and a residual graph neural network and a residual convolution network are used to learn feature representations of the drug and protein, respectively, yielding the feature vector of the drug and the hidden vectors of the protein. To obtain a more accurate protein feature vector, a neural attention mechanism computes a weighted sum of the protein hidden vectors. Finally, the drug and protein vectors are concatenated and fed into a fully connected layer for classification. To evaluate the performance of DCA-DPI, three widely used public datasets, Human, C. elegans and DUD-E, are used in the experiments. The evaluation metric values are superior to those of other relevant methods. Experiments show that our model is efficient for DPI prediction.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10112268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
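The DCA-DPI abstract above describes pooling the protein's per-residue hidden vectors with an attention-weighted sum and concatenating the result with the drug vector before classification. The following is a minimal pure-Python sketch of that pooling step, assuming dot-product scoring with the drug vector as the attention query; that scoring choice is ours for illustration and is not necessarily what DCA-DPI uses. The helper names `attention_pool` and `dpi_features` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden: List[Vector], query: Vector) -> Vector:
    """Weighted sum of per-residue hidden vectors.

    Assumption: weights are softmax(query . h_i); the published model may
    score residues differently (e.g. with a learned projection).
    """
    weights = softmax([dot(query, h) for h in hidden])
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def dpi_features(drug_vec: Vector, protein_hidden: List[Vector]) -> Vector:
    """Concatenate the drug vector with the attention-pooled protein vector.

    Hypothetical helper; assumes drug and residue vectors share a dimension.
    The output would feed a fully connected classifier (not shown).
    """
    protein_vec = attention_pool(protein_hidden, drug_vec)
    return drug_vec + protein_vec  # list concatenation, not element-wise addition
```

With a zero query every residue receives equal weight, so `dpi_features([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` returns `[0.0, 0.0, 0.5, 0.5]`.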
{"title":"Prognostic signature analysis and survival prediction of esophageal cancer based on N6-methyladenosine associated lncRNAs.","authors":"Ting He, Zhipeng Gao, Ling Lin, Xu Zhang, Quan Zou","doi":"10.1093/bfgp/elad028","DOIUrl":"10.1093/bfgp/elad028","url":null,"abstract":"Esophageal cancer (ESCA) has a poor prognosis. Long non-coding RNAs (lncRNAs) affect cell proliferation. However, the prognostic function of N6-methyladenosine (m6A)-associated lncRNAs (m6A-lncRNAs) in ESCA remains unknown. Univariate Cox analysis was applied to identify prognosis-related m6A-lncRNAs, based on which the samples were clustered. Wilcoxon rank and Chi-square tests were used to compare clinical traits, survival, pathway activity and immune infiltration between clusters; overall survival, clinical traits (N stage), tumor-infiltrating immune cells and pathway activity differed significantly. Using a least absolute shrinkage and selection operator and proportional hazards (Lasso-Cox) model, five m6A-lncRNAs were selected to construct a prognostic signature (m6A-lncSig) and risk score. Chi-square tests and Spearman correlation analysis were used to investigate the link between risk score and clinical traits or the immune microenvironment. Risk score was associated with N stage, tumor stage, cluster membership, M2 macrophages, naive B cells and resting memory CD4 T cells. Risk score and tumor stage were identified as independent prognostic variables, and the constructed nomogram model predicted prognosis with high accuracy. The obtained m6A-lncSig could serve as a potential prognostic biomarker for ESCA patients. This study offers a theoretical foundation for the clinical diagnosis and prognosis of ESCA.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9886829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
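In Lasso-Cox prognostic signatures like the m6A-lncSig above, a patient's risk score is commonly computed as the Cox linear predictor (the sum of each selected lncRNA's expression multiplied by its coefficient), and patients are then split into high- and low-risk groups, often at the median score. The sketch below assumes that convention. The lncRNA names, coefficients and expression values are hypothetical placeholders, not values from the study.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical coefficients for a five-lncRNA signature (placeholders only).
COEFFICIENTS: Dict[str, float] = {
    "lncA": 0.42, "lncB": -0.31, "lncC": 0.18, "lncD": 0.27, "lncE": -0.12,
}


def risk_score(expression: Dict[str, float],
               coefs: Dict[str, float] = COEFFICIENTS) -> float:
    """Cox linear predictor: sum of coefficient * expression over signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def stratify(patients: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[str]]:
    """Split patients into high- and low-risk groups at the median risk score.

    Assumption: scores equal to the median are assigned to the low-risk group.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    high = sorted(pid for pid, s in scores.items() if s > cutoff)
    low = sorted(pid for pid, s in scores.items() if s <= cutoff)
    return high, low
```

A positive coefficient means higher expression raises the score (worse predicted survival). A negative coefficient marks a protective lncRNA.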
{"title":"Breast cancer prognosis through the use of multi-modal classifiers: current state of the art and the way forward","authors":"Archana Mathur, Nikhilanand Arya, Kitsuchart Pasupa, Sriparna Saha, Sudeepa Roy Dey, Snehanshu Saha","doi":"10.1093/bfgp/elae015","DOIUrl":"https://doi.org/10.1093/bfgp/elae015","url":null,"abstract":"We present a survey of the current state of the art in breast cancer detection and prognosis. We analyze the evolution of Artificial Intelligence-based approaches from using only uni-modal information to multi-modality for detection, and how such a paradigm shift improves the efficacy of detection, consistent with clinical observations. We conclude that interpretable AI-based predictions and the ability to handle class imbalance should be considered priorities.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140828431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Short-homology-mediated PCR-based method for gene introduction in the fission yeast Schizosaccharomyces pombe","authors":"Cai-Xia Zhang, Ying-Chun Hou","doi":"10.1093/bfgp/elae016","DOIUrl":"https://doi.org/10.1093/bfgp/elae016","url":null,"abstract":"Schizosaccharomyces pombe is a commonly used model organism for studying various aspects of eukaryotic cell physiology. One reason for its widespread use as an experimental system is the ease of genetic manipulation, which leverages the natural homology-targeted repair mechanism to accurately modify the genome. We conducted a study to assess the feasibility and efficiency of directly introducing exogenous genes into the fission yeast S. pombe using Polymerase Chain Reaction (PCR) products with short-homology flanking sequences. Specifically, we amplified the NatMX6 gene (which provides resistance to nourseothricin) by PCR with oligonucleotides carrying short flanking regions of 20 bp, 40 bp, 60 bp and 80 bp homologous to the target locus. Using this purified PCR product, we successfully introduced the NatMX6 gene at position 171 385 on chromosome III in S. pombe. We made a simple modification to the transformation procedure that increased transformation efficiency at least 5-fold. The success rate of gene integration at the target position varied between 20% and 50% depending on the length of the flanking regions. Additionally, we discovered that the addition of dimethyl sulfoxide and boiled carrier DNA increased the number of transformants by ~60- and 3-fold, respectively. Furthermore, we found that deletion of the pku70+ gene improved the transformation efficiency to ~5% and reduced the formation of small background colonies. Overall, our results demonstrate that with this modified method, even very short homologous regions (as short as 20 bp) can be used to effectively target genes at a high frequency in S. pombe. This finding greatly facilitates the introduction of exogenous genes in this organism.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140798721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
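The short-homology method above relies on PCR primers whose 5' tails match the genomic sequence flanking the insertion site. The amplified NatMX6 product therefore carries 20 to 80 bp of homology at each end. The sketch below shows the conventional way such primers are assembled in silico. All sequences are made-up placeholders, and the helper names and the 20-nt cassette-annealing length are assumptions, not details from the study.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def homology_primers(upstream: str, downstream: str, cassette: str,
                     homology_len: int, anneal_len: int = 20) -> Tuple[str, str]:
    """Build forward/reverse primers (5'->3') for homology-flanked cassette PCR.

    upstream:   genomic sequence ending at the insertion site
    downstream: genomic sequence starting right after the insertion site
    cassette:   marker cassette to amplify (hypothetical NatMX6 template)
    """
    if len(upstream) < homology_len or len(downstream) < homology_len:
        raise ValueError("flanking sequence shorter than requested homology")
    # Forward primer: 5' homology tail + 3' end annealing to the cassette start.
    forward = upstream[-homology_len:] + cassette[:anneal_len]
    # Reverse primer: 5' tail = revcomp of downstream homology,
    # 3' end anneals to the last anneal_len bases of the cassette.
    reverse = revcomp(downstream[:homology_len]) + revcomp(cassette[-anneal_len:])
    return forward, reverse


def expected_product(upstream: str, downstream: str, cassette: str,
                     homology_len: int) -> str:
    """PCR product: upstream homology arm + cassette + downstream homology arm."""
    return upstream[-homology_len:] + cassette + downstream[:homology_len]
```

Varying `homology_len` over 20, 40, 60 and 80 reproduces the flank lengths compared in the abstract. Only the 5' tails change, so the same cassette template can be reused.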
{"title":"Genome structural dynamics: insights from Gaussian network analysis of Hi-C data.","authors":"Anupam Banerjee, She Zhang, Ivet Bahar","doi":"10.1093/bfgp/elae014","DOIUrl":"https://doi.org/10.1093/bfgp/elae014","url":null,"abstract":"Characterization of the spatiotemporal properties of chromatin is essential for gaining insights into the physical bases of gene co-expression, transcriptional regulation and epigenetic modifications. Recent work has shown the Gaussian network model (GNM) to be a useful tool for modeling chromatin structural dynamics, using high-throughput chromosome conformation capture (Hi-C) data as input. Here we explore the collective dynamics of chromosomal structures at hierarchical levels of resolution, from single gene loci to topologically associating domains or entire chromosomes. The GNM allows us to identify long-range interactions between gene loci, shedding light on the role of cross-correlations between distal chromosomal regions in regulating gene expression. Notably, GNM analysis across diverse cell lines highlights the conservation of the global/cooperative movements of chromatin across cell types. Variations driven by localized couplings between genomic loci, on the other hand, underlie cell differentiation, underscoring the significance of the four-dimensional properties of the genome in defining cellular identity. Finally, we demonstrate the close relation between the cell type-dependent mobility profiles of gene loci and their gene expression patterns, clearly demonstrating the role of chromosomal 4D features in defining cell-specific differential expression of genes.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140676888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
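The GNM analysis summarized above treats Hi-C contacts as springs between genomic loci. The following is a minimal pure-Python sketch. It assumes that each off-diagonal Kirchhoff entry is the negated (already normalized) contact frequency, which is our assumption about the input preparation. It computes the Kirchhoff pseudo-inverse Γ⁺, whose diagonal gives each locus's mean-square fluctuation (its mobility profile) and whose normalized off-diagonal entries give cross-correlations between loci. For a connected network it uses the identity Γ⁺ = (Γ + J/n)^-1 - J/n, where J is the all-ones matrix. This is a dependency-free choice, not the authors' implementation. It also does not separate slow (global) from fast (local) modes, which would require an eigendecomposition.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def kirchhoff(contacts: Matrix) -> Matrix:
    """Kirchhoff matrix: off-diagonal = -contact, diagonal = sum of contacts."""
    n = len(contacts)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                gamma[i][j] = -contacts[i][j]
        gamma[i][i] = -sum(gamma[i][j] for j in range(n) if j != i)
    return gamma


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (is the contact network connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudo_inverse(gamma: Matrix) -> Matrix:
    """Laplacian pseudo-inverse via (Gamma + J/n)^-1 - J/n (connected networks only)."""
    n = len(gamma)
    shift = 1.0 / n
    inv = invert([[gamma[i][j] + shift for j in range(n)] for i in range(n)])
    return [[inv[i][j] - shift for j in range(n)] for i in range(n)]


def gnm_profiles(contacts: Matrix) -> Tuple[List[float], Matrix]:
    """Mean-square fluctuations (up to a kT/spring-constant factor) and
    normalized cross-correlations between loci."""
    g_plus = pseudo_inverse(kirchhoff(contacts))
    n = len(g_plus)
    msf = [g_plus[i][i] for i in range(n)]
    corr = [[g_plus[i][j] / (msf[i] * msf[j]) ** 0.5 for j in range(n)] for i in range(n)]
    return msf, corr
```

For a three-locus chain with unit contacts between neighbours, `gnm_profiles([[0, 1, 0], [1, 0, 1], [0, 1, 0]])` gives fluctuations of about 5/9, 2/9 and 5/9 (the chain ends are the most mobile) and a correlation of about -0.8 between the two ends.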
{"title":"Use of in silico approaches, synthesis and profiling of Pan-filovirus GP-1,2 preprotein specific antibodies.","authors":"Maciej Wiśniewski, Peace Babirye, Carol Musubika, Eleni Papakonstantinou, Samuel Kirimunda, Michał Łaźniewski, Teresa Szczepińska, Moses L. Joloba, Elias Eliopoulos, Erik Bongcam-Rudloff, D. Vlachakis, Anup Kumar Halder, Dariusz Plewczyński, M. Wayengera","doi":"10.1093/bfgp/elae012","DOIUrl":"https://doi.org/10.1093/bfgp/elae012","url":null,"abstract":"Intermolecular interactions in protein-protein complexes play a principal role in discovering new substances for the diagnosis and treatment of many diseases. Antibodies are a notable example of such complexes, including those that interact with specific antigens of two genera of single-stranded RNA viruses in the family Filoviridae, Ebolavirus and Marburgvirus, both of which cause rare but fatal viral hemorrhagic fever in Africa and have pandemic potential. In this research, we design and evaluate antibodies targeting the filovirus glycoprotein precursor GP1,2 to develop potential targets for easy-to-use pan-filovirus rapid diagnostic tests. In silico analysis of the available 3D structure of the natural antibody-antigen complex was carried out to determine the stability of individual protein segments during complex formation and maintenance. The computed binding free energy of the complex and its per-residue decomposition allowed us to define the residues that play an essential role in the structure and indicated the spots where potential antibodies can be improved. Following that, the study targeted six epitopes of the filovirus GP1,2 with two polyclonal antibodies (pAbs) and 14 monoclonal antibodies (mAbs). The evaluation conducted using enzyme immunoassays tested 62 different sandwich combinations of mAbs, identifying 10 combinations that successfully captured the recombinant GP1,2 (rGP). Among these combinations, the sandwich option (3G2G12* - (rGP) - 2D8F11) exhibited the highest propensity for capturing the rGP antigen.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140716078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology.","authors":"Debabrata Acharya, Anirban Mukhopadhyay","doi":"10.1093/bfgp/elae013","DOIUrl":"https://doi.org/10.1093/bfgp/elae013","url":null,"abstract":"Multi-omics data play a crucial role in precision medicine, mainly in understanding the diverse biological interactions between different omics layers. Machine learning approaches have been extensively employed in this context over the years. This review comprehensively summarizes and categorizes these advancements, focusing on the integration of multi-omics data, including genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide insights into their application. The review emphasizes both the challenges and opportunities in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse layers of biological information. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence, this review offers a thorough overview of the current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140719603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}