Ruibin Chen, Guobo Xie, Zhiyi Lin, Guosheng Gu, Yi Yu, Junrui Yu, Zhenguo Liu
{"title":"Predicting Microbe-Disease Associations Based on a Linear Neighborhood Label Propagation Method with Multi-order Similarity Fusion Learning.","authors":"Ruibin Chen, Guobo Xie, Zhiyi Lin, Guosheng Gu, Yi Yu, Junrui Yu, Zhenguo Liu","doi":"10.1007/s12539-024-00607-0","DOIUrl":"10.1007/s12539-024-00607-0","url":null,"abstract":"<p><p>Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance in the leave-one-out cross-validation and 5-fold validation frameworks. 
In the case study, the predicted 10 microbes associated with asthma and type 1 diabetes have an accuracy rate of up to 90% and 100%, respectively.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"345-360"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140021617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism.","authors":"Yuchen Wang, Xianchun Kong, Xiao Bi, Lizhen Cui, Hong Yu, Hao Wu","doi":"10.1007/s12539-024-00617-y","DOIUrl":"10.1007/s12539-024-00617-y","url":null,"abstract":"<p><p>Survival analysis, as a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medical field. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease, and the correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The model proposed in our study simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the basic distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model in comparison to currently existing methods by evaluating multiple publicly available clinical datasets. Through this study, we prove the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. 
The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"405-417"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140136680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.","authors":"Dao-Ling Huang, Quanlei Zeng, Yun Xiong, Shuixia Liu, Chaoqun Pang, Menglei Xia, Ting Fang, Yanli Ma, Cuicui Qiang, Yi Zhang, Yu Zhang, Hong Li, Yuying Yuan","doi":"10.1007/s12539-024-00605-2","DOIUrl":"10.1007/s12539-024-00605-2","url":null,"abstract":"<p><p>We report a combined manual annotation and deep-learning natural language processing study on accurate entity extraction in hereditary disease related biomedical literature. A total of 400 full articles were manually annotated based on published guidelines by experienced genetic interpreters at Beijing Genomics Institute (BGI). The performance of our manual annotations was assessed by comparing our re-annotated results with those publicly available. The overall Jaccard index was calculated to be 0.866 for the four entity types: gene, variant, disease and species. Both a BERT-based large named entity recognition (NER) model and a DistilBERT-based simplified NER model were trained, validated and tested. Because the manually annotated corpus is limited, these NER models were fine-tuned in two phases. The F1-scores of BERT-based NER for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of DistilBERT-based NER are 95.14%, 86.26%, 91.37% and 89.92%, respectively. 
Most importantly, the variant entity type has been extracted by a large language model for the first time, with an F1-score comparable to that of the state-of-the-art variant extraction model tmVar.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"333-344"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11289304/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139716027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nan Zhao, Tong Wu, Wenda Wang, Lunchuan Zhang, Xinqi Gong
{"title":"Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure.","authors":"Nan Zhao, Tong Wu, Wenda Wang, Lunchuan Zhang, Xinqi Gong","doi":"10.1007/s12539-024-00626-x","DOIUrl":"10.1007/s12539-024-00626-x","url":null,"abstract":"<p><p>Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. 
Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction and contribute to more advanced predictions in the future.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"261-288"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141491789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of the Application of Spatial Transcriptomics in Neuroscience.","authors":"Le Zhang, Zhenqi Xiong, Ming Xiao","doi":"10.1007/s12539-024-00603-4","DOIUrl":"10.1007/s12539-024-00603-4","url":null,"abstract":"<p><p>Since spatial transcriptomics can locate and distinguish the expression of functional genes in specific regions and tissues, it is important for investigating brain development, the mechanisms by which brain diseases develop, and the relationship between brain structure and function in Neuroscience (or Brain science). While previous studies have introduced the crucial spatial transcriptomic techniques and data analysis methods, few studies comprehensively review the key methods, data resources, and technological applications of spatial transcriptomics in Neuroscience. For these reasons, we first investigate several common spatial transcriptomic data analysis approaches and data resources. Second, we introduce the applications of the spatial transcriptomic data analysis approaches in Neuroscience. Third, we summarize the integration of spatial transcriptomics with other technologies in Neuroscience. Finally, we discuss the challenges and future research directions of spatial transcriptomics in Neuroscience.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"243-260"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139905555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roha Arif, Sameera Kanwal, Saeed Ahmed, Muhammad Kabir
{"title":"A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features.","authors":"Roha Arif, Sameera Kanwal, Saeed Ahmed, Muhammad Kabir","doi":"10.1007/s12539-024-00628-9","DOIUrl":"10.1007/s12539-024-00628-9","url":null,"abstract":"<p><p>Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. The identification of THPs can provide crucial insights for the drug discovery and pharmaceutical industries, as they allow for tailored medication delivery towards cancer cells. These peptides have a high affinity for particular receptors present on tumor surfaces, allowing for the creation of precision medications that reduce off-target consequences and enhance cancer patient treatment results. Wet-lab techniques are considered essential tools for studying THPs; however, they are labor-intensive and time-consuming, making prediction of THPs a challenging task for researchers. Computational techniques, on the other hand, are considered significant tools for identifying THPs from sequence data. Although many strategies have been presented to predict new THPs, there is still a need to develop a robust method with higher rates of success. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large scale. Firstly, the peptide sequences are encoded through various sequential features. Secondly, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble framework is formed by integrating the sequential and deep features, which are fed to a support vector machine, with 10-fold cross-validation used to validate its efficiency. 
The experimental results showed that THP-DF performed better on both the [Formula: see text] and [Formula: see text] datasets, achieving an accuracy of > 95%, which is higher than that of existing predictors on both datasets. This indicates that the proposed predictor could be a beneficial tool to precisely and rapidly identify THPs and will contribute to cutting-edge cancer treatment strategies and pharmaceuticals.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"503-518"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140907811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Chen, Huilian Zhang, Quan Zou, Bo Liao, Xia-an Bi
{"title":"Multi-kernel Learning Fusion Algorithm Based on RNN and GRU for ASD Diagnosis and Pathogenic Brain Region Extraction","authors":"Jie Chen, Huilian Zhang, Quan Zou, Bo Liao, Xia-an Bi","doi":"10.1007/s12539-024-00629-8","DOIUrl":"https://doi.org/10.1007/s12539-024-00629-8","url":null,"abstract":"<p>Autism spectrum disorder (ASD) is a complex, severe disorder related to brain development. It impairs patients' language communication and social behaviors. In recent years, ASD research has focused on single-modal neuroimaging data, neglecting the complementarity between multi-modal data. This omission may lead to poor classification. Therefore, it is important to study multi-modal data of ASD to reveal its pathogenesis. Furthermore, recurrent neural network (RNN) and gated recurrent unit (GRU) are effective for sequence data processing. In this paper, we introduce a novel framework for a Multi-Kernel Learning Fusion algorithm based on RNN and GRU (MKLF-RAG). The framework utilizes RNN and GRU to provide feature selection for data of different modalities. Then these features are fused by the MKLF algorithm to detect the pathological mechanisms of ASD and extract the Regions of Interest (ROIs) most relevant to the disease. The MKLF-RAG proposed in this paper has been tested in a variety of experiments with the Autism Brain Imaging Data Exchange (ABIDE) database. Experimental findings indicate that our framework notably enhances the classification accuracy for ASD. 
Compared with other methods, MKLF-RAG demonstrates superior efficacy across multiple evaluation metrics and could provide valuable insights into the early diagnosis of ASD.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":"36 1","pages":""},"PeriodicalIF":4.8,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140809858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H-ACO with Consecutive Bases Pairing Constraint for Designing DNA Sequences","authors":"Xuwei Yang, Donglin Zhu, Can Yang, Changjun Zhou","doi":"10.1007/s12539-024-00614-1","DOIUrl":"https://doi.org/10.1007/s12539-024-00614-1","url":null,"abstract":"<p>DNA computing is a novel computing method that does not rely on traditional computers. The design of DNA sequences is a crucial step in DNA computing, and the quality of the sequence design directly affects the results of DNA computing. In this paper, a new constraint called the consecutive base pairing constraint is proposed to limit specific base pairings in DNA sequence design. Additionally, to improve the efficiency and capability of DNA sequence design, the Hierarchy-ant colony (H-ACO) algorithm is introduced, which combines the features of multiple algorithms and optimizes discrete numerical calculations. Experimental results show that the H-ACO algorithm performs well in DNA sequence design. Finally, this paper compares a series of constraint values and NUPACK simulation data with previous design results, and the DNA sequence set designed in this paper has more advantages.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":"5 1","pages":""},"PeriodicalIF":4.8,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140809975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guangle Zhang, Yuan Zhang, Ling Li, Jiaying Zhou, Honglin Chen, Jinwen Ji, Yanru Li, Yue Cao, Zhihui Xu, Cong Pian
{"title":"Exploring Novel Fentanyl Analogues Using a Graph-Based Transformer Model","authors":"Guangle Zhang, Yuan Zhang, Ling Li, Jiaying Zhou, Honglin Chen, Jinwen Ji, Yanru Li, Yue Cao, Zhihui Xu, Cong Pian","doi":"10.1007/s12539-024-00623-0","DOIUrl":"https://doi.org/10.1007/s12539-024-00623-0","url":null,"abstract":"<p>The structures of fentanyl and its analogues are easy to modify, and few types have been included in databases so far, which allows criminals to evade the supervision of relevant departments. This paper introduces a molecular graph-based transformer model, which is combined with a data augmentation method based on substructure replacement to generate novel fentanyl analogues. A total of 140,000 molecules were generated, and after a series of screening steps, 36,799 potential fentanyl analogues were obtained. We calculated the molecular properties of these 36,799 potential fentanyl analogues. The results showed that the model could learn some properties of the original fentanyl molecules. We compared the molecules generated by the transformer model combined with the substructure-replacement data augmentation method against those generated by two other deep-learning-based molecular generation models, and found that the model in this paper can generate more novel potential fentanyl analogues. 
Finally, the findings indicate that the molecular graph-based transformer model helps us explore the structures of potential fentanyl molecules and understand the distribution of the original fentanyl molecules.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":"52 1","pages":""},"PeriodicalIF":4.8,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140813055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}