Linc2function: A Comprehensive Pipeline and Webserver for Long Non-Coding RNA (lncRNA) Identification and Functional Predictions Using Deep Learning Approaches.

Impact factor 2.5 (JCR Q3, Genetics &amp; Heredity)
Yashpal Ramakrishnaiah, Adam P Morris, Jasbir Dhaliwal, Melcy Philip, Levin Kuhlmann, Sonika Tyagi
Epigenomes, vol. 7, no. 3. Published 2023-09-15. DOI: 10.3390/epigenomes7030022

Abstract

Long non-coding RNAs (lncRNAs), comprising a significant portion of the human transcriptome, serve as vital regulators of cellular processes and potential disease biomarkers. However, the function of most lncRNAs remains unknown, and existing approaches have focused on gene-level investigation. Our work emphasizes the importance of transcript-level annotation to uncover the roles of specific transcript isoforms. We propose that understanding the mechanisms of lncRNAs in pathological processes requires resolving their structural motifs and interactomes. A complete lncRNA annotation first involves discriminating lncRNAs from their coding counterparts and then predicting their functional motifs and target biomolecules. Current in silico methods mainly perform primary-sequence-based discrimination using a reference model, which limits their comprehensiveness and generalizability. We demonstrate that integrating secondary structure and interactome information, in addition to the transcript sequence, enables comprehensive functional annotation. Annotating lncRNAs for newly sequenced species is challenging due to inconsistencies in functional annotations, the need for specialized computational techniques, limited access to source code, and the shortcomings of reference-based methods for cross-species predictions. To address these challenges, we developed a pipeline for identifying and annotating transcript sequences at the isoform level. We demonstrate the effectiveness of the pipeline by comprehensively annotating the lncRNAs associated with two specific disease groups. The source code of our pipeline is available under the MIT license for local use by researchers, either to make new predictions with the pre-trained models or to re-train models on new sequence datasets. Non-technical users can access the pipeline through a web server.
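The abstract contrasts the pipeline with methods that discriminate lncRNAs from coding transcripts using primary sequence alone. As a rough illustration of what such primary-sequence features can look like, the sketch below computes k-mer composition and the fraction of the transcript covered by its longest open reading frame (ORF). This is an assumed, generic feature set for illustration only. It is not Linc2function's feature extraction, it does not cover the secondary-structure or interactome information the authors integrate, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and map RNA uracil (U) to DNA thymine (T)."""
    return seq.upper().replace("U", "T")


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of each of the 4**k ACGT k-mers (overlapping windows).

    Windows containing other characters (e.g. N) are ignored.
    """
    seq = normalize(seq)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides (stop codon included) of the longest ATG-initiated,
    stop-terminated ORF on the forward strand, searched in all three frames."""
    seq = normalize(seq)
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ATG downstream
    return best


def sequence_features(seq: str, k: int = 3) -> list[float]:
    """Hypothetical feature vector: k-mer frequencies (alphabetical order)
    followed by longest-ORF length divided by transcript length."""
    seq = normalize(seq)
    freqs = kmer_frequencies(seq, k)
    orf_fraction = longest_orf_length(seq) / len(seq) if seq else 0.0
    return [freqs[km] for km in sorted(freqs)] + [orf_fraction]


if __name__ == "__main__":
    features = sequence_features("AUGAAAUAG")
    print(len(features), features[-1])
```

For example, `longest_orf_length("AUGAAAUAG")` is 9 because the whole transcript is a single ORF. ORF-based features are a common coding-potential signal because protein-coding transcripts tend to carry long ORFs. In a classifier, vectors like these would be stacked into a feature matrix. That sequence-only view is the limitation the authors address by adding structural and interaction features.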

Source journal: Epigenomes (Genetics &amp; Heredity). CiteScore: 3.80; self-citation rate: 0.00%; articles published: 38; review time: 11 weeks.