Classification of Long Non-Coding RNAs s Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach.

IF 2.3 Q3 BIOCHEMICAL RESEARCH METHODS
Bioinformatics and Biology Insights Pub Date : 2024-06-05 eCollection Date: 2024-01-01 DOI:10.1177/11779322241258586
Songtham Anuntakarun, Jakkrit Khamjerm, Pisit Tangkijvanich, Natthaya Chuaypen
{"title":"Classification of Long Non-Coding RNAs s Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach.","authors":"Songtham Anuntakarun, Jakkrit Khamjerm, Pisit Tangkijvanich, Natthaya Chuaypen","doi":"10.1177/11779322241258586","DOIUrl":null,"url":null,"abstract":"<p><p>Long non-coding RNAs (lncRNAs), which are RNA sequences greater than 200 nucleotides in length, play a crucial role in regulating gene expression and biological processes associated with cancer development and progression. Liver cancer is a major cause of cancer-related mortality worldwide, notably in Thailand. Although machine learning has been extensively used in analyzing RNA-sequencing data for advanced knowledge, the identification of potential lncRNA biomarkers for cancer, particularly focusing on lncRNAs as molecular biomarkers in liver cancer, remains comparatively limited. In this study, our objective was to identify candidate lncRNAs in liver cancer. We employed an expression data set of lncRNAs from patients with liver cancer, which comprised 40 699 lncRNAs sourced from The CancerLivER database. Various feature selection methods and machine-learning approaches were used to identify these candidate lncRNAs. The results showed that the random forest algorithm could predict lncRNAs using features extracted from the database, which achieved an area under the curve (AUC) of 0.840 for classifying lncRNAs between early (stage 1) and late stages (stages 2, 3, and 4) of liver cancer. Five of 23 significant lncRNAs (WAC-AS1, MAPKAPK5-AS1, ARRDC1-AS1, AC133528.2, and RP11-1094M14.11) were differentially expressed between early and late stage of liver cancer. Based on the Gene Expression Profiling Interactive Analysis (GEPIA) database, higher expression of WAC-AS1, MAPKAPK5-AS1, and ARRDC1-AS1 was associated with shorter overall survival. In conclusion, the classification model could predict the early and late stages of liver cancer using the signature expression of lncRNA genes. The identified lncRNAs might be used as early diagnostic and prognostic biomarkers for patients with liver cancer.</p>","PeriodicalId":9065,"journal":{"name":"Bioinformatics and Biology Insights","volume":"18 ","pages":"11779322241258586"},"PeriodicalIF":2.3000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11155358/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics and Biology Insights","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11779322241258586","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Long non-coding RNAs (lncRNAs), which are RNA sequences greater than 200 nucleotides in length, play a crucial role in regulating gene expression and biological processes associated with cancer development and progression. Liver cancer is a major cause of cancer-related mortality worldwide, notably in Thailand. Although machine learning has been extensively used in analyzing RNA-sequencing data for advanced knowledge, the identification of potential lncRNA biomarkers for cancer, particularly focusing on lncRNAs as molecular biomarkers in liver cancer, remains comparatively limited. In this study, our objective was to identify candidate lncRNAs in liver cancer. We employed an expression data set of lncRNAs from patients with liver cancer, which comprised 40 699 lncRNAs sourced from The CancerLivER database. Various feature selection methods and machine-learning approaches were used to identify these candidate lncRNAs. The results showed that the random forest algorithm could predict lncRNAs using features extracted from the database, which achieved an area under the curve (AUC) of 0.840 for classifying lncRNAs between early (stage 1) and late stages (stages 2, 3, and 4) of liver cancer. Five of 23 significant lncRNAs (WAC-AS1, MAPKAPK5-AS1, ARRDC1-AS1, AC133528.2, and RP11-1094M14.11) were differentially expressed between early and late stage of liver cancer. Based on the Gene Expression Profiling Interactive Analysis (GEPIA) database, higher expression of WAC-AS1, MAPKAPK5-AS1, and ARRDC1-AS1 was associated with shorter overall survival. In conclusion, the classification model could predict the early and late stages of liver cancer using the signature expression of lncRNA genes. The identified lncRNAs might be used as early diagnostic and prognostic biomarkers for patients with liver cancer.

利用机器学习方法从非编码 RNA 图谱对肝癌早期和晚期的长非编码 RNA 进行分类
长非编码 RNA(lncRNA)是指长度超过 200 个核苷酸的 RNA 序列,在调节基因表达以及与癌症发展和恶化相关的生物过程中发挥着至关重要的作用。肝癌是全球癌症相关死亡的主要原因,尤其是在泰国。虽然机器学习已被广泛用于分析 RNA 序列数据以获得先进的知识,但对癌症潜在 lncRNA 生物标志物的鉴定,尤其是将 lncRNA 作为肝癌分子生物标志物的鉴定,仍然相对有限。在本研究中,我们的目标是鉴定肝癌中的候选lncRNA。我们采用了肝癌患者的lncRNA表达数据集,该数据集由来自CancerLivER数据库的40 699个lncRNA组成。我们采用了多种特征选择方法和机器学习方法来识别这些候选lncRNA。结果表明,随机森林算法可以利用从数据库中提取的特征预测lncRNA,在对肝癌早期(1期)和晚期(2、3和4期)的lncRNA进行分类时,其曲线下面积(AUC)达到了0.840。在23个重要的lncRNA中,有5个(WAC-AS1、MAPKAPK5-AS1、ARRDC1-AS1、AC133528.2和RP11-1094M14.11)在肝癌早期和晚期之间有差异表达。基于基因表达谱交互分析(GEPIA)数据库,WAC-AS1、MAPKAPK5-AS1和ARRDC1-AS1的高表达与较短的总生存期相关。总之,该分类模型可以利用lncRNA基因的特征表达预测肝癌的早期和晚期。所发现的lncRNA可作为肝癌患者的早期诊断和预后生物标志物。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Bioinformatics and Biology Insights
Bioinformatics and Biology Insights BIOCHEMICAL RESEARCH METHODS-
CiteScore
6.80
自引率
1.70%
发文量
36
审稿时长
8 weeks
期刊介绍: Bioinformatics and Biology Insights is an open access, peer-reviewed journal that considers articles on bioinformatics methods and their applications which must pertain to biological insights. All papers should be easily amenable to biologists and as such help bridge the gap between theories and applications.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信