Diagnosis of leukemia using microarray analysis based on Hidden Markov Model and Random Convolutional Kernel Transform

IF 3.1, Zone 4 Biology, Q2 BIOLOGY

Computational Biology and Chemistry, Pub Date: 2025-09-08, DOI: 10.1016/j.compbiolchem.2025.108676

Sareh Baqeri Matak, Elham Askari, Sara Motamed

Computational Biology and Chemistry, Volume 120, Article 108676. Full text: https://www.sciencedirect.com/science/article/pii/S1476927125003378

Citations: 0

Abstract

Introduction

Leukemia is one of the most prevalent cancers worldwide, and early detection is critical for effective treatment. Microarray data are a key tool in this process, but the vast number of genes involved makes the analysis complex and time-consuming. Identifying the relevant genes is therefore a crucial step in disease diagnosis.

Material and methods

This study aims to improve the diagnostic accuracy for various leukemia types by combining microarray data with advanced deep learning techniques. The proposed model begins by selecting the essential features and sequences relevant to diagnosis. These data sequences are processed by a Generative Adversarial Network (GAN) with a U-Net architecture to generate synthetic data. Both the synthetic and the original data are then labeled for analysis. Feature ranking is conducted using a Hidden Markov Model (HMM), followed by classification using the Random Convolutional Kernel Transform (ROCKET) approach. This process ultimately predicts five leukemia categories within the sample.
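The classification stage named above can be illustrated with a minimal sketch of a ROCKET-style transform followed by a ridge classifier. This is not the authors' implementation: it assumes each sample's ranked gene-expression values are treated as a 1-D sequence of at least 12 values, it omits the GAN augmentation and HMM ranking steps, and the function names (`make_kernels`, `apply_kernel`, `rocket_transform`, `fit_ridge`, `predict`) are hypothetical. The kernel settings (lengths 7, 9, or 11, mean-centered normal weights, uniform bias, log-scale dilation, random padding, max and proportion-of-positive-values features) follow the published ROCKET method.

```python
import numpy as np

def make_kernels(n_kernels, series_length, rng):
    """Draw random 1-D convolutional kernels in the style of ROCKET."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()              # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample dilation on a log scale so the dilated kernel fits the series.
        max_exp = max(0.0, np.log2((series_length - 1) / (length - 1)))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        # Zero-pad half of the kernels so the kernel centre can reach the edges.
        padding = ((length - 1) * dilation) // 2 if rng.random() < 0.5 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels

def apply_kernel(x, weights, bias, dilation, padding):
    """Return the two ROCKET features for one series: max and PPV."""
    if padding:
        x = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    out_len = len(x) - span
    # Row i gathers x[i], x[i + d], ..., x[i + (k - 1) d].
    idx = np.arange(out_len)[:, None] + dilation * np.arange(len(weights))[None, :]
    conv = x[idx] @ weights + bias
    return conv.max(), (conv > 0).mean()

def rocket_transform(X, kernels):
    """Map an (n_samples, series_length) array to (n_samples, 2 * n_kernels)."""
    feats = np.empty((X.shape[0], 2 * len(kernels)))
    for i, x in enumerate(X):
        for j, kernel in enumerate(kernels):
            feats[i, 2 * j], feats[i, 2 * j + 1] = apply_kernel(x, *kernel)
    return feats

def fit_ridge(F, y, n_classes, alpha=1.0):
    """Closed-form ridge regression on +1/-1 one-vs-rest targets."""
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = (F - mu) / sd
    Y = -np.ones((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ Y)
    return mu, sd, W

def predict(F, model):
    """Assign each sample to the class with the largest ridge score."""
    mu, sd, W = model
    Z = (F - mu) / sd
    return (Z @ W).argmax(axis=1)
```

With an expression matrix `X` of shape `(n_samples, n_genes)` and integer labels `y` in `0..4`, one would call `kernels = make_kernels(1000, X.shape[1], np.random.default_rng(0))`, then `model = fit_ridge(rocket_transform(X, kernels), y, 5)`. Because the kernels are never trained, only the linear classifier is fitted, which keeps the approach fast even when each sample has many features.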

Results

The proposed model achieves a classification accuracy of 99.26%, outperforming existing methods.

Conclusion

This research highlights the importance of leveraging DNA alterations associated with genetic mutations to improve leukemia diagnostics, emphasizing the potential for early detection and intervention. In simpler terms, identifying DNA modifications across the genome can help predict an individual's likelihood of developing leukemia. Detecting these changes can significantly aid in diagnosis.


Source journal

Computational Biology and Chemistry: Biology; Computer Science, Interdisciplinary Applications

CiteScore

6.10

Self-citation rate

3.20%

Articles published

142

Review time

24 days

About the journal: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.