Semi-supervised deep matrix factorization model for clustering multi-omics data.
Authors: Khanh Luong, Nirav Joshi, Richi Nayak
Journal: Computer Methods and Programs in Biomedicine, volume 273, 109094 (published 2025-10-08; impact factor 4.8; JCR Q1, Computer Science, Interdisciplinary Applications)
DOI: 10.1016/j.cmpb.2025.109094 (https://doi.org/10.1016/j.cmpb.2025.109094)
Background and objective: Multi-omics data are inherently high-dimensional, sparse, and noisy, posing significant challenges for clustering and integration. Conventional clustering and linear dimensionality reduction methods often fail to handle noise effectively or provide interpretability, while standard non-negative matrix factorization approaches are too shallow to capture non-linear patterns. Multi-view non-negative matrix factorization enables integration of complementary views, but it remains primarily unsupervised and seldom leverages available label information.
Methods: We propose SSD-MO, a Semi-Supervised Deep Non-Negative Matrix Factorization model for Multi-Omics Data, designed to address these challenges by leveraging both labelled and unlabelled samples for enhanced data integration and clustering performance. SSD-MO combines semi-supervised learning with a multi-layer deep factorization framework, preserving local geometric structure and incorporating orthogonal and diversity constraints. Its effectiveness was validated on six multi-omics datasets from The Cancer Genome Atlas, using evaluation metrics such as clustering accuracy, normalized mutual information, and F-scores.
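For orientation, the shallow building block that deep and multi-view variants extend can be sketched as follows. This is not SSD-MO: it is plain single-view non-negative matrix factorization with the standard multiplicative updates for the Frobenius-norm objective, without the semi-supervised, multi-layer, geometric, orthogonality, or diversity terms named above. The function name `nmf`, the toy data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative X (features x samples) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 20 features, two groups of 5 samples each,
# where each group is active on a different half of the features.
rng = np.random.default_rng(1)
profile_a = np.r_[np.ones(10), np.zeros(10)][:, None]
profile_b = np.r_[np.zeros(10), np.ones(10)][:, None]
X = np.hstack([
    rng.random((20, 5)) * 0.1 + profile_a,
    rng.random((20, 5)) * 0.1 + profile_b,
])

W, H = nmf(X, rank=2)
labels = H.argmax(axis=0)  # cluster = dominant latent factor per sample
error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(labels, round(float(error), 3))
```

Reading cluster labels from the largest entry in each column of `H` is one common convention for NMF-based clustering; the abstract does not say how SSD-MO derives its final cluster assignments.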
Results: SSD-MO significantly improved clustering performance, raising F-score by 9%-24% over unsupervised baselines and by 7%-20% over semi-supervised benchmarks. Precision (64%-73%) and Recall (70%-88%) values further demonstrated its robust performance across datasets.
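The abstract does not state how Precision, Recall, and F-score are computed for clusterings. One common convention is pair counting, in which a pair of samples is a true positive when it shares both a predicted cluster and a true class. The sketch below assumes that convention; the function name `pairwise_prf` and the toy labels are hypothetical and are not results from the paper.

```python
from itertools import combinations

def pairwise_prf(true_labels, pred_labels):
    """Pair-counting precision, recall, and F-score for a clustering."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # pair grouped together, and correctly so
        elif same_pred:
            fp += 1  # pair grouped together, but from different classes
        elif same_true:
            fn += 1  # pair from the same class, but split apart
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Hypothetical toy labels: one sample of class 0 is placed in cluster 1.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
precision, recall, f_score = pairwise_prf(truth, pred)
print(precision, recall, f_score)  # 4/7, 2/3, 8/13
```

Because the F-score is the harmonic mean of precision and recall, it always lies between the two and sits closer to the smaller one.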
Conclusion: This method provides a robust framework for multi-omics data integration and holds promise for applications in genomics and precision medicine.
About the journal:
The journal aims to encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustrating fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software so as to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; and to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.