Heterogeneity-Preserving Discriminative Feature Selection for Subtype Discovery.

IF 5.4, Zone 2 (Medicine), Q1 CHEMISTRY, MEDICINAL
Abdur Rahman M A Basher, Caleb Hallinan, Kwonmoo Lee
DOI: 10.1101/2023.05.14.540686
Journal: Expert Opinion on Therapeutic Patents
Publication date: 2023-12-20
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10769187/pdf/
Citations: 0

Abstract

The discovery of subtypes is pivotal for disease diagnosis and targeted therapy, considering the diverse responses of different cells or patients to specific treatments. Exploring the heterogeneity within disease or cell states provides insights into disease progression mechanisms and cell differentiation. The advent of high-throughput technologies has enabled the generation and analysis of various molecular data types, such as single-cell RNA-seq, proteomic, and imaging datasets, at large scales. While presenting opportunities for subtype discovery, these datasets pose challenges in finding relevant signatures due to their high dimensionality. Feature selection, a crucial step in the analysis pipeline, involves choosing signatures that reduce the number of features for more efficient downstream computational analysis. Numerous existing methods focus on selecting signatures that differentiate known diseases or cell states, yet they often fall short in identifying features that preserve heterogeneity and reveal subtypes. To identify features that can capture the diversity within each class while also maintaining the discrimination of known disease states, we employed deep metric learning-based feature embedding to conduct a detailed exploration of the statistical properties of features essential in preserving heterogeneity. Our analysis revealed that features with a significant difference in interquartile range (IQR) between classes possess crucial subtype information. Guided by this insight, we developed a robust statistical method, termed PHet (Preserving Heterogeneity), which performs iterative subsampling with differential analysis of IQR and Fisher's method between classes, identifying a minimal set of heterogeneity-preserving discriminative features to optimize subtype clustering quality.
Validation using public single-cell RNA-seq and microarray datasets showcased PHet's effectiveness in preserving sample heterogeneity while maintaining discrimination of known disease/cell states, surpassing the performance of previous outlier-based methods. Furthermore, analysis of a single-cell RNA-seq dataset from mouse tracheal epithelial cells revealed, through PHet-based features, the presence of two distinct basal cell subtypes undergoing differentiation toward a luminal secretory phenotype. Notably, one of these subtypes exhibited high expression of BPIFA1. Interestingly, previous studies have linked BPIFA1 secretion to the emergence of secretory cells during mucociliary differentiation of airway epithelial cells. PHet successfully pinpointed the basal cell subtype associated with this phenomenon, a distinction that pre-annotated markers and dispersion-based features failed to make due to their admixed feature expression profiles. These findings underscore the potential of our method to deepen our understanding of the mechanisms underlying diseases and cell differentiation and contribute significantly to personalized medicine.
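
The abstract names the ingredients of PHet (between-class IQR differences, iterative subsampling, Fisher's method) without giving the procedure. The Python sketch below is one hypothetical reading of how those ingredients could score a single feature; it is not the authors' implementation. The function names, the per-subsample z-test on class means, the subsample size, and the number of rounds are all assumptions made for illustration.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumption: the abstract does not say which per-subsample test is used.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 1.0 if diff == 0 else 0.0
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large |z|.
    return math.erfc(abs(diff / se) / math.sqrt(2))


def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    X = -2 * sum(ln p) is compared against a chi-squared distribution with
    2k degrees of freedom; for even degrees of freedom the survival function
    is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    half_x = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def score_feature(control, case, n_rounds=10, subsample=20, seed=0):
    """Return (absolute IQR difference, Fisher-combined p-value) for one feature."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_rounds):
        sub_a = rng.sample(control, min(subsample, len(control)))
        sub_b = rng.sample(case, min(subsample, len(case)))
        p_values.append(z_test_p(sub_a, sub_b))
    delta_iqr = abs(iqr(case) - iqr(control))
    return delta_iqr, fisher_combine(p_values)


if __name__ == "__main__":
    control = [i / 10 for i in range(-30, 30)]
    low = [-10 + i / 10 for i in range(30)]
    hetero = low + [-x for x in low]       # two hidden subgroups, same mean as control
    shifted = [x + 5 for x in control]     # same spread as control, shifted mean
    print("heterogeneous feature:", score_feature(control, hetero))
    print("mean-shifted feature: ", score_feature(control, shifted))
```

In this toy setup, the feature whose case class hides two subgroups gets a large IQR difference even though its mean matches the control, while the mean-shifted feature gets a near-zero IQR difference and a very small combined p-value. Because the subsamples overlap, the p-values are not independent, so the combined value is better read as a ranking score than as a calibrated p-value.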

Source journal: Expert Opinion on Therapeutic Patents
CiteScore: 12.10
Self-citation rate: 1.50%
Articles published: 50
Review time: 6-12 weeks
Journal description: Expert Opinion on Therapeutic Patents (ISSN 1354-3776 [print], 1744-7674 [electronic]) is a MEDLINE-indexed, peer-reviewed, international journal publishing review articles on recent pharmaceutical patent claims, providing expert opinion on the scope for future development, in the context of the scientific literature. The Editors welcome: Reviews covering recent patent claims on compounds or applications with therapeutic potential, including biotherapeutics and small-molecule agents with specific molecular targets, and patenting trends in a particular therapeutic area; Patent Evaluations examining the aims and chemical and biological claims of individual patents; and Perspectives on issues relating to intellectual property. The audience consists of scientists, managers and decision-makers in the pharmaceutical industry and others closely involved in R&D.