Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation.

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics Pub Date: 2025-02-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1519468

Mikhail Arbatsky, Ekaterina Vasilyeva, Veronika Sysoeva, Ekaterina Semina, Valeri Saveliev, Kseniya Rubina


Citations: 0

Abstract


Processing biological data is a challenge of paramount importance: the volume of accumulated data grows every year as new methods for studying biological objects emerge. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we narrow our focus to a small set of mathematical methods applied in the standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (using machine learning methods). Normalization and scaling are standard preprocessing steps, with LogNormalize (natural-log transformation), CLR (centered log-ratio transformation), and RC (relative counts) employed as data transformation methods. The justification for applying these methods in biology is not discussed in methodological articles. The essential aspect of dimensionality reduction is to identify the stable patterns that are deliberately removed during mathematical data processing as redundant, even though they contain minor details important for biological interpretation. There are no established rules for integrating datasets obtained at different sampling times or under different conditions. Clustering calls for reconsideration of how it is applied to biological data specifically. The novelty of the present study lies in an integrated approach combining biology and bioinformatics to draw biological insights from data processing.
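To make the three transformations named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' code and not Seurat itself: the function names, the cells-by-genes layout, the default scale factor of 10,000, and the log1p-based form of CLR are all assumptions modeled on how Seurat's normalization methods are commonly described, and should be checked against the Seurat documentation before being relied on.

```python
import numpy as np

def relative_counts(counts, scale_factor=1e4):
    """RC: each cell's counts divided by the cell total, times a scale factor.
    Assumes rows are cells, columns are genes, and no cell has a zero total."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * scale_factor

def log_normalize(counts, scale_factor=1e4):
    """LogNormalize: natural log of (1 + relative counts)."""
    return np.log1p(relative_counts(counts, scale_factor))

def clr(counts):
    """CLR (assumed log1p-based variant): each value is divided by
    exp(mean of log1p over the row), then log1p is applied.
    Zeros contribute log1p(0) = 0 to the mean and map to 0 in the output."""
    counts = np.asarray(counts, dtype=float)
    log_mean = np.log1p(counts).sum(axis=1, keepdims=True) / counts.shape[1]
    return np.log1p(counts / np.exp(log_mean))

# Hypothetical toy data: two cells with identical gene proportions,
# the second sequenced ten times deeper than the first.
counts = np.array([[1, 0, 3],
                   [10, 0, 30]])

rc = relative_counts(counts)
ln = log_normalize(counts)
cl = clr(counts)
```

In this toy case RC and LogNormalize give the two cells identical profiles because both divide by the per-cell total, while this CLR variant does not, so the choice of transformation alone changes whether the two cells look alike. That is the kind of argument-dependent outcome the abstract warns about.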


Source journal

Frontiers in bioinformatics

CiteScore

2.60

Self-citation rate

0.00%

Articles published