Latest ArXiv Papers

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data.
ArXiv Pub Date : 2025-09-25
Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying
Abstract : Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells. Thus, they cannot infer how cell-internal regulation adapts to microenvironmental cues. Furthermore, these models often utilize fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST achieves this by performing both intra-level and cross-level message passing to utilize the hierarchy in its embeddings and can thus generalize to novel datatypes, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486056/pdf/
Citations : 0
Multimodal AI predicts clinical outcomes of drug combinations from preclinical data.
ArXiv Pub Date : 2025-09-24
Yepeng Huang, Xiaorui Su, Varun Ullanat, Intae Moon, Ivy Liang, Lindsay Clegg, Damilola Olabode, Ruthie Johnson, Nicholas Ho, Megan Gibbs, Alexander Gusev, Bino John, Marinka Zitnik
Abstract : Predicting clinical outcomes from preclinical data is essential for identifying safe and effective drug combinations, reducing late-stage clinical failures, and accelerating the development of precision therapies. Current AI models rely on structural or target-based features but fail to incorporate the multimodal data necessary for accurate, clinically relevant predictions. Here, we introduce Madrigal, a multimodal AI model that learns from structural, pathway, cell viability, and transcriptomic data to predict drug-combination effects across 953 clinical outcomes and 21,842 compounds, including combinations of approved drugs and novel compounds in development. Madrigal uses an attention bottleneck module to unify preclinical drug data modalities while handling missing data during training and inference, a major challenge in multimodal learning. It outperforms single-modality methods and state-of-the-art models in predicting adverse drug interactions, and ablations show that both modality alignment and multimodality are necessary. It captures transporter-mediated interactions and aligns with head-to-head clinical trial differences for neutropenia, anemia, alopecia, and hypoglycemia. In type 2 diabetes and MASH, Madrigal supports polypharmacy decisions and prioritizes resmetirom among safer candidates. Extending to personalization, Madrigal improves patient-level adverse-event prediction in a longitudinal EHR cohort and an independent oncology cohort, and predicts ex vivo efficacy in primary acute myeloid leukemia samples and patient-derived xenograft models. Madrigal links preclinical multimodal readouts to safety risks of drug combinations and offers a generalizable foundation for safer combination design.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908363/pdf/
Citations : 0
Ising dynamics on multilayer networks with heterogeneous layers.
ArXiv Pub Date : 2025-09-24
Suman S Kulkarni, Christopher W Lynn, Mason A Porter, Dani S Bassett
Abstract : Multilayer networks provide a framework to study complex systems with multiple types of interactions, multiple dynamical processes, and/or multiple subsystems. When studying a dynamical process on a multilayer network, it is important to consider how both layer structure and heterogeneity across layers impact the overall dynamics. As a concrete example, we study Ising dynamics on multilayer networks and investigate how network structure affects its qualitative features. We focus primarily on multiplex networks, which are multilayer networks in which interlayer edges occur only between manifestations of the same entity on different layers, although we also consider one empirical example with a more general multilayer structure. We use numerical simulations and a mean-field approximation to examine the steady-state behavior of the Ising dynamics as a function of temperature (which is a key model parameter) for a variety of two-layer multilayer networks from both models and empirical data. We examine both the steady-state behavior and a metastable state in which the two layers are anti-aligned, and we explore the effects of interlayer coupling strength and structural heterogeneity. In synthetic multilayer networks with core-periphery structure, we show that interlayer edges that involve peripheral nodes can exert more influence than interlayer edges that involve only core nodes. Finally, we consider empirical multilayer networks from biological and social systems. Our work illustrates how heterogeneity across the layers of a multilayer network influences dynamics on the whole network.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486055/pdf/
Citations : 0
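Editor's illustration for the entry above: the abstract describes Ising dynamics on a multiplex network with interlayer coupling and a metastable anti-aligned state. A minimal sketch of that setup follows. It is not the authors' code, and several things are assumptions: the Metropolis update rule (the abstract does not name one), unit intralayer coupling, ring-shaped layers, and the hypothetical names `ring_layer`, `simulate_multiplex_ising`, and `j_inter`.

```python
import math
import random


def ring_layer(n, k=1):
    """Adjacency lists for a ring of n nodes, each linked to k neighbors per side."""
    return [[(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)]


def simulate_multiplex_ising(layers, temperature, j_inter=1.0, sweeps=200,
                             seed=0, init=None):
    """Metropolis dynamics on a multiplex network (assumed update rule).

    layers: one adjacency list per layer, all over the same n nodes.
    init:   optional per-layer starting spin, e.g. (1, -1) for anti-aligned layers.
    Returns the magnetization of each layer after the last sweep.
    """
    rng = random.Random(seed)
    n_layers, n = len(layers), len(layers[0])
    if init is None:
        spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n_layers)]
    else:
        spins = [[init[a]] * n for a in range(n_layers)]
    for _ in range(sweeps * n_layers * n):
        a, i = rng.randrange(n_layers), rng.randrange(n)
        # Local field: intralayer neighbors (unit coupling, an assumption) ...
        field = sum(spins[a][j] for j in layers[a][i])
        # ... plus copies of the same node in the other layers (multiplex edges).
        field += j_inter * sum(spins[b][i] for b in range(n_layers) if b != a)
        delta_e = 2.0 * spins[a][i] * field  # energy change if spin (a, i) flips
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[a][i] = -spins[a][i]
    return [sum(layer) / n for layer in spins]


# Two identical ring layers, weakly coupled, started anti-aligned at low temperature.
layers = [ring_layer(20, k=2), ring_layer(20, k=2)]
print(simulate_multiplex_ising(layers, temperature=0.05, j_inter=0.1, init=(1, -1)))
```

Starting from anti-aligned layers with weak `j_inter` at low temperature is one way to probe the metastable state the abstract mentions; raising `temperature` or `j_inter` in this toy model lets one see when that state stops persisting.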
Assay2Mol: large language model-based drug design using BioAssay context.
ArXiv Pub Date : 2025-09-24
Yifan Deng, Spencer S Ericksen, Anthony Gitter
Abstract : Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for drug discovery campaigns but has been untapped because of its unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288650/pdf/
Citations : 0
GPU-accelerated FREDopt package for simultaneous dose and LETd proton radiotherapy plan optimization via superiorization methods.
ArXiv Pub Date : 2025-09-24
Damian Borys, Jan Gajewski, Tobias Becher, Yair Censor, Renata Kopeć, Marzena Rydygier, Angelo Schiavi, Tomasz Skóra, Anna Spaleniak, Niklas Wahl, Agnieszka Wochnik, Antoni Ruciński
Abstract : This study presents FREDopt, a newly developed GPU-accelerated open-source optimization software for simultaneous proton dose and dose-averaged LET (LETd) optimization in IMPT treatment planning. FREDopt was implemented entirely in Python, leveraging CuPy for GPU acceleration and incorporating fast Monte Carlo (MC) simulations from the FRED code. The treatment plan optimization workflow includes pre-optimization and optimization, the latter equipped with a novel superiorization of feasibility-seeking algorithms. Feasibility-seeking requires finding a point that satisfies prescribed constraints. Superiorization interlaces computational perturbations into iterative feasibility-seeking steps to steer them toward a superior feasible point, replacing the need for costly full-fledged constrained optimization. The method was validated on two treatment plans of patients treated in a clinical proton therapy center, with dose and LETd distributions compared before and after reoptimization. Simultaneous dose and LETd optimization using FREDopt led to a substantial reduction of LETd and (dose)×(LETd) in organs at risk (OARs) while preserving target dose conformity. Computational performance evaluation showed execution times of 14-50 minutes, depending on the algorithm and target volume size, which is satisfactory for clinical and research applications while enabling further development of the well-tested, documented open-source software.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486063/pdf/
Citations : 0
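Editor's illustration for the entry above: the abstract describes superiorization as interlacing perturbations into feasibility-seeking iterations. The toy sketch below shows that idea on a small problem. It is not FREDopt code and does not use CuPy. The half-space constraints, the target function (squared Euclidean norm), the geometric step sizes, and the names `project_halfspace` and `superiorized_projections` are all assumptions chosen for illustration.

```python
import math


def project_halfspace(x, a, b):
    """Project x onto the half-space {y : a.y <= b}; return x if already inside."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0:
        return x
    norm_sq = sum(ai * ai for ai in a)
    return [xi - violation * ai / norm_sq for ai, xi in zip(a, x)]


def superiorized_projections(x0, halfspaces, sweeps=200, beta0=1.0, gamma=0.9,
                             superiorize=True):
    """Cyclic projections onto half-spaces, optionally superiorized.

    Assumed target function: phi(x) = 0.5 * ||x||^2, whose gradient is x.
    Perturbation sizes beta0 * gamma**k are summable, as superiorization requires.
    """
    x = list(x0)
    for k in range(sweeps):
        if superiorize:
            norm = math.sqrt(sum(xi * xi for xi in x))
            if norm > 0:
                # Step along the normalized negative gradient; cap the step at
                # norm so the perturbation never overshoots the minimizer of phi.
                step = min(beta0 * gamma ** k, norm)
                x = [xi - step * xi / norm for xi in x]
        # Feasibility-seeking sweep: project onto each constraint in turn.
        for a, b in halfspaces:
            x = project_halfspace(x, a, b)
    return x


# Constraints: x1 + x2 >= 2 (written as -x1 - x2 <= -2), x1 <= 5, x2 <= 5.
constraints = [((-1.0, -1.0), -2.0), ((1.0, 0.0), 5.0), ((0.0, 1.0), 5.0)]
plain = superiorized_projections([4.0, 0.0], constraints, superiorize=False)
better = superiorized_projections([4.0, 0.0], constraints)
print(plain, better)
```

In this toy case the starting point is already feasible, so plain feasibility-seeking leaves it untouched, while the superiorized run ends at a feasible point with a smaller norm. Superiorization in general aims for a better feasible point, not a guaranteed optimum.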
Suppression of errors in collectively coded information.
ArXiv Pub Date : 2025-09-24
Martin J Falk, Leon Zhou, Yoshiya J Matsubara, Kabir Husain, Jack W Szostak, Arvind Murugan
Abstract : Modern life largely transmits genetic information from mother to daughter through the duplication of single physically intact molecules that encode information. However, copying an extended molecule requires complex copying machinery and high fidelity that scales with the genome size to avoid the error catastrophe. Here, we explore these fidelity requirements in an alternative architecture, the virtual circular genome, in which no one physical molecule encodes the full genetic information. Instead, information is encoded and transmitted in a collective of overlapping and interacting segments. Using a model experimental system of a complex mixture of DNA oligomers that can partly anneal and extend off each other, we find that mutant oligomers are suppressed relative to a model without collective encoding. Through simulations and theory, we show that this suppression of mutants can be explained by competition for productive binding partners. As a consequence, information can be propagated robustly in a virtual circular genome even at mutation rates expected under prebiotic conditions.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12407629/pdf/
Citations : 0
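Editor's illustration for the entry above: the abstract's remark that the required fidelity scales with genome size can be made concrete with the classical error-threshold estimate from quasispecies theory (Eigen). This is background, not a result of the paper. The symbols are assumptions introduced here: $q$ is the per-base copying fidelity, $L$ the genome length, and $\sigma > 1$ the replication advantage of the correct sequence over its mutants.

```latex
% The correct sequence persists only if error-free copies outgrow mutants:
\sigma \, q^{L} > 1
\quad\Longleftrightarrow\quad
L < \frac{\ln \sigma}{-\ln q} \approx \frac{\ln \sigma}{1 - q}
\qquad (\text{for } 1 - q \ll 1).
```

Under this estimate, the tolerable per-base error rate $1 - q$ shrinks roughly as $1/L$ for a genome copied as a single molecule.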
A decentralized future for the open-science databases.
ArXiv Pub Date : 2025-09-23
Gaurav Sharma, Viorel Munteanu, Nika Mansouri Ghiasi, Jineta Banerjee, Susheel Varma, Luca Foschini, Kyle Ellrott, Onur Mutlu, Dumitru Ciorbă, Roel A Ophoff, Viorel Bostan, Christopher E Mason, Jason H Moore, Despoina Sousoni, Arunkumar Krishnan, Mihai Dimian, Gustavo Stolovitzky, Fabio G Liberante, Taras K Oleksyk, Serghei Mangul
Abstract : Continuous and reliable access to curated biological data repositories is indispensable for accelerating rigorous scientific inquiry and fostering reproducible research. Centralized repositories, though widely used, are vulnerable to single points of failure arising from cyberattacks, technical faults, natural disasters, or funding and political uncertainties. This can lead to widespread data unavailability, data loss, integrity compromises, and substantial delays in critical research, ultimately impeding scientific progress. Centralizing essential scientific resources in a single geopolitical or institutional hub is inherently dangerous, as any disruption can paralyze diverse ongoing research. The rapid acceleration of data generation, combined with an increasingly volatile global landscape, necessitates a critical re-evaluation of the sustainability of centralized models. Implementing federated and decentralized architectures presents a compelling and future-oriented pathway to substantially strengthen the resilience of scientific data infrastructures, thereby mitigating vulnerabilities and ensuring the long-term integrity of data. Here, we examine the structural limitations of centralized repositories, evaluate federated and decentralized models, and propose a hybrid framework for resilient, FAIR, and sustainable scientific data stewardship. Such an approach offers a significant reduction in exposure to governance instability, infrastructural fragility, and funding volatility, and also fosters fairness and global accessibility. The future of open science depends on integrating these complementary approaches to establish a globally distributed, economically sustainable, and institutionally robust infrastructure that safeguards scientific data as a public good, further ensuring continued accessibility, interoperability, and preservation for generations to come.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486051/pdf/
Citations : 0
BRAID: Input-Driven Nonlinear Dynamical Modeling of Neural-Behavioral Data.
ArXiv Pub Date : 2025-09-23
Parsa Vahidi, Omid G Sani, Maryam M Shanechi
Abstract : Neural populations exhibit complex recurrent structures that drive behavior, while continuously receiving and integrating external inputs from sensory stimuli, upstream regions, and neurostimulation. However, neural populations are often modeled as autonomous dynamical systems, with little consideration given to the influence of external inputs that shape the population activity and behavioral outcomes. Here, we introduce BRAID, a deep learning framework that models nonlinear neural dynamics underlying behavior while explicitly incorporating any measured external inputs. Our method disentangles intrinsic recurrent neural population dynamics from the effects of inputs by including a forecasting objective within input-driven recurrent neural networks. BRAID further prioritizes the learning of intrinsic dynamics that are related to a behavior of interest by using a multi-stage optimization scheme. We validate BRAID with nonlinear simulations, showing that it can accurately learn the intrinsic dynamics shared between neural and behavioral modalities. We then apply BRAID to motor cortical activity recorded during a motor task and demonstrate that our method more accurately fits the neural-behavioral data by incorporating measured sensory stimuli into the model and improves the forecasting of neural-behavioral data compared with various baseline methods, whether input-driven or not.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486053/pdf/
Citations : 0
Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data.
ArXiv Pub Date : 2025-09-23
Mohammad Hosseini, Maryam M Shanechi
Abstract : High-dimensional imaging of neural activity, such as widefield calcium and functional ultrasound imaging, provides a rich source of information for understanding the relationship between brain activity and behavior. Accurately modeling neural dynamics in these modalities is crucial for understanding this relationship but is hindered by the high dimensionality, complex spatiotemporal dependencies, and prevalent behaviorally irrelevant dynamics in these modalities. Existing dynamical models often employ preprocessing steps to obtain low-dimensional representations from neural image modalities. However, this process can discard behaviorally relevant information and miss spatiotemporal structure. We propose SBIND, a novel data-driven deep learning framework to model spatiotemporal dependencies in neural images and disentangle their behaviorally relevant dynamics from other neural dynamics. We validate SBIND on widefield imaging datasets and show its extension to functional ultrasound imaging, a recent modality whose dynamical modeling has largely remained unexplored. We find that our model effectively identifies both local and long-range spatial dependencies across the brain while also dissociating behaviorally relevant neural dynamics. In doing so, SBIND outperforms existing models in neural-behavioral prediction. Overall, SBIND provides a versatile tool for investigating the neural mechanisms underlying behavior using imaging modalities.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486059/pdf/
Citations : 0
Probabilistic Geometric Principal Component Analysis with application to neural data.
ArXiv Pub Date : 2025-09-22
Han-Lin Hsieh, Maryam M Shanechi
Abstract : Dimensionality reduction is critical across various domains of science, including neuroscience. Probabilistic Principal Component Analysis (PPCA) is a prominent dimensionality reduction method that provides a probabilistic approach, unlike the deterministic approach of PCA, and serves as a connection between PCA and Factor Analysis (FA). Despite their power, PPCA and its extensions are mainly based on linear models and can only describe the data in a Euclidean coordinate system. However, in many neuroscience applications, data may be distributed around a nonlinear geometry (i.e., a manifold) rather than lying in the Euclidean space. We develop Probabilistic Geometric Principal Component Analysis (PGPCA) for such datasets as a new dimensionality reduction algorithm that can explicitly incorporate knowledge about a given nonlinear manifold that is first fitted from these data. Further, we show how, in addition to the Euclidean coordinate system, a geometric coordinate system can be derived for the manifold to capture the deviations of data from the manifold and noise. We also derive a data-driven EM algorithm for learning the PGPCA model parameters. As such, PGPCA generalizes PPCA to better describe data distributions by incorporating a nonlinear manifold geometry. In simulations and brain data analyses, we show that PGPCA can effectively model the data distribution around various given manifolds and outperforms PPCA for such data. Moreover, PGPCA provides the capability to test whether the new geometric coordinate system better describes the data than the Euclidean one. Finally, PGPCA can perform dimensionality reduction and learn the data distribution both around and on the manifold. These capabilities make PGPCA valuable for enhancing the efficacy of dimensionality reduction for analysis of high-dimensional data that exhibit noise and are distributed around a nonlinear manifold.
PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486060/pdf/
Citations : 0