Ensemble of HMMs for Sequence Prediction on Multivariate Biomedical Data

BioMedInformatics. Pub Date: 2024-07-03. DOI: 10.3390/biomedinformatics4030090

Richard Fechner, Jens Dörpinghaus, R. Rockenfeller, Jennifer Faber


Citations: 0


Ensemble of HMMs for Sequence Prediction on Multivariate Biomedical Data

Background: Biomedical data are usually collections of longitudinal data assessed at certain points in time. Clinical observations assess the presence and severity of symptoms, which form the basis for describing and modeling disease progression. Deciphering potential underlying unknowns from the distinct observations would substantially improve the understanding of pathological cascades. Hidden Markov Models (HMMs) have been successfully applied to the processing of possibly noisy continuous signals. We apply ensembles of HMMs to categorically distributed multivariate time series data, leaving room for expert domain knowledge in the prediction process. Methods: We use an ensemble of HMMs to predict the loss of free walking ability, one major clinical deterioration in the most common autosomal dominantly inherited ataxia disorder worldwide. Results: We present a prediction pipeline that processes data paired with a configuration file, enabling us to train, validate and query an ensemble of HMMs. In particular, we provide a theoretical and practical framework for multivariate time-series inference based on HMMs that includes constructing multiple HMMs, each predicting one particular observable variable. Our analysis is conducted on pseudo-data as well as on biomedical data from patients with spinocerebellar ataxia type 3. Conclusions: We find that the model shows promising results for the data we tested. The strength of this approach is that HMMs are well-understood, probabilistic and interpretable models, which sets it apart from most Deep Learning approaches. We publish all code and evaluation pseudo-data in an open-source repository.
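The idea of building one HMM per observable variable can be illustrated with a minimal sketch. It is not the authors' published pipeline: the function names `predict_next` and `predict_ensemble`, the parameter layout, and the assumption that each HMM sees only its own variable's categorical sequence are all hypothetical choices made for illustration. Each model runs a normalised forward pass over its past observations and returns the probability distribution of the next symbol.

```python
from typing import Dict, List, Sequence, Tuple

# (pi, A, B) for one categorical HMM; this layout is an assumption of the sketch.
HMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def predict_next(pi: Sequence[float],
                 A: Sequence[Sequence[float]],
                 B: Sequence[Sequence[float]],
                 obs: Sequence[int]) -> List[float]:
    """Return P(next symbol | obs) for a single categorical HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    """
    n_states, n_symbols = len(pi), len(B[0])
    # Predictive distribution over the hidden state at the current time step.
    state = list(pi)
    for o in obs:
        # Condition on the observed symbol (Bayes update), then normalise.
        posterior = [state[j] * B[j][o] for j in range(n_states)]
        total = sum(posterior)
        if total == 0.0:
            raise ValueError("observation sequence has zero probability")
        posterior = [p / total for p in posterior]
        # Propagate one step through the transition matrix.
        state = [sum(posterior[i] * A[i][j] for i in range(n_states))
                 for j in range(n_states)]
    # Marginalise over hidden states to get the next-symbol distribution.
    return [sum(state[j] * B[j][k] for j in range(n_states))
            for k in range(n_symbols)]


def predict_ensemble(models: Dict[str, HMM],
                     sequences: Dict[str, Sequence[int]]) -> Dict[str, List[float]]:
    """Query every per-variable HMM with the history of its own variable."""
    return {name: predict_next(*hmm, sequences[name])
            for name, hmm in models.items()}


if __name__ == "__main__":
    # Hypothetical two-state model for one made-up variable, "gait".
    gait = ([0.9, 0.1],
            [[0.8, 0.2], [0.0, 1.0]],
            [[0.7, 0.3], [0.1, 0.9]])
    print(predict_ensemble({"gait": gait}, {"gait": [0, 0, 1]}))
```

Keeping one small model per variable means each transition and emission table can be inspected on its own, which matches the interpretability argument of the abstract. Any coupling between variables, and the training step itself, are left out of this sketch.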


Source journal

BioMedInformatics

CiteScore

1.70

Self-citation rate

0.00%

Articles published