Behavioral Data Categorization for Transformers-based Models in Digital Health

C. Siebra, Igor Matias, K. Wac
{"title":"数字健康中基于变压器模型的行为数据分类","authors":"C. Siebra, Igor Matias, K. Wac","doi":"10.1109/BHI56158.2022.9926938","DOIUrl":null,"url":null,"abstract":"Transformers are recent deep learning (DL) models used to capture the dependence between parts of sequential data. While their potential was already demonstrated in the natural language processing (NLP) domain, emerging research shows transformers can also be an adequate modeling approach to relate longitudinal multi-featured continuous behavioral data to future health outcomes. As transformers-based predictions are based on a domain lexicon, the use of categories, commonly used in specialized areas to cluster values, is the likely way to compose lexica. However, the number of categories may influence the transformer prediction accuracy, mainly when the categorization process creates imbalanced datasets, or the search space is very restricted to generate optimal feasible solutions. This paper analyzes the relationship between models' accuracy and the sparsity of behavioral data categories that compose the lexicon. This analysis relies on a case example that uses mQoL-Transformer to model the influence of physical activity behavior on sleep health. Results show that the number of categories shall be treated as a further transformer's hyperparameter, which can balance the literature-based categorization and optimization aspects. Thus, DL processes could also obtain similar accuracies compared to traditional approaches, such as long short-term memory, when used to process short behavioral data sequences.","PeriodicalId":347210,"journal":{"name":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Behavioral Data Categorization for Transformers-based Models in Digital Health\",\"authors\":\"C. Siebra, Igor Matias, K. Wac\",\"doi\":\"10.1109/BHI56158.2022.9926938\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transformers are recent deep learning (DL) models used to capture the dependence between parts of sequential data. While their potential was already demonstrated in the natural language processing (NLP) domain, emerging research shows transformers can also be an adequate modeling approach to relate longitudinal multi-featured continuous behavioral data to future health outcomes. As transformers-based predictions are based on a domain lexicon, the use of categories, commonly used in specialized areas to cluster values, is the likely way to compose lexica. However, the number of categories may influence the transformer prediction accuracy, mainly when the categorization process creates imbalanced datasets, or the search space is very restricted to generate optimal feasible solutions. This paper analyzes the relationship between models' accuracy and the sparsity of behavioral data categories that compose the lexicon. This analysis relies on a case example that uses mQoL-Transformer to model the influence of physical activity behavior on sleep health. Results show that the number of categories shall be treated as a further transformer's hyperparameter, which can balance the literature-based categorization and optimization aspects. 
Thus, DL processes could also obtain similar accuracies compared to traditional approaches, such as long short-term memory, when used to process short behavioral data sequences.\",\"PeriodicalId\":347210,\"journal\":{\"name\":\"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BHI56158.2022.9926938\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BHI56158.2022.9926938","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Transformers are recent deep learning (DL) models used to capture the dependence between parts of sequential data. While their potential was already demonstrated in the natural language processing (NLP) domain, emerging research shows transformers can also be an adequate modeling approach to relate longitudinal multi-featured continuous behavioral data to future health outcomes. As transformer-based predictions rely on a domain lexicon, the use of categories, commonly employed in specialized areas to cluster values, is a likely way to compose such lexica. However, the number of categories may influence the transformer's prediction accuracy, mainly when the categorization process creates imbalanced datasets or when the search space is too restricted to generate optimal feasible solutions. This paper analyzes the relationship between models' accuracy and the sparsity of the behavioral data categories that compose the lexicon. The analysis relies on a case example that uses mQoL-Transformer to model the influence of physical activity behavior on sleep health. Results show that the number of categories should be treated as an additional transformer hyperparameter, which can balance literature-based categorization and optimization aspects. Thus, DL processes can also obtain accuracies similar to traditional approaches, such as long short-term memory, when used to process short behavioral data sequences.
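
The categorization step described in the abstract can be pictured as binning a continuous behavioral signal into a fixed number of token categories that form the transformer's lexicon. The Python sketch below is only a minimal illustration of that idea, not the paper's mQoL-Transformer pipeline; the names N_CATEGORIES and to_tokens, and the synthetic hourly step counts, are assumptions introduced for the example.

```python
# Minimal sketch (assumed, not the paper's pipeline): discretize a continuous
# behavioral signal into a fixed number of categories so it can be fed to a
# transformer as a token sequence.
import numpy as np

N_CATEGORIES = 8  # lexicon size, treated here as a tunable hyperparameter


def to_tokens(values: np.ndarray, n_categories: int) -> np.ndarray:
    """Map continuous behavioral values to integer token ids in [0, n_categories)."""
    # Quantile-based bin edges keep the token distribution roughly balanced,
    # one possible way to mitigate the imbalance the abstract mentions.
    # np.unique guards against duplicate edges when the data contains ties.
    edges = np.unique(
        np.quantile(values, np.linspace(0.0, 1.0, n_categories + 1)[1:-1])
    )
    return np.digitize(values, edges)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic example: 24 hourly step counts for one day of physical activity.
    steps_per_hour = rng.poisson(lam=500, size=24).astype(float)
    tokens = to_tokens(steps_per_hour, N_CATEGORIES)
    print("lexicon size:", N_CATEGORIES)
    print("token sequence:", tokens.tolist())
```

In such a setup, N_CATEGORIES could be swept like any other hyperparameter (for example, via a validation grid search) to trade off a literature-based categorization against the dataset imbalance and restricted search space discussed in the abstract.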