Title: Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.
Authors: Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono
Journal: Journal of Biomedical Semantics, vol. 16, no. 1, p. 8 (Journal Article)
Published: 2025-05-06
DOI: 10.1186/s13326-025-00329-2
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057135/pdf/
Journal metrics: Impact Factor 1.6, JCR Q3 (Mathematical & Computational Biology)
Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.
Purpose: Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can lengthen response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that developing these three modules would enable the system to identify the critical components of a question, facilitating a deeper understanding of it and allowing more effective questions to be reformulated from the extracted key information.
Methods: This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. Consequently, we propose several multi-task learning (MTL) models, both in pairwise and three-way configurations, incorporating parallel and hierarchical architectures.
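The parallel MTL configurations described above typically rely on hard parameter sharing: one transformer encoder is shared across tasks, and each task gets its own classification head. The sketch below illustrates only this structure; the class, the dummy encoder, and the label strings are hypothetical stand-ins, not the authors' implementation.

```python
# Illustrative sketch of hard parameter sharing for multi-task learning:
# one shared encoder feeds separate token-labeling heads for the SR, MER,
# and KE tasks. All names and the toy "encoder" are hypothetical.

class SharedEncoderMTL:
    def __init__(self, tasks):
        self.tasks = tasks  # e.g. ["SR", "MER", "KE"]
        # One head per task; here each head just emits a placeholder
        # label per token, standing in for a real classification layer.
        self.heads = {t: (lambda hidden, t=t: [f"{t}-label"] * len(hidden))
                      for t in tasks}

    def encode(self, tokens):
        # Stand-in for a transformer encoder: one "hidden state" per token.
        return [hash(tok) % 100 for tok in tokens]

    def forward(self, tokens):
        hidden = self.encode(tokens)  # shared parameters across all tasks
        return {t: self.heads[t](hidden) for t in self.tasks}

model = SharedEncoderMTL(["SR", "MER", "KE"])
outputs = model.forward(["sakit", "kepala", "sejak", "kemarin"])
```

In a hierarchical (rather than parallel) variant, one task's predictions would additionally be fed as input features to another task's head.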
Results: Using F1-score at the chunk level, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance on each task was achieved by a different model, with IndoNLU-LARGE obtaining the highest average score. These results suggested that a larger model does not always perform better. We also found no clear indication of whether Indonesian or multilingual language models generally performed better on our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Despite varying the loss weights across our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.
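Chunk-level F1, used above both for inter-annotator agreement and model evaluation, counts a predicted chunk as correct only when its span and label exactly match a gold chunk. A minimal sketch, with chunks written as (start, end, label) triples; this is illustrative, not the authors' evaluation code.

```python
# Chunk-level F1: exact match on (span, label); partial overlaps count
# as errors. Chunks are (start, end, label) triples.

def chunk_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                          # exact-match chunks
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 2, "SYMPTOM"), (5, 7, "DRUG")]
pred = [(0, 2, "SYMPTOM"), (5, 6, "DRUG")]  # second span off by one token
print(chunk_f1(gold, pred))                  # → 0.5
```

The exact-match requirement explains why agreement for KE (35.01%) can be far lower than for SR (88.61%): annotators often agree on roughly where a keyphrase lies but disagree on its exact boundaries.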
Conclusion: We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training, 200 validation, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggested that additional information from the other two tasks could help the learning process for the MER and KE tasks, while having only a small effect on the SR task.
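The three-way MTL experiments varied the loss weights assigned to each task; training optimizes a weighted sum of the per-task losses. A minimal sketch of that combination step, with illustrative loss values and weights (not figures from the paper):

```python
# Weighted combination of per-task losses for three-way MTL.
# The loss values and weights below are illustrative only.

def combined_loss(losses, weights):
    assert set(losses) == set(weights), "each task needs a weight"
    return sum(weights[t] * losses[t] for t in losses)

task_losses = {"SR": 0.20, "MER": 0.50, "KE": 0.80}
equal = combined_loss(task_losses, {"SR": 1.0, "MER": 1.0, "KE": 1.0})
skewed = combined_loss(task_losses, {"SR": 0.5, "MER": 1.0, "KE": 2.0})
```

Changing the weights shifts how much gradient each task head contributes to the shared encoder, which is why different weightings can help one task while hurting another.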
About the journal:
Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas:
Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability.
Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.