Reduction of supervision for biomedical knowledge discovery.

IF 3.3, Zone 3 (Biology), Q2, BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2025-09-01 DOI:10.1186/s12859-025-06187-0

Christos Theodoropoulos, Andrei Catalin Coman, James Henderson, Marie-Francine Moens

BMC Bioinformatics 26(1): 225. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403602/pdf/

Citations: 0

Abstract

Reduction of supervision for biomedical knowledge discovery.

Background: Knowledge discovery in scientific literature is hindered by the increasing volume of publications and the scarcity of extensive annotated data. To tackle the challenge of information overload, it is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. While supervised techniques generally result in better performance, they have the major drawback of demanding labeled data. This requirement is labor-intensive, time-consuming, and hinders scalability when exploring new domains.

Methods and results: In this context, our study addresses the challenge of identifying semantic relationships between biomedical entities (e.g., diseases, proteins, medications) in unstructured text while minimizing dependency on supervision. We introduce a suite of unsupervised algorithms based on dependency trees and attention mechanisms and employ a range of pointwise binary classification methods. Transitioning from weakly supervised to fully unsupervised settings, we assess the methods' ability to learn from data with noisy labels. The evaluation on four biomedical benchmark datasets explores the effectiveness of the methods, demonstrating their potential to enable scalable knowledge discovery systems less reliant on annotated datasets.
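As a rough illustration of how a dependency tree can yield noisy relation labels without annotation, the sketch below marks an entity pair as related when the two tokens are close in the parse tree. This is not the authors' algorithm, which the abstract does not specify. The function names, the `max_hops` threshold, and the hand-written parse are all assumptions made for the example.

```python
from collections import deque


def dependency_path_length(heads, src, dst):
    """Return the number of edges on the shortest path between two tokens.

    `heads[i]` is the index of token i's head in the dependency tree;
    the root token has head -1.
    """
    # Build an undirected adjacency list from the head pointers.
    neighbors = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].add(head)
            neighbors[head].add(child)

    # Breadth-first search from src until dst is reached.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # only reachable if the input is not a single connected tree


def noisy_relation_label(heads, entity_a, entity_b, max_hops=3):
    """Hypothetical heuristic: label 1 if the entities are within max_hops edges."""
    length = dependency_path_length(heads, entity_a, entity_b)
    return int(length is not None and length <= max_hops)


# Toy sentence with a hand-written (assumed) parse.
tokens = ["Aspirin", "reduces", "the", "risk", "of", "stroke"]
heads = [1, -1, 3, 1, 5, 3]

if __name__ == "__main__":
    a, b = tokens.index("Aspirin"), tokens.index("stroke")
    print(dependency_path_length(heads, a, b), noisy_relation_label(heads, a, b))
```

Labels produced this way are noisy by construction, which is the kind of input a pointwise binary classifier would then have to learn from in a weakly supervised or unsupervised setting.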

Conclusion: Our approach tackles a central issue in knowledge discovery: balancing performance with minimal supervision, which is crucial to adapting models to varied and changing domains. This study also investigates the use of pointwise binary classification techniques within a weakly supervised framework for knowledge discovery. By gradually decreasing supervision, we assess the robustness of these techniques in handling noisy labels, revealing their capability to shift from weakly supervised to entirely unsupervised scenarios. Comprehensive benchmarking offers insights into the effectiveness of these techniques, examining how unsupervised methods can reliably capture complex relationships in biomedical texts. These results suggest an encouraging direction toward scalable, adaptable knowledge discovery systems, representing progress in creating data-efficient methodologies for extracting useful insights when annotated data is limited.

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.