Negin Sadat Babaiha , Sathvik Guru Rao , Jürgen Klein , Bruce Schultz , Marc Jacobs , Martin Hofmann-Apitius
{"title":"面对 GPT 虚伪的理性主义:以人类专家编辑的生物医学知识图谱为基准测试大型语言模型的输出结果","authors":"Negin Sadat Babaiha , Sathvik Guru Rao , Jürgen Klein , Bruce Schultz , Marc Jacobs , Martin Hofmann-Apitius","doi":"10.1016/j.ailsci.2024.100095","DOIUrl":null,"url":null,"abstract":"<div><p>Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT which has grown in popularity dramatically, has revolutionized the field of NLP, making them a potential candidate to be used in text-based graph creation as well. So far, no previous work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate its performance by comparing the extracted results to a highly accurate, manually curated BEL KG curated by domain experts.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000023/pdfft?md5=9137dd2a207653e4d13cb5b99ca17d48&pid=1-s2.0-S2667318524000023-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Rationalism in the face of GPT hypes: Benchmarking the output of large language models against human expert-curated biomedical knowledge graphs\",\"authors\":\"Negin Sadat Babaiha , Sathvik Guru Rao , Jürgen Klein , Bruce Schultz , Marc Jacobs , Martin Hofmann-Apitius\",\"doi\":\"10.1016/j.ailsci.2024.100095\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT which has grown in popularity dramatically, has revolutionized the field of NLP, making them a potential candidate to be used in text-based graph creation as well. So far, no previous work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate its performance by comparing the extracted results to a highly accurate, manually curated BEL KG curated by domain experts.</p></div>\",\"PeriodicalId\":72304,\"journal\":{\"name\":\"Artificial intelligence in the life sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2667318524000023/pdfft?md5=9137dd2a207653e4d13cb5b99ca17d48&pid=1-s2.0-S2667318524000023-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial intelligence in the life sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2667318524000023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence in the life sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667318524000023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
生物医学知识图谱(KG)包含有关基因、疾病、生物过程和药物等生物医学实体的宝贵信息。知识图谱已成功应用于具有挑战性的生物医学领域,如病理生理学机制鉴定或药物再利用。创建高质量的 KG 通常需要劳动密集型的多数据库整合或大量的人工专家策划,这两者都需要时间,并增加了数据处理和注释的工作量。因此,使用自动系统建立和维护 KG 是广泛吸收和利用 KG 的先决条件。支持自动生成和更新 KG 的技术通常使用自然语言处理(NLP)技术,该技术针对提取相关生物医学文本资源中描述的隐式三元组进行了优化。这一挑战的核心是如何利用不同的模型和工具来提高信息提取模块的准确性和覆盖范围。预训练的大型语言模型(LLM)的出现,如 ChatGPT 的急剧普及,给 NLP 领域带来了革命性的变化,使其也有可能用于基于文本的图创建。迄今为止,还没有人研究过 LLM 在生成以生物表达语言(BEL)编码的因果网络和 KG 方面的威力。在本文中,我们介绍了使用两种不同版本的生成预训练转换器(GPT)模型进行一次 BEL 关系提取的初步研究,并通过将提取结果与领域专家手动策划的高精度 BEL KG 进行比较,评估了其性能。
Rationalism in the face of GPT hypes: Benchmarking the output of large language models against human expert-curated biomedical knowledge graphs
Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT which has grown in popularity dramatically, has revolutionized the field of NLP, making them a potential candidate to be used in text-based graph creation as well. So far, no previous work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate its performance by comparing the extracted results to a highly accurate, manually curated BEL KG curated by domain experts.
Artificial intelligence in the life sciencesPharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)