The application of natural language processing for the extraction of mechanistic information in toxicology

Marie Corradi, Thomas Luechtefeld, Alyanne M. de Haan, R. Pieters, Jonathan H. Freedman, T. Vanhaecke, Mathieu Vinken, M. Teunis
Journal: Frontiers in Toxicology (Journal Article). Published: 2024-05-10. DOI: 10.3389/ftox.2024.1393662 (https://doi.org/10.3389/ftox.2024.1393662)
Citations: 0

Abstract

To study how compounds induce adverse effects, toxicologists have been constructing Adverse Outcome Pathways (AOPs). An AOP is a pragmatic tool for capturing and visualizing the mechanisms underlying different types of toxicity inflicted by any kind of stressor; it describes the interactions between key entities that lead to the adverse outcome across multiple levels of biological organization. Constructing or optimizing an AOP is a labor-intensive process that currently depends on the manual search, collection, review, and synthesis of the available scientific literature. This process could, however, be greatly facilitated by using Natural Language Processing (NLP) to extract information from the scientific literature in a systematic, objective, and rapid manner, which would lead to greater accuracy and reproducibility. Researchers could then invest their expertise in the substantive assessment of AOPs, replacing time spent on evidence gathering with critical review of the data extracted by NLP. As case examples, we selected two frequent liver adversities: cholestasis and steatosis, which denote the accumulation of bile and lipid, respectively. We used deep learning language models to recognize entities of interest in text and to establish causal relationships between them. We demonstrate how an NLP pipeline combining Named Entity Recognition with a simple rule-based relationship extraction model helps not only to screen the literature for compounds related to liver adversities, but also to extract mechanistic information on how such adversities develop, from the molecular to the organismal level. Finally, we offer some perspectives opened by recent progress in Large Language Models and on how these could be used in the future. This work makes two main contributions: 1) a proof of concept that NLP can support the extraction of information from text for modern toxicology, and 2) a template open-source model for the recognition of toxicological entities and the extraction of their relationships. All resources are openly accessible via GitHub (https://github.com/ontox-project/en-tox).
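
The two-stage idea described in the abstract (entity recognition followed by a simple rule-based relationship step) can be illustrated with a minimal Python sketch. This is not the authors' en-tox pipeline: a small dictionary lookup stands in for the deep learning entity recognizer, and the lexicon, the causal trigger list, and the names `recognize_entities` and `extract_relations` are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: a toy lexicon standing in for a trained NER model.
LEXICON = {
    "COMPOUND": ["cyclosporine a", "valproic acid"],
    "PHENOTYPE": ["cholestasis", "steatosis", "bile accumulation"],
}

# ASSUMPTION: hypothetical trigger phrases treated as causal cues.
CAUSAL_TRIGGERS = ["induces", "causes", "leads to", "results in"]


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def recognize_entities(sentence):
    """Return lexicon matches in the sentence, ordered by position."""
    lowered = sentence.lower()
    found = []
    for label, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, lowered):
                found.append(
                    Entity(sentence[m.start():m.end()], label, m.start(), m.end())
                )
    return sorted(found, key=lambda e: e.start)


def extract_relations(sentence):
    """Link adjacent entities when a causal trigger occurs between them."""
    entities = recognize_entities(sentence)
    lowered = sentence.lower()
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = lowered[left.end:right.start]
        for trigger in CAUSAL_TRIGGERS:
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left.text, trigger, right.text))
                break  # keep at most one relation per entity pair
    return relations


if __name__ == "__main__":
    text = "Cyclosporine A causes bile accumulation, which leads to cholestasis."
    for cause, trigger, effect in extract_relations(text):
        print(f"{cause} --[{trigger}]--> {effect}")
```

Chaining the extracted triples (compound, intermediate event, adverse outcome) is what would let such output feed an AOP-style graph; a real system would replace the lexicon with a trained recognizer and would need to handle negation, coreference, and relations that span sentences.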