Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Patricia L Whetzel, Hannah Blau, Mauricio Soto Gomez, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson
{"title":"利用生成式人工智能协助罕见疾病医疗行动的生物定位。","authors":"Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Patricia L Whetzel, Hannah Blau, Mauricio Soto Gomez, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson","doi":"10.1093/bioadv/vbaf141","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10 000 rare diseases.</p><p><strong>Results: </strong>We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations that were transferred to the MAxO annotation dataset.</p><p><strong>Availability and implementation: </strong>AutoMAxO is a Python package freely available at https://github.com/monarch-initiative/automaxo.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf141"},"PeriodicalIF":2.8000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228962/pdf/","citationCount":"0","resultStr":"{\"title\":\"Leveraging generative AI to assist biocuration of medical actions for rare disease.\",\"authors\":\"Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Patricia L Whetzel, Hannah Blau, Mauricio Soto Gomez, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson\",\"doi\":\"10.1093/bioadv/vbaf141\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10 000 rare diseases.</p><p><strong>Results: </strong>We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations that were transferred to the MAxO annotation dataset.</p><p><strong>Availability and implementation: </strong>AutoMAxO is a Python package freely available at https://github.com/monarch-initiative/automaxo.</p>\",\"PeriodicalId\":72368,\"journal\":{\"name\":\"Bioinformatics advances\",\"volume\":\"5 1\",\"pages\":\"vbaf141\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228962/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioadv/vbaf141\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf141","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Leveraging generative AI to assist biocuration of medical actions for rare disease.
Motivation: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10 000 rare diseases.
Results: We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations that were transferred to the MAxO annotation dataset.
Availability and implementation: AutoMAxO is a Python package freely available at https://github.com/monarch-initiative/automaxo.