Shuo Jiang , Daniel Evans-Yamamoto , Dennis Bersenev , Sucheendra K. Palaniappan , Ayako Yachie-Kinoshita
{"title":"ProtoCode: Leveraging large language models (LLMs) for automated generation of machine-readable PCR protocols from scientific publications","authors":"Shuo Jiang , Daniel Evans-Yamamoto , Dennis Bersenev , Sucheendra K. Palaniappan , Ayako Yachie-Kinoshita","doi":"10.1016/j.slast.2024.100134","DOIUrl":null,"url":null,"abstract":"<div><p>Protocol standardization and sharing are crucial for reproducibility in life sciences. In spite of numerous efforts for standardized protocol description, adherence to these standards in literature remains largely inconsistent. Curation of protocols are especially challenging due to the labor intensive process, requiring expert domain knowledge of each experimental procedure. Recent advancements in Large Language Models (LLMs) offer a promising solution to interpret and curate knowledge from complex scientific literature. In this work, we develop ProtoCode, a tool leveraging fine-tune LLMs to curate protocols into intermediate representation formats which can be interpretable by both human and machine interfaces. Our proof-of-concept, focused on polymerase chain reaction (PCR) protocols, retrieves information from PCR protocols at an accuracy ranging 69–100 % depending on the information content. In all tested protocols, we demonstrate that ProtoCode successfully converts literature-based protocols into correct operational files for multiple thermal cycler systems. In conclusion, ProtoCode can alleviate labor intensive curation and standardization of life science protocols to enhance research reproducibility by providing a reliable, automated means to process and standardize protocols. ProtoCode is freely available as a web server at <span>https://curation.taxila.io/ProtoCode/</span><svg><path></path></svg>.</p></div>","PeriodicalId":54248,"journal":{"name":"SLAS Technology","volume":null,"pages":null},"PeriodicalIF":2.5000,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2472630324000165/pdfft?md5=d470a9ecd45aada315855fbaeaba12b8&pid=1-s2.0-S2472630324000165-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SLAS Technology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2472630324000165","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Protocol standardization and sharing are crucial for reproducibility in life sciences. In spite of numerous efforts for standardized protocol description, adherence to these standards in literature remains largely inconsistent. Curation of protocols are especially challenging due to the labor intensive process, requiring expert domain knowledge of each experimental procedure. Recent advancements in Large Language Models (LLMs) offer a promising solution to interpret and curate knowledge from complex scientific literature. In this work, we develop ProtoCode, a tool leveraging fine-tune LLMs to curate protocols into intermediate representation formats which can be interpretable by both human and machine interfaces. Our proof-of-concept, focused on polymerase chain reaction (PCR) protocols, retrieves information from PCR protocols at an accuracy ranging 69–100 % depending on the information content. In all tested protocols, we demonstrate that ProtoCode successfully converts literature-based protocols into correct operational files for multiple thermal cycler systems. In conclusion, ProtoCode can alleviate labor intensive curation and standardization of life science protocols to enhance research reproducibility by providing a reliable, automated means to process and standardize protocols. ProtoCode is freely available as a web server at https://curation.taxila.io/ProtoCode/.
期刊介绍:
SLAS Technology emphasizes scientific and technical advances that enable and improve life sciences research and development; drug-delivery; diagnostics; biomedical and molecular imaging; and personalized and precision medicine. This includes high-throughput and other laboratory automation technologies; micro/nanotechnologies; analytical, separation and quantitative techniques; synthetic chemistry and biology; informatics (data analysis, statistics, bio, genomic and chemoinformatics); and more.