An T H Le, Thomas Shvekher, Lewis Nguyen, Sergey N Krylov
{"title":"会话式大语言模型导师,加速常规生物分析工作流程中的机器学习方法开发。","authors":"An T H Le, Thomas Shvekher, Lewis Nguyen, Sergey N Krylov","doi":"10.1002/cbic.202500678","DOIUrl":null,"url":null,"abstract":"<p><p>As machine learning (ML) becomes increasingly relevant in experimental chemistry, many scientists face barriers to adoption due to limited training in ML. While AutoML platforms offer powerful capabilities, they lack the instructional scaffolding needed by users without an ML background. To address this gap, a lightweight, conversational assistant is presented that guides users through ML workflow design using plain-language dialog. Powered by OpenAI's GPT-4o and deployed via a Gradio interface, the assistant operates under a structured system prompt that simulates pedagogical reasoning. It behaves like a domain-specific tutor: helping users define ML goals, assess data structure, select models, evaluate metrics, and generate annotated Python code. A complete documentation of the development process is provided, allowing researchers to adapt the system for other domains. Herein, its utility is demonstrated in two representative case studies: 1) image classification of lateral flow immunoassay test strips for diagnostic readout; and 2) regression-based prediction of liquid chromatography-mass spectrometry retention times from molecular descriptors for small molecules. In both cases, lab members with no ML experience successfully developed working models guided solely by the assistant. By lowering the barrier to ML adoption in data-rich analytical workflows, this system offers a customizable workflow for building domain-specific assistants across experimental science.</p>","PeriodicalId":140,"journal":{"name":"ChemBioChem","volume":" ","pages":"e202500678"},"PeriodicalIF":2.8000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Conversational Large-Language-Model Tutor that Accelerates Machine-Learning Method Development in Routine Bioanalytical Workflows.\",\"authors\":\"An T H Le, Thomas Shvekher, Lewis Nguyen, Sergey N Krylov\",\"doi\":\"10.1002/cbic.202500678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>As machine learning (ML) becomes increasingly relevant in experimental chemistry, many scientists face barriers to adoption due to limited training in ML. While AutoML platforms offer powerful capabilities, they lack the instructional scaffolding needed by users without an ML background. To address this gap, a lightweight, conversational assistant is presented that guides users through ML workflow design using plain-language dialog. Powered by OpenAI's GPT-4o and deployed via a Gradio interface, the assistant operates under a structured system prompt that simulates pedagogical reasoning. It behaves like a domain-specific tutor: helping users define ML goals, assess data structure, select models, evaluate metrics, and generate annotated Python code. A complete documentation of the development process is provided, allowing researchers to adapt the system for other domains. Herein, its utility is demonstrated in two representative case studies: 1) image classification of lateral flow immunoassay test strips for diagnostic readout; and 2) regression-based prediction of liquid chromatography-mass spectrometry retention times from molecular descriptors for small molecules. In both cases, lab members with no ML experience successfully developed working models guided solely by the assistant. By lowering the barrier to ML adoption in data-rich analytical workflows, this system offers a customizable workflow for building domain-specific assistants across experimental science.</p>\",\"PeriodicalId\":140,\"journal\":{\"name\":\"ChemBioChem\",\"volume\":\" \",\"pages\":\"e202500678\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ChemBioChem\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1002/cbic.202500678\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ChemBioChem","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/cbic.202500678","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
A Conversational Large-Language-Model Tutor that Accelerates Machine-Learning Method Development in Routine Bioanalytical Workflows.
As machine learning (ML) becomes increasingly relevant in experimental chemistry, many scientists face barriers to adoption due to limited training in ML. While AutoML platforms offer powerful capabilities, they lack the instructional scaffolding needed by users without an ML background. To address this gap, a lightweight, conversational assistant is presented that guides users through ML workflow design using plain-language dialog. Powered by OpenAI's GPT-4o and deployed via a Gradio interface, the assistant operates under a structured system prompt that simulates pedagogical reasoning. It behaves like a domain-specific tutor: helping users define ML goals, assess data structure, select models, evaluate metrics, and generate annotated Python code. A complete documentation of the development process is provided, allowing researchers to adapt the system for other domains. Herein, its utility is demonstrated in two representative case studies: 1) image classification of lateral flow immunoassay test strips for diagnostic readout; and 2) regression-based prediction of liquid chromatography-mass spectrometry retention times from molecular descriptors for small molecules. In both cases, lab members with no ML experience successfully developed working models guided solely by the assistant. By lowering the barrier to ML adoption in data-rich analytical workflows, this system offers a customizable workflow for building domain-specific assistants across experimental science.
期刊介绍:
ChemBioChem (Impact Factor 2018: 2.641) publishes important breakthroughs across all areas at the interface of chemistry and biology, including the fields of chemical biology, bioorganic chemistry, bioinorganic chemistry, synthetic biology, biocatalysis, bionanotechnology, and biomaterials. It is published on behalf of Chemistry Europe, an association of 16 European chemical societies, and supported by the Asian Chemical Editorial Society (ACES).