Using ChatGPT as a tool for training nonprogrammers to generate genomic sequence analysis code
Haley A Delcher, Enas S Alsatari, Adeyeye I Haastrup, Sayema Naaz, Lydia A Hayes-Guastella, Autumn M McDaniel, Olivia G Clark, Devin M Katerski, Francois O Prinsloo, Olivia R Roberts, Meredith A Shaddix, Bridgette N Sullivan, Isabella M Swan, Emily M Hartsell, Jeffrey D DeMeis, Sunita S Paudel, Glen M Borchert
Biochemistry and Molecular Biology Education (journal article), published 5 May 2025. DOI: 10.1002/bmb.21899
Abstract
Today, because many genomes are large and sequencing files keep growing, analyzing sequencing data independently is largely impossible for a biologist with little or no programming expertise. Biologists are therefore typically faced with a dilemma: either spend significant time and effort learning to program themselves, or identify (and rely on) an available computer scientist to analyze large sequence datasets. However, AI-powered programs such as ChatGPT may offer a way to overcome the disconnect between biologists and the analysis of the genomic data critical to their field. The work detailed here demonstrates how incorporating ChatGPT into an existing Course-based Undergraduate Research Experience curriculum can equip biology students with no programming expertise to generate their own programs and to carry out a comprehensive, publishable analysis of real-world Next Generation Sequencing (NGS) datasets. With students relying solely on their biology background to prompt ChatGPT to generate Python code, we found that they could readily produce programs capable of processing and analyzing NGS datasets larger than 10 gigabytes. In summary, we believe that integrating ChatGPT into education can help bridge a critical gap between biology and computer science and may prove similarly beneficial in other disciplines. Additionally, ChatGPT can give biological researchers powerful new tools for NGS dataset analysis that may help accelerate major new advances in the field.
About the journal:
The aim of Biochemistry and Molecular Biology Education (BAMBED) is to enhance teacher preparation and student learning in Biochemistry, Molecular Biology, and related sciences such as Biophysics and Cell Biology by promoting the worldwide dissemination of educational materials. BAMBED seeks and publishes articles on many topics, including:
Innovative techniques in teaching and learning.
New pedagogical approaches.
Research in biochemistry and molecular biology education.
Reviews on emerging areas of Biochemistry and Molecular Biology to provide background for the preparation of lectures, seminars, student presentations, dissertations, etc.
Historical Reviews describing "Paths to Discovery".
Novel and proven laboratory experiments that have both skill-building and discovery-based characteristics.
Reviews of relevant textbooks, software, and websites.
Descriptions of software for educational use.
Descriptions of multimedia materials such as tutorials on various aspects of biochemistry and molecular biology.