Tim Havers, Caroline Jelonnek, Lukas Masur, Eduard Isenmann, Billy Sperlich, Stephan Geisler, Peter Düking
A professional assessment of training plans for muscle hypertrophy and maximal strength developed by generative artificial intelligence
Biology of Sport 42(4), 353-366. Published 2025-08-26. DOI: https://doi.org/10.5114/biolsport.2026.152350
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492345/pdf/
Citation count: 0
Abstract
The aim of this study was to evaluate the quality of resistance training plans for muscle hypertrophy and maximal strength generated by three large language models (LLMs): GPT-3.5 (via ChatGPT and Microsoft Copilot) and Google Gemini (GG). A total of 10 experienced coaches, each with at least a bachelor's degree in exercise science and at least 2 years of coaching experience, rated these plans on a 1-5 Likert scale based on 27 criteria essential for effective training plan design. The LLMs were accessed on April 30, 2024, with a prompt structure that included key training objectives and the training history of a fictional advanced trainee. Results showed that the overall quality of the LLM-generated training plans was moderate. GG outperformed GPT-3.5 (via ChatGPT and Microsoft Copilot) for hypertrophy-related plans on 2 out of 27 criteria (advanced exercise methods, recovery strategies; p < 0.05), while GPT-3.5 (via Microsoft Copilot) outperformed GG for strength-related plans on 1 out of 27 criteria (testing procedure; p < 0.05). Across all criteria, GG received ratings > 3 more frequently than GPT-3.5 (via ChatGPT and Microsoft Copilot), particularly for general aspects, training principles, and training methods. Differences between hypertrophy- and strength-oriented plans within each LLM were minimal, although GPT-3.5 (via ChatGPT) showed the most inconsistency in ratings. Although LLM-generated plans can serve as an initial framework for hypertrophy and strength development, expert supervision remains crucial to refine these plans, as LLMs cannot account for individual responses to training, safety considerations, and the complex physiological adaptation processes observed by experienced coaches.
About the journal:
Biology of Sport is the official journal of the Institute of Sport in Warsaw, Poland, published since 1984.
Biology of Sport is an international, peer-reviewed scientific journal, published quarterly in both print and electronic formats. The journal publishes articles on basic and applied sciences in sport: sports and exercise physiology, sports immunology and medicine, sports genetics, training and testing, pharmacology, and other biological aspects of sport. Priority is given to interdisciplinary papers.