Bohan Xu, Nick Obradovich, Wenjie Zheng, Robert Loughnan, Lucy Shao, Masaya Misaki, Wesley K Thompson, Martin Paulus, Chun Chieh Fan
{"title":"预训练的大型语言模型的编码反映了人类心理特征的遗传结构。","authors":"Bohan Xu, Nick Obradovich, Wenjie Zheng, Robert Loughnan, Lucy Shao, Masaya Misaki, Wesley K Thompson, Martin Paulus, Chun Chieh Fan","doi":"10.1101/2025.03.27.25324744","DOIUrl":null,"url":null,"abstract":"<p><p>Recent advances in large language models (LLMs) have prompted a frenzy in utilizing them as universal translators for biomedical terms. However, the black box nature of LLMs has forced researchers to rely on artificially designed benchmarks without understanding what exactly LLMs encode. We demonstrate that pretrained LLMs can already explain up to 51% of the genetic correlation between items from a psychometrically-validated neuroticism questionnaire, without any fine-tuning. For psychiatric diagnoses, we found disorder names aligned better with genetic relationships than diagnostic descriptions. Our results indicate the pretrained LLMs have encodings mirroring genetic architectures. These findings highlight LLMs' potential for validating phenotypes, refining taxonomies, and integrating textual and genetic data in mental health research.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974973/pdf/","citationCount":"0","resultStr":"{\"title\":\"Encoding of pretrained large language models mirrors the genetic architectures of human psychological traits.\",\"authors\":\"Bohan Xu, Nick Obradovich, Wenjie Zheng, Robert Loughnan, Lucy Shao, Masaya Misaki, Wesley K Thompson, Martin Paulus, Chun Chieh Fan\",\"doi\":\"10.1101/2025.03.27.25324744\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recent advances in large language models (LLMs) have prompted a frenzy in utilizing them as universal translators for biomedical terms. However, the black box nature of LLMs has forced researchers to rely on artificially designed benchmarks without understanding what exactly LLMs encode. We demonstrate that pretrained LLMs can already explain up to 51% of the genetic correlation between items from a psychometrically-validated neuroticism questionnaire, without any fine-tuning. For psychiatric diagnoses, we found disorder names aligned better with genetic relationships than diagnostic descriptions. Our results indicate the pretrained LLMs have encodings mirroring genetic architectures. 
These findings highlight LLMs' potential for validating phenotypes, refining taxonomies, and integrating textual and genetic data in mental health research.</p>\",\"PeriodicalId\":94281,\"journal\":{\"name\":\"medRxiv : the preprint server for health sciences\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974973/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv : the preprint server for health sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2025.03.27.25324744\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv : the preprint server for health sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2025.03.27.25324744","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Encoding of pretrained large language models mirrors the genetic architectures of human psychological traits.
Recent advances in large language models (LLMs) have prompted a surge of interest in using them as universal translators for biomedical terms. However, the black-box nature of LLMs has forced researchers to rely on artificially designed benchmarks without understanding what exactly LLMs encode. We demonstrate that pretrained LLMs, without any fine-tuning, can already explain up to 51% of the genetic correlation between items from a psychometrically validated neuroticism questionnaire. For psychiatric diagnoses, we found that disorder names aligned better with genetic relationships than diagnostic descriptions did. Our results indicate that the encodings of pretrained LLMs mirror genetic architectures. These findings highlight LLMs' potential for validating phenotypes, refining taxonomies, and integrating textual and genetic data in mental health research.
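The abstract does not spell out the analysis pipeline, but a minimal sketch of the kind of comparison it describes might look like the following: embed each questionnaire item's text with a pretrained model, take pairwise embedding similarities, and ask how much of the item-by-item genetic correlation matrix those similarities explain. This is entirely illustrative; the model choice, the example items, the placeholder genetic correlations, the cosine-similarity metric, and the variance-explained calculation are assumptions for demonstration, not the authors' method.

```python
# Illustrative sketch only: compares pairwise similarity of LLM embeddings of
# questionnaire items against an item-by-item genetic correlation matrix.
# Model, items, rg values, and the regression step are all assumed, not taken
# from the paper.
import numpy as np
from sentence_transformers import SentenceTransformer  # any pretrained text encoder

# Hypothetical neuroticism-style items (placeholders, not the validated questionnaire).
items = [
    "I often feel tense or anxious.",
    "My mood goes up and down frequently.",
    "I worry about things that might go wrong.",
    "I get irritated easily.",
]

# Placeholder genetic correlation matrix between the same items
# (in practice estimated from GWAS summary statistics, e.g. via LD score regression).
rg = np.array([
    [1.00, 0.62, 0.55, 0.40],
    [0.62, 1.00, 0.48, 0.51],
    [0.55, 0.48, 1.00, 0.37],
    [0.40, 0.51, 0.37, 1.00],
])

# 1. Embed item text with a pretrained model (no fine-tuning).
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(items, normalize_embeddings=True)

# 2. Pairwise cosine similarity between item embeddings.
sim = emb @ emb.T

# 3. How much variance in the off-diagonal genetic correlations is explained
#    by embedding similarity? (simple linear fit over unique item pairs)
iu = np.triu_indices(len(items), k=1)
x, y = sim[iu], rg[iu]
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Variance in genetic correlations explained by embedding similarity: {r2:.2f}")
```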