Large language models predict cognition and education close to or better than genomics or expert assessment.

Tobias Wolfram
Communications Psychology, volume 3, issue 1, article 95. Journal article, published 2025-07-03.
DOI: https://doi.org/10.1038/s44271-025-00274-x
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12229686/pdf/
Citations: 0

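Abstract

Previous research using standard social survey data has emphasized a relative lack of predictive power for educational and psychological outcomes. Leveraging a unique longitudinal dataset, we explore the predictability of educational attainment, cognitive abilities, and non-cognitive traits. Integrating various measures from computational linguistics and large language model-based embeddings within a SuperLearner framework trained on short aspirational essays written at age 11, we accurately predict cognition and non-cognitive traits at the same and later ages to a similar degree as teacher assessments, and better than genomic data. The same is true for predicting final educational attainment. Combining text, genetic markers, and teacher assessments into an ensemble model, we can predict cognitive ability at close to the test-retest reliability of gold-standard tests ($R^2_{\mathrm{Holdout}} = 0.7$) and explain 38% of individual differences in attainment. A sociological model comparable to the baseline of the Fragile Families Challenge (FFC) replicates the FFC's findings regarding the level of predictability achievable with such data. These findings show that recent advances in large language models and machine learning equip behavioural scientists with tools for the prediction of psycho-social features.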
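
For readers unfamiliar with the holdout statistic quoted in the abstract, the following is a sketch of the usual out-of-sample coefficient of determination. It is an assumption that the paper uses this form: the abstract does not define the statistic. The symbols $\mathcal{H}$ (the set of held-out individuals), $y_i$ (observed outcome), $\hat{y}_i$ (ensemble prediction), and $\bar{y}_{\mathcal{H}}$ (mean observed outcome in the holdout set) are introduced here for illustration only.

```latex
% Assumed standard definition of holdout R^2; not taken from the paper.
\[
R^2_{\mathrm{Holdout}}
  = 1 - \frac{\sum_{i \in \mathcal{H}} \left( y_i - \hat{y}_i \right)^2}
             {\sum_{i \in \mathcal{H}} \left( y_i - \bar{y}_{\mathcal{H}} \right)^2}
\]
```

Under this reading, the reported value of 0.7 would mean that the ensemble's squared prediction error on unseen individuals is about 30% of the error obtained by predicting the holdout mean for everyone. Some authors use the training-set mean in the denominator instead, which can shift the value slightly.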
