{"title":"训练过程中的线性回忆偏差可提高变压器与阅读时间的匹配度","authors":"Christian Clark, Byung-Doh Oh, William Schuler","doi":"arxiv-2409.11250","DOIUrl":null,"url":null,"abstract":"Recent psycholinguistic research has compared human reading times to\nsurprisal estimates from language models to study the factors shaping human\nsentence processing difficulty. Previous studies have shown a strong fit\nbetween surprisal values from Transformers and reading times. However, standard\nTransformers work with a lossless representation of the entire previous\nlinguistic context, unlike models of human language processing that include\nmemory decay. To bridge this gap, this paper evaluates a modification of the\nTransformer model that uses ALiBi (Press et al., 2022), a recency bias added to\nattention scores. Surprisal estimates with ALiBi show an improved fit to human\nreading times compared to a standard Transformer baseline. A subsequent\nanalysis of attention heads suggests that ALiBi's mixture of slopes -- which\ndetermine the rate of memory decay in each attention head -- may play a role in\nthe improvement by helping models with ALiBi to track different kinds of\nlinguistic dependencies.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"1243 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Linear Recency Bias During Training Improves Transformers' Fit to Reading Times\",\"authors\":\"Christian Clark, Byung-Doh Oh, William Schuler\",\"doi\":\"arxiv-2409.11250\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent psycholinguistic research has compared human reading times to\\nsurprisal estimates from language models to study the factors shaping human\\nsentence processing difficulty. Previous studies have shown a strong fit\\nbetween surprisal values from Transformers and reading times. However, standard\\nTransformers work with a lossless representation of the entire previous\\nlinguistic context, unlike models of human language processing that include\\nmemory decay. To bridge this gap, this paper evaluates a modification of the\\nTransformer model that uses ALiBi (Press et al., 2022), a recency bias added to\\nattention scores. Surprisal estimates with ALiBi show an improved fit to human\\nreading times compared to a standard Transformer baseline. 
A subsequent\\nanalysis of attention heads suggests that ALiBi's mixture of slopes -- which\\ndetermine the rate of memory decay in each attention head -- may play a role in\\nthe improvement by helping models with ALiBi to track different kinds of\\nlinguistic dependencies.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":\"1243 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11250\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11250","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Linear Recency Bias During Training Improves Transformers' Fit to Reading Times
Recent psycholinguistic research has compared human reading times to
surprisal estimates from language models to study the factors shaping human
sentence processing difficulty. Previous studies have shown a strong fit
between surprisal values from Transformers and reading times. However, standard
Transformers work with a lossless representation of the entire previous
linguistic context, unlike models of human language processing that include
memory decay. To bridge this gap, this paper evaluates a modification of the
Transformer model that uses ALiBi (Press et al., 2022), a recency bias added to
attention scores. Surprisal estimates with ALiBi show an improved fit to human
reading times compared to a standard Transformer baseline. A subsequent
analysis of attention heads suggests that ALiBi's mixture of slopes -- which
determine the rate of memory decay in each attention head -- may play a role in
the improvement by helping models with ALiBi to track different kinds of
linguistic dependencies.
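
For readers unfamiliar with ALiBi, the sketch below illustrates the linear recency bias it adds to pre-softmax attention scores, using the geometric sequence of per-head slopes described by Press et al. (2022). This is a minimal NumPy illustration of the standard ALiBi formulation, not code from this paper; the function names, the power-of-two head count assumption, and the array shapes are illustrative.

import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes from Press et al. (2022):
    # for n heads (assumed here to be a power of two), slope_h = 2**(-8*h/n)
    # for h = 1..n, i.e. 1/2, 1/4, ..., 1/256 when n = 8.
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(seq_len, n_heads):
    # Entry [h, i, j] of the returned array is -slope_h * (i - j) for j <= i,
    # so keys farther back in the context receive a larger (more negative)
    # penalty -- a linear form of memory decay. Future positions are masked.
    slopes = alibi_slopes(n_heads)                 # shape (n_heads,)
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]         # (seq_len, seq_len); i - j
    bias = -slopes[:, None, None] * distance       # (n_heads, seq_len, seq_len)
    return np.where(distance < 0, -np.inf, bias)   # causal mask over the future

# Usage (illustrative): given pre-softmax attention scores of shape
# (n_heads, seq_len, seq_len), add the bias before the softmax, during
# training as well as at inference time:
#   attn_weights = softmax(scores + alibi_bias(seq_len, n_heads), axis=-1)

The different slopes give each head a different decay rate: heads with large slopes attend almost exclusively to recent tokens, while heads with small slopes retain access to more distant context. This is the "mixture of slopes" that the abstract credits with helping ALiBi models track different kinds of linguistic dependencies.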