Chenrui Mao, Kai Shuang, Jinyu Guo, Bing Qian, Yu Yang, Haoqing Li
Information Processing & Management, Volume 63 (2), Article 104415. Published 2025-09-27. DOI: 10.1016/j.ipm.2025.104415
https://www.sciencedirect.com/science/article/pii/S0306457325003565
Cognition-aligned frequency filtering for sentence embeddings
Learning better sentence embeddings that capture precise semantics plays an important role in Natural Language Processing (NLP). The Semantic Textual Similarity (STS) of embeddings reflects their semantic precision, as this task requires a direct comparison of semantic meanings in vector space. Thus, we focus on improving the ability of sentence embeddings to capture semantic similarity. From the perspective of human cognition, we identify a critical cognitive gap in frequency-domain semantic representation: while semantic information is distributed across all frequency components of embeddings, the human selective attention mechanism suggests that only specific frequency bands are utilized for semantic processing. This frequency-domain cognitive gap leads to semantic redundancy in machine-learned embeddings, which is particularly detrimental for tasks requiring redundancy-resistant representations. To bridge this gap, we propose a simple Cognition-Aligned Frequency Filtering (CAFF) method for unsupervised embedding training on 10^6 sentences from Wikipedia. CAFF introduces a self-adaptive Frequency Filtering Unit (FFU) to modulate the frequency components of the embedding. The FFU functions as a filtering mechanism that suppresses irrelevant components in embeddings to mitigate semantic redundancy. Extensive evaluations with SentEval show that our embeddings improve over the initial encoder by 2.33% on the STS task, achieving state-of-the-art performance. Additionally, our results demonstrate improved performance on both transfer and retrieval tasks.
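The abstract does not specify how the FFU is built, so the following is only a minimal sketch of the general idea of frequency-domain filtering of an embedding, not the authors' method. It assumes a real FFT along the feature axis, a per-frequency sigmoid gate (fixed by hand here, where a trained model would learn it), and cosine similarity as the STS-style comparison. The names `frequency_filter`, `gate_logits`, and the 768-dimensional random vectors are hypothetical and chosen purely for illustration.

```python
import numpy as np

def frequency_filter(embedding, gate_logits):
    """Rescale each frequency component of an embedding by a gate in (0, 1).

    Assumption: one gate per rFFT bin; this is an illustrative design,
    not the FFU described in the paper.
    """
    spectrum = np.fft.rfft(embedding)            # to the frequency domain
    gate = 1.0 / (1.0 + np.exp(-gate_logits))    # sigmoid keeps gates in (0, 1)
    # Back to the original dimensionality after gating.
    return np.fft.irfft(spectrum * gate, n=embedding.shape[-1])

def cosine_similarity(a, b):
    """Cosine similarity, the usual way to compare two embeddings for STS."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for two sentence embeddings (random, not real data).
rng = np.random.default_rng(0)
dim = 768
emb_a = rng.standard_normal(dim)
emb_b = emb_a + 0.5 * rng.standard_normal(dim)

n_bins = dim // 2 + 1               # number of rFFT bins for a length-768 vector
gate_logits = np.zeros(n_bins)      # in training these would be learned
gate_logits[n_bins // 2:] = -10.0   # hypothetical choice: suppress the upper band

filt_a = frequency_filter(emb_a, gate_logits)
filt_b = frequency_filter(emb_b, gate_logits)

print("cosine before filtering:", cosine_similarity(emb_a, emb_b))
print("cosine after filtering: ", cosine_similarity(filt_a, filt_b))
```

Gating in the frequency domain and transforming back keeps the embedding dimensionality unchanged, so a filtered vector can be dropped into any downstream similarity computation. Which bands to suppress is exactly what a self-adaptive unit would have to learn; the hand-set gate above is only a placeholder.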
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.