{"title":"A Brief Overview of Universal Sentence Representation Methods: A Linguistic View","authors":"Ruiqi Li, Xiang Zhao, M. Moens","doi":"10.1145/3482853","DOIUrl":null,"url":null,"abstract":"How to transfer the semantic information in a sentence to a computable numerical embedding form is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly promote subsequent natural language processing tasks. However, unlike universal word embeddings, a widely accepted general-purpose sentence embedding technique has not been developed. This survey summarizes the current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and ultimately analyzes their reported performance. Sentence embeddings trained from words in a bottom-up manner are observed to have different, nearly opposite, performance patterns in downstream tasks compared to those trained from logical relationships between sentences. By comparing differences of training schemes in and between groups, we analyze possible essential reasons for different performance patterns. We additionally collect incentive strategies handling sentences from other models and propose potentially inspiring future research directions.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"378 1","pages":"1 - 42"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys (CSUR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3482853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
How to transfer the semantic information in a sentence to a computable numerical embedding form is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly promote subsequent natural language processing tasks. However, unlike universal word embeddings, a widely accepted general-purpose sentence embedding technique has not been developed. This survey summarizes the current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and ultimately analyzes their reported performance. Sentence embeddings trained from words in a bottom-up manner are observed to have different, nearly opposite, performance patterns in downstream tasks compared to those trained from logical relationships between sentences. By comparing differences of training schemes in and between groups, we analyze possible essential reasons for different performance patterns. We additionally collect incentive strategies handling sentences from other models and propose potentially inspiring future research directions.