“If you can’t beat them, join them”: A Word Transformation based Generalized Skip-gram for Embedding Compound Words

Debasis Ganguly, Shripad Bhat, Chandan Biswas

Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022-12-09. DOI: 10.1145/3574318.3574346
While a class of data-driven approaches has been shown to be effective for embedding words of languages that are relatively simple in terms of inflection and compounding (e.g. English), an open area of investigation is how to integrate language-specific characteristics into the framework of an embedding model. Standard word embedding approaches, such as word2vec, GloVe, etc., embed each word into a high-dimensional dense vector. However, these approaches may not adequately capture the inherent linguistic phenomenon of word compounding. We propose a stochastic word-transformation-based generalization of the skip-gram algorithm, which seeks to improve the representation of compositional compound words by leveraging information from the contexts of their constituents. Our experiments show that addressing the compounding effect of a language as part of the word embedding objective outperforms existing compounding-specific post-transformation approaches on word semantics prediction and word polarity prediction tasks.
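The abstract does not spell out the training procedure, but the core idea it describes — letting constituents of a compound learn from the compound's contexts as part of the skip-gram objective — can be sketched as a modified pair-generation step. The sketch below is an illustration under assumptions, not the authors' actual method: the `split` function, the `p_split` probability, and the toy vocabulary are all hypothetical.

```python
import random

def skipgram_pairs(tokens, window=2, split=None, p_split=0.5, rng=None):
    """Generate (target, context) skip-gram training pairs.

    Hedged illustration of the paper's idea: with probability p_split,
    a compound target is stochastically transformed into one of its
    constituents, so constituents also receive gradient signal from the
    compound's contexts. `split` maps a compound to a list of its
    constituents, or returns None for non-compounds (hypothetical helper).
    """
    rng = rng or random.Random(0)
    pairs = []
    for i, target in enumerate(tokens):
        parts = split(target) if split else None
        # Standard skip-gram context window around position i.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            pairs.append((target, tokens[j]))
            # Stochastic word transformation: also emit a pair for a
            # randomly chosen constituent of the compound.
            if parts and rng.random() < p_split:
                pairs.append((rng.choice(parts), tokens[j]))
    return pairs

# Toy example with a hypothetical splitter; p_split=1.0 so every
# compound occurrence also trains its constituents.
split = {"football": ["foot", "ball"]}.get
pairs = skipgram_pairs(["the", "football", "match"], window=1,
                       split=split, p_split=1.0)
```

In a full implementation these pairs would feed the usual skip-gram negative-sampling loss; the only change to the objective is the stochastic substitution of a compound by a constituent during pair generation.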