Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning

Synthesis Lectures on Human Language Technologies. Pub Date: 2020-11-12. DOI: 10.2200/s01057ed1v01y202009hlt047

José Camacho-Collados, Mohammad Taher Pilehvar


Citations: 20

Abstract



Word vector representations have a long tradition in several research fields, such as cognitive science and computational linguistics. They have been used to represent the meaning of various units of natural language, including words, phrases, and sentences. Before the deep learning tsunami, count-based vector space models had been used successfully in computational linguistics to represent the semantics of natural languages. However, the rise of neural networks in NLP popularized the use of word embeddings, which are now applied as pre-trained vectors in most machine learning architectures.

This book, written by Mohammad Taher Pilehvar and José Camacho-Collados, provides a comprehensive and easy-to-read review of the theory and advances in vector models for NLP, focusing especially on semantic representations and their applications. It is a great introduction to the different types of embeddings and to the background and motivations behind them. The authors adequately present the most relevant concepts and approaches that have been used to build vector representations. They also keep track of the most recent advances in this vibrant and fast-evolving area of research, discussing cross-lingual representations and current language models based on the Transformer. This makes it a useful book for researchers interested in computational methods for semantic representations and artificial intelligence. Although some basic knowledge of machine learning may be necessary to follow a few topics, the book includes clear illustrations and explanations that make it accessible to a wide range of readers.

Apart from the preface and the conclusions, the book is organized into eight chapters. In the first two, the authors introduce some of the core ideas of NLP and artificial neural networks, respectively, discussing several concepts that will be useful throughout the book. Chapters 3 to 6 then present different types of vector representations at the lexical level (word embeddings, graph embeddings, sense embeddings, and contextualized embeddings), followed by a brief chapter (7) on sentence and document embeddings. For each specific topic, the book includes methods and data sets to assess the quality of the embeddings. Finally, Chapter 8 raises ethical issues involved

