{"title":"Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning","authors":"Marcos Garcia","doi":"10.1162/coli_r_00410","DOIUrl":null,"url":null,"abstract":"Word vector representations have a long tradition in several research fields, such as cognitive science or computational linguistics. They have been used to represent the meaning of various units of natural languages, including, among others, words, phrases, and sentences. Before the deep learning tsunami, count-based vector space models had been successfully used in computational linguistics to represent the semantics of natural languages. However, the rise of neural networks in NLP popularized the use of word embeddings, which are now applied as pre-trained vectors in most machine learning architectures. This book, written by Mohammad Taher Pilehvar and Jose Camacho-Collados, provides a comprehensive and easy-to-read review of the theory and advances in vector models for NLP, focusing specially on semantic representations and their applications. It is a great introduction to different types of embeddings and the background and motivations behind them. In this sense, the authors adequately present the most relevant concepts and approaches that have been used to build vector representations. They also keep track of the most recent advances of this vibrant and fast-evolving area of research, discussing cross-lingual representations and current language models based on the Transformer. Therefore, this is a useful book for researchers interested in computational methods for semantic representations and artificial intelligence. Although some basic knowledge of machine learning may be necessary to follow a few topics, the book includes clear illustrations and explanations, which make it accessible to a wide range of readers. Apart from the preface and the conclusions, the book is organized into eight chapters. In the first two, the authors introduce some of the core ideas of NLP and artificial neural networks, respectively, discussing several concepts that will be useful throughout the book. Then, Chapters 3 to 6 present different types of vector representations at the lexical level (word embeddings, graph embeddings, sense embeddings, and contextualized embeddings), followed by a brief chapter (7) about sentence and document embeddings. For each specific topic, the book includes methods and data sets to assess the quality of the embeddings. Finally, Chapter 8 raises ethical issues involved","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":" ","pages":"1-3"},"PeriodicalIF":3.7000,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Linguistics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/coli_r_00410","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Cited by: 37
Abstract
Word vector representations have a long tradition in several research fields, such as cognitive science and computational linguistics. They have been used to represent the meaning of various units of natural languages, including, among others, words, phrases, and sentences. Before the deep learning tsunami, count-based vector space models had been successfully used in computational linguistics to represent the semantics of natural languages. However, the rise of neural networks in NLP popularized the use of word embeddings, which are now applied as pre-trained vectors in most machine learning architectures. This book, written by Mohammad Taher Pilehvar and Jose Camacho-Collados, provides a comprehensive and easy-to-read review of the theory and advances in vector models for NLP, focusing especially on semantic representations and their applications. It is a great introduction to different types of embeddings and the background and motivations behind them. In this sense, the authors adequately present the most relevant concepts and approaches that have been used to build vector representations. They also keep track of the most recent advances in this vibrant and fast-evolving area of research, discussing cross-lingual representations and current language models based on the Transformer. Therefore, this is a useful book for researchers interested in computational methods for semantic representations and artificial intelligence. Although some basic knowledge of machine learning may be necessary to follow a few topics, the book includes clear illustrations and explanations, which make it accessible to a wide range of readers. Apart from the preface and the conclusions, the book is organized into eight chapters. In the first two, the authors introduce some of the core ideas of NLP and artificial neural networks, respectively, discussing several concepts that will be useful throughout the book. Then, Chapters 3 to 6 present different types of vector representations at the lexical level (word embeddings, graph embeddings, sense embeddings, and contextualized embeddings), followed by a brief chapter (7) about sentence and document embeddings. For each specific topic, the book includes methods and data sets to assess the quality of the embeddings. Finally, Chapter 8 raises the ethical issues involved in this area of research.
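The count-based vector space models mentioned above can be illustrated with a toy example. The following sketch is not drawn from the book; it simply builds word vectors from co-occurrence counts over a made-up three-sentence corpus (with an assumed window of one token on each side) and compares them with cosine similarity, the standard closeness measure in such spaces.

```python
import numpy as np

# Toy corpus: count-based vectors are derived from co-occurrence statistics.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Build a vocabulary and a symmetric word-word co-occurrence matrix
# using a window of +/-1 token (an illustrative choice, not from the book).
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Words that appear in similar contexts receive similar vectors:
# "cat" and "dog" share contexts ("the ... sat"), so their similarity
# is higher than that of "cat" and "on".
print(cosine(counts[index["cat"]], counts[index["dog"]]))
print(cosine(counts[index["cat"]], counts[index["on"]]))
```

The word embeddings that the book surveys replace these sparse count vectors with dense, low-dimensional vectors learned by neural networks, but the underlying intuition, that words occurring in similar contexts should have similar vectors, is the same.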
About the Journal
Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.