{"title":"自然语言处理中质量控制词嵌入的计算效率学习","authors":"M. Alawad, G. Tourassi","doi":"10.1109/ISVLSI.2019.00033","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"9 1","pages":"134-139"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing\",\"authors\":\"M. Alawad, G. Tourassi\",\"doi\":\"10.1109/ISVLSI.2019.00033\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.\",\"PeriodicalId\":6703,\"journal\":{\"name\":\"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"volume\":\"9 1\",\"pages\":\"134-139\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISVLSI.2019.00033\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing
Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.