The Language of Engineering: Training a Domain-Specific Word Embedding Model for Engineering

Daniel Braun, Oleksandra Klymenko, Tim Schopf, Yusuf Kaan Akan, F. Matthes
{"title":"工程语言:面向工程的特定领域词嵌入模型的训练","authors":"Daniel Braun, Oleksandra Klymenko, Tim Schopf, Yusuf KAAN AKAN, F. Matthes","doi":"10.1145/3460824.3460826","DOIUrl":null,"url":null,"abstract":"Since the introduction of Word2Vec in 2013, so-called word embeddings, dense vector representation of words that are supposed to capture their semantic meaning, have become a universally applied technique in a wide range of Natural Language Processing (NLP) tasks and domains. The vector representations they provide are learned on huge corpora of unlabeled text data. Due to the large amount of data and computing power that is necessary to train such embedding models, very often, pre-trained models are applied which have been trained on domain unspecific data like newspaper articles or Wikipedia entries. In this paper, we present a domain-specific embedding model that is trained exclusively on texts from the domain of engineering. We will show that such a domain-specific embeddings model performs better in different NLP tasks and can therefore help to improve NLP-based AI in the domain of Engineering.","PeriodicalId":315518,"journal":{"name":"Proceedings of the 2021 3rd International Conference on Management Science and Industrial Engineering","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"The Language of Engineering: Training a Domain-Specific Word Embedding Model for Engineering\",\"authors\":\"Daniel Braun, Oleksandra Klymenko, Tim Schopf, Yusuf KAAN AKAN, F. Matthes\",\"doi\":\"10.1145/3460824.3460826\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Since the introduction of Word2Vec in 2013, so-called word embeddings, dense vector representation of words that are supposed to capture their semantic meaning, have become a universally applied technique in a wide range of Natural Language Processing (NLP) tasks and domains. The vector representations they provide are learned on huge corpora of unlabeled text data. Due to the large amount of data and computing power that is necessary to train such embedding models, very often, pre-trained models are applied which have been trained on domain unspecific data like newspaper articles or Wikipedia entries. In this paper, we present a domain-specific embedding model that is trained exclusively on texts from the domain of engineering. 
We will show that such a domain-specific embeddings model performs better in different NLP tasks and can therefore help to improve NLP-based AI in the domain of Engineering.\",\"PeriodicalId\":315518,\"journal\":{\"name\":\"Proceedings of the 2021 3rd International Conference on Management Science and Industrial Engineering\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 3rd International Conference on Management Science and Industrial Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3460824.3460826\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 3rd International Conference on Management Science and Industrial Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3460824.3460826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Since the introduction of Word2Vec in 2013, so-called word embeddings, dense vector representations of words that are intended to capture their semantic meaning, have become a universally applied technique in a wide range of Natural Language Processing (NLP) tasks and domains. The vector representations they provide are learned from huge corpora of unlabeled text data. Because of the large amounts of data and computing power needed to train such embedding models, pre-trained models are very often applied that have been trained on domain-unspecific data such as newspaper articles or Wikipedia entries. In this paper, we present a domain-specific embedding model that is trained exclusively on texts from the domain of engineering. We show that such a domain-specific embedding model performs better in different NLP tasks and can therefore help to improve NLP-based AI in the domain of engineering.
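For illustration, the following is a minimal sketch of how such a domain-specific Word2Vec model could be trained, here using the gensim library. The corpus file engineering_corpus.txt, the preprocessing, the hyperparameters, and the example query term are all illustrative assumptions, not the setup reported in the paper.

```python
# Minimal sketch: training a domain-specific Word2Vec model with gensim.
# The corpus path, preprocessing, and hyperparameters below are illustrative
# assumptions; the paper's actual training setup may differ.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Hypothetical corpus: one engineering text per line in a plain-text file.
with open("engineering_corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

model = Word2Vec(
    sentences=sentences,
    vector_size=300,  # dimensionality of the word vectors
    window=5,         # context window size
    min_count=5,      # ignore words rarer than this
    sg=1,             # skip-gram (1) rather than CBOW (0)
    workers=4,        # parallel training threads
)

model.save("engineering_word2vec.model")

# Query the trained embeddings, e.g. nearest neighbours of a domain term
# (assumes the term occurs often enough in the corpus to be in the vocabulary).
print(model.wv.most_similar("torque", topn=5))
```

A domain-specific vocabulary and nearest-neighbour structure learned this way is what distinguishes such a model from embeddings pre-trained on newspaper or Wikipedia text.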