Fusing domain-specific data with general data for in-domain applications

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics. Pub Date: 2017-08-23. DOI: 10.1145/3106426.3106473

An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen

Citations: 1

Abstract

This paper analyzes the lexical semantics of domain-specific terms using various pre-trained domain-specific and general-domain word vectors, and addresses the semantic drift between domains. To capture lexical semantics in a specific domain, we propose a bridge mechanism that introduces domain-specific data into general data and then re-trains the word vectors. We find that even a small-scale fusion can yield lexical semantics similar to those learned from a large-scale domain-specific dataset. Experiments on sentiment analysis and outlier detection show that word embeddings trained on the fused dataset perform better than embeddings trained on either a large purely domain-specific dataset or a large purely general dataset. This simple but effective methodology facilitates the domain adaptation of distributed word representations.
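The abstract does not spell out the bridge mechanism, so the following is only a minimal sketch of the general idea, under stated assumptions: "fusion" is taken to mean adding a sample of domain sentences to the general corpus before re-training, and count-based co-occurrence vectors stand in for trained word embeddings. The function names (`fuse_corpora`, `cooccurrence_vectors`, `cosine`), the `domain_fraction` parameter, and the toy sentences are all hypothetical and do not come from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def fuse_corpora(general, domain, domain_fraction=0.1, seed=0):
    """Return the general sentences plus a random sample of domain sentences.

    Assumption: fusion is plain concatenation of a domain sample with the
    general corpus; the paper's actual bridge mechanism may differ.
    """
    rng = random.Random(seed)
    domain = list(domain)
    k = min(len(domain), max(1, round(len(domain) * domain_fraction)))
    return list(general) + rng.sample(domain, k)


def cooccurrence_vectors(sentences, window=2):
    """Count-based stand-in for word embeddings: context counts per word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpora (hypothetical): "virus" in a computing sense vs. a medical sense.
general = [
    ["the", "virus", "infected", "the", "computer"],
    ["the", "hacker", "wrote", "a", "virus"],
]
domain = [
    ["the", "virus", "infected", "the", "patient"],
    ["the", "patient", "has", "a", "fever"],
]

fused = fuse_corpora(general, domain, domain_fraction=1.0)
general_vecs = cooccurrence_vectors(general)
fused_vecs = cooccurrence_vectors(fused)

# "patient" never occurs in the general corpus, so only the fused vectors
# can relate it to "virus".
print("general:", cosine(general_vecs["virus"], general_vecs["patient"]))
print("fused:  ", cosine(fused_vecs["virus"], fused_vecs["patient"]))
```

In practice the count vectors would be replaced by an embedding model re-trained on the fused corpus, and `domain_fraction` would be varied to see how small a domain sample still recovers the in-domain sense of a term.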



