{"title":"Fusing domain-specific data with general data for in-domain applications","authors":"An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen","doi":"10.1145/3106426.3106473","DOIUrl":null,"url":null,"abstract":"This paper analyzes the lexical semantics of domain-specific terms based on various pre-trained specific domain and general domain word vectors, and addresses the semantic drift between domains. To capture lexical semantics in the specific domain, we propose a bridge mechanism to introduce domain-specific data into general data, and re-train word vectors. We find that even a small-scale fusion can result in the similar lexical semantics learned by using the large-scale domain-specific dataset. Experiments on sentiment analysis and outlier detection show that application of word embedding by the fusion dataset has the better performance than applications of word embeddings by pure large domain-specific and pure large general datasets. The simple, but effective methodology facilitates the domain adaptation of distributed word representations.","PeriodicalId":20685,"journal":{"name":"Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106426.3106473","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
This paper analyzes the lexical semantics of domain-specific terms based on various pre-trained specific domain and general domain word vectors, and addresses the semantic drift between domains. To capture lexical semantics in the specific domain, we propose a bridge mechanism to introduce domain-specific data into general data, and re-train word vectors. We find that even a small-scale fusion can result in the similar lexical semantics learned by using the large-scale domain-specific dataset. Experiments on sentiment analysis and outlier detection show that application of word embedding by the fusion dataset has the better performance than applications of word embeddings by pure large domain-specific and pure large general datasets. The simple, but effective methodology facilitates the domain adaptation of distributed word representations.