Pivot-based Unsupervised Domain Adaptation for Pre-trained Language Model

Zhang Pengyu, Zhang Wenkang, Xing Zhiqiang

2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), published 2022-12-11

DOI: 10.1109/TOCS56154.2022.10016201
Citations: 0
Abstract
In text classification, natural language processing provides an effective way to automatically categorize text content, but labeled data is difficult to obtain in specific domains. To reduce manual labeling, researchers have proposed unsupervised domain adaptation, a form of transfer learning that transfers a source-domain model capturing general knowledge to a target domain with little labeled data, improving the model's generalization in the target domain. However, most current unsupervised domain adaptation methods fine-tune the pre-trained model directly on unlabeled data from the target domain, which typically requires a large amount of unlabeled data to be effective. This paper therefore presents a pivot-based unsupervised domain adaptation method that extracts and masks pivots in the unlabeled data, fine-tunes the pre-trained language model on the masked text, and finally validates the method with supervised training, comparing it against directly fine-tuning the original model on unlabeled data. The pivot-based domain adaptation method effectively improves the efficiency of domain knowledge transfer to the specific domain.
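The extract-and-mask step described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simplified pivot-selection rule (terms frequent in both the source and target corpora), whereas the actual method may also weigh correlation with source-domain labels. The function names `extract_pivots` and `mask_pivots` are hypothetical.

```python
from collections import Counter

def extract_pivots(source_texts, target_texts, min_count=2, top_k=10):
    """Select candidate pivot terms: words frequent in BOTH domains.

    Simplified frequency-based criterion; pivot selection in the
    literature often also requires correlation with source labels.
    """
    src = Counter(w for t in source_texts for w in t.lower().split())
    tgt = Counter(w for t in target_texts for w in t.lower().split())
    shared = {w: min(src[w], tgt[w])
              for w in src
              if w in tgt and src[w] >= min_count and tgt[w] >= min_count}
    # Rank shared terms by how often they appear in the rarer domain.
    return [w for w, _ in sorted(shared.items(), key=lambda x: -x[1])[:top_k]]

def mask_pivots(text, pivots, mask_token="[MASK]"):
    """Replace pivot occurrences with a mask token, producing
    MLM-style examples focused on domain-relevant terms."""
    return " ".join(mask_token if w.lower() in pivots else w
                    for w in text.split())

# Toy source (product reviews) and target (movie reviews) corpora.
src = ["great product works well", "bad product broke fast"]
tgt = ["great movie well acted", "bad movie slow plot"]
pivots = set(extract_pivots(src, tgt, min_count=1, top_k=5))
masked = mask_pivots("great product works well", pivots)
```

The masked sentences would then serve as masked-language-model fine-tuning data for the pre-trained model, steering it to predict exactly the terms shared across domains rather than arbitrary tokens.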