{"title":"Machine Reading Comprehension of High-Tech Industry Policies: A New Dataset and Chinese Pre-Trained Language Model","authors":"Changchang Zeng, Shaobo Li, B. Chen","doi":"10.1109/TOCS53301.2021.9688582","DOIUrl":null,"url":null,"PeriodicalId":360004,"journal":{"name":"2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TOCS53301.2021.9688582","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Machine Reading Comprehension of High-Tech Industry Policies: A New Dataset and Chinese Pre-Trained Language Model
Machine reading comprehension (MRC) is a challenging and active research area in artificial intelligence (AI), with applications such as intelligent question answering and intelligent document retrieval. In this article, we focus on machine reading comprehension of high-tech industry policy texts in China. First, we create a cloze-style machine reading comprehension dataset of Chinese high-tech industrial policies. Next, we propose a new pre-training objective, the multi-segment ordering discriminator, and use a domain-specific dictionary to improve the masked language modeling (MLM) pre-training process. Finally, we train a new pre-trained language model on our dataset for machine reading comprehension of Chinese industrial policies. Experimental results show that our pre-trained language model outperforms existing models such as BERT and RoBERTa on the new dataset.
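The abstract does not say how the domain dictionary or the segment-ordering objective is implemented, so the following Python sketch is only one plausible reading, not the authors' method. It assumes character-level tokens (as is common for Chinese BERT-style models), masks whole dictionary terms instead of isolated characters, and builds binary "is the segment order original?" training examples. All function names, the sample sentence, and the two-term dictionary are hypothetical.

```python
import random


def dictionary_spans(text, dictionary):
    """Greedy longest-match search: (start, end) spans of dictionary terms."""
    spans = []
    max_len = max((len(term) for term in dictionary), default=0)
    i = 0
    while i < len(text):
        match_end = None
        # Try the longest candidate first so longer terms win over shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                match_end = i + length
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


def mask_terms(text, dictionary, mask_prob=0.15, rng=None, mask_token="[MASK]"):
    """Mask each dictionary term as a whole with probability mask_prob.

    Returns (tokens, labels): labels hold the original character at masked
    positions and None elsewhere. Characters outside dictionary terms are
    never masked in this simplified sketch.
    """
    rng = rng or random.Random()
    tokens = list(text)  # assumption: one token per character
    labels = [None] * len(tokens)
    for start, end in dictionary_spans(text, dictionary):
        if rng.random() < mask_prob:
            for pos in range(start, end):
                labels[pos] = tokens[pos]
                tokens[pos] = mask_token
    return tokens, labels


def make_ordering_example(segments, shuffle_prob=0.5, rng=None):
    """Return (segments, label): label 1 = original order, 0 = shuffled."""
    rng = rng or random.Random()
    original = list(segments)
    out = list(segments)
    # Shuffling only makes sense if at least two segments differ.
    if len(set(original)) > 1 and rng.random() < shuffle_prob:
        while out == original:
            rng.shuffle(out)
        return out, 0
    return out, 1


if __name__ == "__main__":
    # Made-up sentence: "Support the development of the integrated circuit industry".
    sentence = "支持集成电路产业发展"
    domain_dictionary = {"集成电路", "产业"}  # hypothetical domain terms
    rng = random.Random(0)
    print(mask_terms(sentence, domain_dictionary, mask_prob=1.0, rng=rng))
    print(make_ordering_example(["第一段", "第二段", "第三段"], shuffle_prob=1.0, rng=rng))
```

Masking a whole term forces the model to predict a complete domain concept from context, which is the usual motivation for dictionary-guided or whole-word masking; whether the paper combines it with random character masking is not stated in the abstract.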