Few-shot Learning Named Entity Recognition of Pressure Sensor Patent Text Based on MLM

Yue Deng, Honghui Li, Xueliang Fu
2021 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS). Published 2021-12-10. DOI: 10.1109/TOCS53301.2021.9688929 (https://doi.org/10.1109/TOCS53301.2021.9688929)

Abstract

Patent abstracts, an important support for intellectual property protection, are an ideal data source for technology mining. Named entity recognition (NER) on patent text can reduce the workload of patent analysis, improve efficiency, and provide an effective technical means for patent discovery, patent promotion, patent infringement, and related tasks. However, the technical terms in patent texts are difficult to mine, extract, and label. This paper therefore proposes a few-shot learning NER method to address the lack of sufficient annotated data for NER on pressure sensor patent text. The method uses the MLM (Masked Language Model) pretraining approach of the BERT model: it masks a small fraction of tokens each time and trains repeatedly on the same sample, obtaining embeddings that fuse bidirectional information from a massive continuous corpus. A CRF layer then decodes these representations to produce the predicted tag sequence. In experiments on 55 patent abstracts and 34 patent abstracts in the field of pressure sensor preparation, the results show that the proposed method improves recognition accuracy by about 10% over traditional machine learning models (HMM, CRF) in the small-sample setting. Compared with deep learning models (BiLSTM and BiLSTM+CRF), accuracy improves by about 30%, reaching 93%.
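The two mechanisms named in the abstract, MLM token masking and CRF decoding, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract only says that "a small part" of the tokens is masked; the 15% selection rate and the 80/10/10 replacement split below are the defaults from the original BERT pretraining recipe and are assumed here. The function names `mask_tokens` and `viterbi_decode`, the toy vocabulary, and the score matrices are all hypothetical. In a real system the emission scores would come from the BERT encoder and the transition scores would be learned by the CRF layer.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking (assumed rates, not taken from the paper).

    Returns (corrupted, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][k]  : score of tag k at position t
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            # best previous tag for landing on tag j at this position
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # follow back-pointers from the best final tag
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Because a fresh random subset of positions is chosen on every call, running `mask_tokens` repeatedly on the same abstract yields different corrupted inputs, which is one way to read the abstract's "repeatedly trains on the same sample". The decoder scores whole tag sequences rather than individual tokens, so a strongly negative transition score can override a locally attractive emission score.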