Classification of Chinese Text in the Power Field Using FastText with Electric Power Keywords

Changan Li, Feihu Hu, Yanpeng Wang
2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), published 2022-04-14
DOI: 10.1109/ipec54454.2022.9777517 (https://doi.org/10.1109/ipec54454.2022.9777517)

Abstract

Text classification is a long-standing research topic in natural language processing. FastText is a model that is simple in structure, fast, and efficient. To analyze the state of electric power intellectual property in recent years, this paper uses the FastText model to process Chinese core electric power journals and patents from the last five years. The text of the company's intellectual property is split into a series of phrases, each carrying an individual meaning, through word segmentation with the Jieba word segmentation tool. Stop words unrelated to the power industry are removed from the phrases, and both the frequency of power-industry-related words among the phrases and their frequency of occurrence in the text are counted. According to how power industry words cluster, the phrases are divided into 26 categories. A text classification model based on power industry keywords is designed, and the machine learning algorithm FastText is used to complete the text classification. For each subfield of electric power, the publication of intellectual property is analyzed and counted, and the results for power system intellectual property are visualized.
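The preprocessing steps in the abstract (stop-word removal, keyword frequency counting, and preparing input for FastText) can be illustrated with a minimal sketch. It is not the authors' implementation. The stop-word list, the keyword-to-category map, and the rule of labelling a text by its most frequent keyword category are all assumptions made for illustration, since the abstract does not give the actual lists or the 26 categories. Segmentation is assumed to have been done already, for example with `jieba.lcut`. The only FastText-specific detail used is the `__label__` prefix of its supervised training format.

```python
from collections import Counter

# Hypothetical stop words and keyword-to-category map (assumptions for
# illustration; the paper's actual lists and 26 categories are not given).
STOP_WORDS = {"的", "了", "和", "在"}
KEYWORD_CATEGORY = {
    "变压器": "transformer",
    "继电保护": "relay_protection",
    "风电": "wind_power",
}


def remove_stop_words(tokens):
    """Drop stop words and whitespace-only tokens from a segmented text."""
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]


def keyword_frequencies(tokens):
    """Count how often each known power-industry keyword occurs."""
    return Counter(t for t in tokens if t in KEYWORD_CATEGORY)


def label_by_keywords(tokens):
    """Return the category whose keywords occur most often, or None.

    This labelling rule is an assumption, not the paper's stated method.
    """
    per_category = Counter()
    for word, count in keyword_frequencies(tokens).items():
        per_category[KEYWORD_CATEGORY[word]] += count
    if not per_category:
        return None
    return per_category.most_common(1)[0][0]


def to_fasttext_line(tokens, label):
    """Format one example as '__label__<name> tok tok ...' for FastText."""
    return f"__label__{label} " + " ".join(tokens)


# Tokens as a segmenter such as Jieba might return them (assumed input).
tokens = ["变压器", "的", "故障", "诊断", "和", "变压器", "油", "分析"]
cleaned = remove_stop_words(tokens)
label = label_by_keywords(cleaned)
line = to_fasttext_line(cleaned, label)
```

Lines produced this way, written one per document to a text file, are the kind of input that a supervised FastText model is trained on.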