Classification of Chinese Text in the Power Field Using FastText with Electric Power Keywords
Authors: Changan Li, Feihu Hu, Yanpeng Wang
Published in: 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC)
Publication date: 2022-04-14
DOI: https://doi.org/10.1109/ipec54454.2022.9777517
Citations: 0
Abstract
Text classification is a long-standing research topic in natural language processing. FastText is a structurally simple model that is fast and efficient. To analyze the state of electric power intellectual property in recent years, this paper uses the FastText model to process Chinese core electric power journals and patents from the last five years. The text information in the company's intellectual property is divided into a series of phrases, each representing an individual meaning, through word segmentation based on the Jieba word segmentation tool. Stop words unrelated to the power industry are removed from the phrases, and the frequency of the power-industry-related words corresponding to the phrases, together with their frequency of occurrence in the text, is counted. According to how power industry words aggregate, the phrases can be divided into 26 categories. A text classification model based on power industry keywords is designed, and the FastText machine learning algorithm is used to complete the text classification. For each subfield of electric power, the publication of intellectual property is analyzed and counted, and the results for power system intellectual property are visualized.
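The preprocessing steps described in the abstract (stop-word removal, keyword frequency counting, and preparing input for FastText) might look roughly like the following. This is a minimal sketch, not the authors' implementation: the stop-word list, the keyword list, the label name, and all function names are hypothetical, and the tokens are assumed to have already been produced by a segmenter such as Jieba. The only convention taken from FastText itself is the default `__label__` prefix used in supervised training files.

```python
from collections import Counter

# Hypothetical stop words and power-industry keywords, for illustration only.
STOP_WORDS = {"的", "和", "了", "在"}
POWER_KEYWORDS = {"变压器", "电网", "继电保护"}


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop stop words and whitespace-only tokens from a segmented text."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def keyword_counts(tokens, keywords=POWER_KEYWORDS):
    """Count how often each power-industry keyword occurs in the tokens."""
    return Counter(t for t in tokens if t in keywords)


def to_fasttext_line(label, tokens):
    """Format one sample as '__label__<label> tok1 tok2 ...'."""
    return f"__label__{label} " + " ".join(tokens)


# Tokens as a segmenter such as jieba.lcut(text) might return them (assumed).
tokens = ["电网", "的", "继电保护", "和", "变压器", "的", "电网"]
cleaned = clean_tokens(tokens)
counts = keyword_counts(cleaned)
# "relay_protection" is a made-up category name, not one of the paper's 26.
line = to_fasttext_line("relay_protection", cleaned)
```

Lines in this format, written one per document to a text file, could then be passed to a supervised FastText trainer, for example `fasttext.train_supervised(input=path)` in the `fasttext` Python package. The abstract does not state the training settings, so none are assumed here.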