An Industry Classification Model of Small and Medium-sized Enterprises based on TF-IDF Characteristics
Chen Jiahao, Zhang Jiayi
2019 International Conference on Arts, Management, Education and Innovation (ICAMEI 2019)
DOI: 10.23977/ICAMEI.2019.047 (https://doi.org/10.23977/ICAMEI.2019.047)
Citations: 2
Abstract
This paper uses data from the national SME Information Disclosure System and builds a learning framework with TensorFlow in Python to classify enterprises by industry according to their business scope. First, Jieba word segmentation in Python is used to remove extraneous words from each enterprise's business scope. Second, a naive Bayes text classification model is applied, with the chi-square statistic as the basis for feature selection: the multi-dimensional features of each type of business scope are selected and re-weighted. A vector space model (VSM) is then constructed for each business scope, which is classified according to probability. Next, all words are one-hot encoded, the tree-based model XGBoost is used for its capacity to process tabular data, and categories below the threshold are pruned. Finally, a convolutional neural network is used to encode the vocabulary, part-of-speech tags are added to the segmented words, word vectors are trained with Gensim, and cosine similarity is computed to obtain the final classification results.
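The abstract gives no implementation details, so the following is only a minimal sketch of three of the ideas it names: TF-IDF weighting (from the title), a vector space model, and cosine similarity. It rests on several assumptions. The business scopes are taken to be already segmented into tokens (the paper uses Jieba for this). The English tokens and category labels are hypothetical toy data. The smoothed IDF formula is one common variant and may differ from the authors' choice. Assigning a query to the single most similar labeled entry is an illustrative decision rule, not the paper's full pipeline.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): rarer terms receive larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (f / len(doc)) * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(tokens, labels, vectors, idf):
    """Return the label of the labeled vector most similar to the query tokens."""
    tf = Counter(tokens)
    length = len(tokens) or 1  # avoid division by zero for an empty query
    # Terms never seen in the labeled data carry no IDF weight and are skipped.
    query = {t: (f / length) * idf[t] for t, f in tf.items() if t in idf}
    scores = [cosine(query, vec) for vec in vectors]
    return labels[scores.index(max(scores))]


# Hypothetical, pre-segmented business scopes with hypothetical industry labels.
labels = ["software", "catering", "construction"]
docs = [
    ["software", "development", "computer", "technology", "consulting"],
    ["catering", "service", "food", "sales"],
    ["building", "materials", "sales", "construction", "engineering"],
]

vectors, idf = tf_idf_vectors(docs)
predicted = classify(["computer", "software", "sales"], labels, vectors, idf)
print(predicted)
```

In this toy data the query shares two distinctive terms with the first entry and only the common term "sales" with the other two, so the first entry scores highest. Because "sales" appears in two of the three documents, its IDF weight is lower, which is the effect TF-IDF weighting is meant to have.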