A Clustering Algorithm of Four Character Medicine Effect Phrases in TCM Patents
Authors: Na Deng, Song Lin, Caiquan Xiong, Desheng Li
Published in: 2018 8th International Conference on Electronics Information and Emergency Communication (ICEIEC), 2018-06-01
DOI: 10.1109/ICEIEC.2018.8473529 (https://doi.org/10.1109/ICEIEC.2018.8473529)
Citations: 2
Abstract
In the era of big data, data analysis and data mining are important decision-support tools. Patent retrieval is a critical step in this process: its accuracy and comprehensiveness directly affect the results of patent analysis and mining. Almost all mainstream patent retrieval systems currently rely on matching retrieval words, so they miss many similar patents. To improve the recall of Chinese patent retrieval and enable semantic retrieval, this paper exploits the word-formation and part-of-speech combination characteristics of four-character medicine effect phrases in traditional Chinese medicine (TCM) patents. It proposes a method for calculating the similarity between such phrases and presents a K-centroid clustering algorithm for them. Experimental results show that the method is effective.
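
The abstract does not give the similarity formula or the details of the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a stand-in similarity (the fraction of positions at which two four-character phrases share the same character) and reads "K-centroid" as a k-medoids style procedure in which each cluster center is an actual phrase. The names `phrase_similarity`, `k_medoids`, and `EXAMPLE_PHRASES` are hypothetical, and the example phrases are illustrative rather than taken from the paper's data.

```python
import random

# Illustrative four-character medicine effect phrases (assumed, not the paper's data).
EXAMPLE_PHRASES = ["清热解毒", "清热利湿", "清热泻火", "活血化瘀", "活血止痛", "活血通络"]


def phrase_similarity(a, b):
    """Stand-in similarity: share of the four positions with identical characters.

    The paper uses word-formation and part-of-speech features instead; this
    simple overlap is an assumption made only to keep the sketch runnable.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four-character phrases")
    return sum(x == y for x, y in zip(a, b)) / 4


def _assign(phrases, medoids, sim):
    """Map each medoid index to the indices of the phrases closest to it."""
    clusters = {m: [] for m in medoids}
    for i, phrase in enumerate(phrases):
        nearest = max(medoids, key=lambda m: sim(phrase, phrases[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(phrases, k, sim, max_iter=100, seed=0):
    """Cluster phrases around k representative phrases (medoids)."""
    phrases = list(dict.fromkeys(phrases))  # drop duplicates, keep order
    if not 1 <= k <= len(phrases):
        raise ValueError("k must be between 1 and the number of distinct phrases")
    rng = random.Random(seed)
    medoids = rng.sample(range(len(phrases)), k)

    for _ in range(max_iter):
        clusters = _assign(phrases, medoids, sim)
        new_medoids = []
        for medoid, members in clusters.items():
            if not members:  # defensive: keep the old center for an empty cluster
                new_medoids.append(medoid)
                continue
            # The new center is the member most similar, in total, to its cluster.
            best = max(
                members,
                key=lambda c: sum(sim(phrases[c], phrases[j]) for j in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids

    final = _assign(phrases, medoids, sim)
    return {phrases[m]: [phrases[i] for i in members] for m, members in final.items()}
```

Calling `k_medoids(EXAMPLE_PHRASES, k=2, sim=phrase_similarity)` returns a dictionary that maps each representative phrase to the phrases grouped with it. Any other similarity function that takes two phrases and returns a number can be passed as `sim`, which is where a measure based on word formation and part of speech would plug in.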