Intelligent Analysis of Patent Data in the Biomedical Field Based on Spark Parallel Clustering Algorithm
Bailing Xu
2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT), published 2022-12-09
DOI: 10.1109/ACAIT56212.2022.10137981 (https://doi.org/10.1109/ACAIT56212.2022.10137981)
Citations: 0
Abstract
To address the poor performance of traditional patent data analysis in the biomedical field, this paper proposes a parallel strategy that combines the Spark framework with the K-means clustering algorithm. First, Spark is used for initial processing of the big data. Then, the K-means clustering algorithm clusters and analyzes the patent data to obtain the optimal solution, enabling intelligent analysis of patent data. Experimental results show that, on the same test sample data and sample classification results, the proposed parallel clustering algorithm classifies patent data by quantity and category better than a single K-means clustering algorithm, indicating that the parallel clustering algorithm gives the better analysis. The parallel strategy also greatly improves the accuracy and speed of patent data analysis, and thereby the ability to cluster and analyze massive data.
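The abstract does not give the paper's implementation, so the following is only a minimal sketch of the general idea behind partition-parallel K-means. It assumes the data is already split into partitions and uses Python's standard-library thread pool as a stand-in for Spark executors. The names `partial_sums` and `parallel_kmeans` are hypothetical, and the example shows the map/reduce structure only, not real cluster-level speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def partial_sums(partition, centroids):
    """Map step: assign each point in one partition to its nearest centroid
    and return per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def parallel_kmeans(partitions, centroids, iterations=10, workers=4):
    """Lloyd-style K-means in which the assignment step runs per partition.

    Assumption: a thread pool stands in for a Spark cluster here.
    """
    centroids = [list(c) for c in centroids]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            results = list(
                pool.map(lambda p: partial_sums(p, centroids), partitions)
            )
            # Reduce step: merge the partial sums and counts of all partitions.
            updated = []
            for k, old in enumerate(centroids):
                total = sum(counts[k] for _, counts in results)
                if total == 0:
                    updated.append(old)  # empty cluster: keep its old centroid
                    continue
                merged = [sum(sums[k][d] for sums, _ in results)
                          for d in range(len(old))]
                updated.append([m / total for m in merged])
            centroids = updated
    return centroids


# Two partitions of toy 2-D points and two hand-picked initial centroids.
toy_partitions = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 10.0), (11.0, 10.0)],
]
final_centroids = parallel_kmeans(toy_partitions, [(0.0, 0.0), (10.0, 10.0)])
```

Each partition returns only per-cluster sums and counts, so the merge step handles a small, fixed amount of data regardless of partition size. This is why the assignment step distributes well across workers.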