Clustering patent document in the field of ICT (Information & Communication Technology)

A. Widodo, I. Budi
{"title":"Clustering patent document in the field of ICT (Information & Communication Technology)","authors":"A. Widodo, I. Budi","doi":"10.1109/STAIR.2011.5995789","DOIUrl":null,"url":null,"abstract":"The current classification of patent data that refers to the IPC (International Patent Classification) of the WIPO (World Intellectual Property Organization), deemed not reflect the classification of the field of ICT (Information & Communication Technology). ICT applications are usually included in sections G (Physics) and H (Electricity). This paper will evaluate the eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) of patents registered in the Directorate General of Intellectual Property Rights in Indonesia, from the year 1991 to 2000. The algorithm used to grouping is KMeans, KMeans++, Hierchical Clustering, and a combination of these three algorithms with SVD (Singular Value Decomposition). For external validation, Purity and F-Measure are used, whereas Silhouette is used for internal validation. From the experimental results it can be concluded that SVD provides improvements to the clustering results. In addition, the use of abstract does not necessarily improve the performance of clustering, and the use of phrase does not always yield better cluster than the use of the word as index. Moreover, no cluster has purity measure greater than 50%, which means that the existing IPC classification has not been able to accommodate the field of ICT appropriately.","PeriodicalId":376671,"journal":{"name":"2011 International Conference on Semantic Technology and Information Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Semantic Technology and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/STAIR.2011.5995789","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

The current classification of patent data that refers to the IPC (International Patent Classification) of the WIPO (World Intellectual Property Organization), deemed not reflect the classification of the field of ICT (Information & Communication Technology). ICT applications are usually included in sections G (Physics) and H (Electricity). This paper will evaluate the eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) of patents registered in the Directorate General of Intellectual Property Rights in Indonesia, from the year 1991 to 2000. The algorithm used to grouping is KMeans, KMeans++, Hierchical Clustering, and a combination of these three algorithms with SVD (Singular Value Decomposition). For external validation, Purity and F-Measure are used, whereas Silhouette is used for internal validation. From the experimental results it can be concluded that SVD provides improvements to the clustering results. In addition, the use of abstract does not necessarily improve the performance of clustering, and the use of phrase does not always yield better cluster than the use of the word as index. Moreover, no cluster has purity measure greater than 50%, which means that the existing IPC classification has not been able to accommodate the field of ICT appropriately.
ICT (Information & Communication Technology)领域专利文献聚类
目前的专利数据分类参照的是WIPO(世界知识产权组织)的IPC(国际专利分类),被认为不能反映ICT(信息与通信技术)领域的分类。信息通信技术应用通常包括在G(物理)和H(电力)部分。本文将对1991年至2000年在印度尼西亚知识产权总局注册的专利根据IPC类别(G01、G06、G09、G11、H01、H03、H04和H06)的八组专利进行评估。用于分组的算法是KMeans、kmeans++、分层聚类以及这三种算法与奇异值分解(SVD)的结合。对于外部验证,使用Purity和F-Measure,而内部验证使用Silhouette。从实验结果可以看出,奇异值分解对聚类结果有改善作用。此外,使用abstract并不一定会提高聚类的性能,使用短语并不总是比使用单词作为索引产生更好的聚类。此外,没有一个集群的纯度测量值大于50%,这意味着现有的IPC分类不能适当地适应ICT领域。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信