Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

Chanjong Im, 김도완, Thomas Mandl
International Journal of Contents, vol. 13, pp. 66-74, 2017. DOI: 10.5392/IJoC.2017.13.2.066 (https://doi.org/10.5392/IJoC.2017.13.2.066)
Citations: 2

Abstract

Patent classification is becoming more critical as the number of patent filings increases year over year. Despite comprehensive studies in the area, several issues remain in classifying patents at the levels of the IPC hierarchy. Both the structural complexity of the hierarchy and the shortage of patents at its lower levels degrade classification performance. Therefore, we propose a new classification method based on different criteria: the categories that domain experts define in trend analysis reports, i.e. Patent Landscape Reports (PLR). Several experiments were conducted to identify the types of features and the weighting methods that lead to the best classification performance with a Support Vector Machine (SVM). Two types of features (nouns and noun phrases) and five weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
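To make the weighting idea concrete, here is a minimal sketch of two of the schemes named above, applied to unigram and bigram features. It rests on several assumptions: the toy corpus and labels are invented, whitespace tokens stand in for the noun and noun-phrase extraction, and the formulas are the commonly used textbook forms (idf as log(N/df), rf as log2(2 + a / max(1, c))), which may differ from the exact variants used in the paper. The helper names `ngrams`, `idf`, `rf`, and `weight` are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercased whitespace tokens plus contiguous n-grams up to n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def idf(term, docs):
    """Inverse document frequency, log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def rf(term, docs, labels, positive):
    """Relevance frequency, log2(2 + a / max(1, c)).

    a = documents of the positive category containing the term,
    c = documents of all other categories containing the term.
    """
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == positive)
    c = sum(1 for d, y in zip(docs, labels) if term in d and y != positive)
    return math.log2(2 + a / max(1, c))


def weight(doc, scheme):
    """Multiply each raw term frequency by the weight the scheme assigns."""
    return {term: tf * scheme(term) for term, tf in doc.items()}


# Hypothetical toy corpus: four short "patents" in two expert-defined categories.
texts = [
    "solar cell module",
    "solar cell efficiency",
    "battery cell module",
    "battery charging circuit",
]
labels = ["solar", "solar", "battery", "battery"]
docs = [Counter(ngrams(t)) for t in texts]

# Weight the first document under each scheme.
tfidf = weight(docs[0], lambda t: idf(t, docs))
tfrf = weight(docs[0], lambda t: rf(t, docs, labels, "solar"))

for term in docs[0]:
    print(f"{term:12s} tf-idf={tfidf[term]:.3f} tf-rf={tfrf[term]:.3f}")
```

The contrast between the two schemes is that idf ignores the category labels, while rf is supervised: a term concentrated in the positive category receives a higher weight than one spread evenly across categories. The resulting weight dictionaries could be turned into feature vectors for a linear SVM, but that step is left out to keep the sketch dependency-free.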