Integrating big data with KNIME as an alternative without programming code: an application to the PATSTAT patent database
Authors: Fernando H. Taques, Coro Chasco, Flávio H. Taques
Journal: Journal of Geographical Systems (impact factor 2.8; JCR Q1, Geography)
Publication date: 2024-09-03
Publication type: Journal Article
DOI: 10.1007/s10109-024-00445-0 (https://doi.org/10.1007/s10109-024-00445-0)
Abstract
Accessing massive datasets can be challenging for users unfamiliar with programming. Combining the Konstanz Information Miner (KNIME) with MySQL on standard-configuration equipment addresses this issue. This research proposal presents a methodology that describes the necessary configuration steps in both tools and the manipulation required in KNIME to transmit the information to the MySQL environment for further processing in a database management system (DBMS). In addition, we propose a procedure that makes research using this point-and-click software more reproducible and, therefore, more credible in the scientific community. To this end, we use a large patent application database as a reference: PATSTAT Global 2023, provided by the European Patent Office (EPO). As is well known, patent data can be a valuable source for understanding innovation dynamics and technological trends, whether for studies of companies, sectors, nations or even regions, at aggregated and disaggregated levels.
About the journal:
The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented.
One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension to our understanding of socio-economic phenomena.
Contributions should be of high quality, technically well crafted, make a substantial contribution to the subject and contain a spatial dimension. The journal also aims to publish review and survey articles that make recent theoretical and methodological developments more readily accessible to its audience.
All papers in this journal have undergone rigorous double-blind peer review, based on initial editor screening and with at least two peer reviewers.
Officially cited as J Geogr Syst