Developing an Approach for Automated Data Collection and Mining Using Web Scraping Techniques and Large Language Models: A Case Study on Extracting Technology Readiness Level Assessments
Authors: F. M. Grozovskiy, I. V. Loginova
Journal: Automatic Documentation and Mathematical Linguistics, vol. 59, no. 4, pp. 269–278
Published: 2025-10-06
DOI: 10.3103/S0005105525700670
URL: https://link.springer.com/article/10.3103/S0005105525700670
The paper proposes an approach to the automated extraction and structuring of information from text, combining web scraping for data collection from online sources with a large language model for subsequent data mining. As a case study, texts from news publications on technology readiness levels from the CNews website were chosen to test the developed methodology in a specific domain. The model’s accuracy in identifying technology readiness assessments was 84–85%, which is comparable to similar results in other, less specialized tasks.
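The abstract describes a two-stage pipeline (collect text by scraping, then mine it with a language model) and reports accuracy as the evaluation metric. The following is only a minimal sketch of that shape, not the authors' implementation: the HTML snippets are invented, `html_to_text` stands in for the scraping step, and `extract_trl` is a hypothetical regex placeholder where the paper uses a large language model. It assumes the standard 1 to 9 technology readiness scale.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment (stand-in for scraping)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep only non-empty text nodes, trimmed of surrounding whitespace.
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML fragment as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_trl(text):
    """Hypothetical placeholder for the LLM step: find an explicit 'TRL n'."""
    match = re.search(r"\bTRL\s*([1-9])\b", text)
    return int(match.group(1)) if match else None


def accuracy(predicted, expected):
    """Share of items where the extracted level equals the reference label."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Invented example pages and reference labels, for illustration only.
pages = [
    "<p>The prototype reached <b>TRL 6</b> after field trials.</p>",
    "<p>No readiness level was reported.</p>",
]
labels = [6, None]

predictions = [extract_trl(html_to_text(page)) for page in pages]
print(predictions, accuracy(predictions, labels))
```

Keeping the extractor behind a plain function makes it possible to swap the regex for a model call without touching the collection or scoring code.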
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.