AUTOMATION OF TEXT DATA PROCESSING USING NLP

The American Journal of Engineering and Technology. Pub Date: 2024-07-01. DOI: 10.37547/tajet/volume06issue07-04

Yaroslav Starukhin, Vladimir Diukarev


Abstract

This study develops an automated system for processing scientific texts with advanced NLP techniques. The methodology combines classical NLP methods with deep learning: SciBERT for text classification, LDA for topic modeling, and a modified TextRank algorithm for keyword extraction. The results show strong document classification performance (F1-score of 0.92), along with effective topic identification and precise keyword extraction. A web interface built for the system demonstrates its practical applicability. The research contributes a comprehensive solution for scientific text analysis that combines state-of-the-art language models with established NLP techniques. Its novelty lies in an approach tailored to scientific literature, addressing the domain-specific language and complex content structure of academic texts.
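The abstract does not say how TextRank was modified, so the following is only a minimal sketch of the standard, unmodified algorithm, written in plain Python to illustrate the keyword extraction step. The function names (`tokenize`, `textrank_keywords`), the small `STOPWORDS` set, and the parameter defaults (`window`, `damping`, `iterations`, `top_k`) are assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "for", "and", "in", "to", "with",
             "is", "on", "by", "this", "that", "its", "are"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words with plain TextRank over an undirected co-occurrence graph."""
    tokens = tokenize(text)

    # Link each word to the words that follow it within a span of `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # PageRank-style power iteration: each word shares its score among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        updated = {}
        for w in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            updated[w] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = ("Topic modeling with LDA groups scientific documents by topic. "
              "Keyword extraction ranks the words of scientific documents.")
    print(textrank_keywords(sample, top_k=3))
```

A fixed iteration count is used here instead of a convergence threshold to keep the sketch short. Any domain-specific modification, such as weighting edges or filtering by part of speech, would be layered on top of this baseline.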

