Design and Implementation of Text Processing System Based on Summarization Algorithm

2022 International Conference on Culture-Oriented Science and Technology (CoST). Pub Date: 2022-08-01. DOI: 10.1109/cost57098.2022.00018

Haoqi Sun, Ning Luo, Li-juan Zhou, Songwei Wei

Abstract

With the spread and continued development of the Internet, the volume of text information has grown exponentially. The resulting information overload makes it harder for users to pick out critical information, and demand for fast access to information keeps rising. Text summarization uses computers to extract the key information from a text automatically, helping readers grasp its content accurately and quickly, so it has good application prospects. Traditional rule-based methods simply count word frequencies and ignore the semantic information of the text, so their results are not accurate enough. To address this, this paper proposes an extractive text summarization algorithm based on multiple feature weighting. It weights four features (sentence position, total amount of keyword information, keyword distribution, and semantic similarity) to take into account the global, surface, structural, and semantic information of the text. The method retains the advantages of the rule-based approach, namely that it requires no data annotation and saves computational resources, while improving the model's text understanding capability. Experimental results show that the model improves evaluation results on the datasets, improving the quality and accuracy of the summaries. When the model is applied in the text processing system, users can quickly obtain the information they need, which effectively speeds up obtaining and processing information.
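The abstract names four weighted features but gives no formulas, so the following Python sketch is only an illustration of how such a weighted score might be combined, not the authors' implementation. The weight values, the keyword rule (most frequent tokens longer than three characters), and the feature formulas are all assumptions, and plain bag-of-words cosine similarity stands in for whatever semantic similarity measure the paper actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical feature weights (sum to 1); the abstract does not publish values.
WEIGHTS = {"position": 0.2, "keyword_total": 0.3, "keyword_spread": 0.2, "similarity": 0.3}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(text, top_k_keywords=5):
    """Return one weighted score per sentence, in document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    doc_vector = Counter(t for toks in tokens for t in toks)
    # Assumed keyword rule: the most frequent tokens longer than 3 characters.
    long_terms = Counter({t: c for t, c in doc_vector.items() if len(t) > 3})
    keywords = {t for t, _ in long_terms.most_common(top_k_keywords)}
    # Keyword mass per sentence, later normalised by the largest value.
    raw_totals = [sum(doc_vector[t] for t in toks if t in keywords) for toks in tokens]
    max_total = max(raw_totals, default=0) or 1
    scores = []
    for i, toks in enumerate(tokens):
        features = {
            # Earlier sentences score higher (assumed position rule).
            "position": 1.0 - i / len(sentences),
            "keyword_total": raw_totals[i] / max_total,
            # Share of distinct keywords that the sentence covers.
            "keyword_spread": len(keywords & set(toks)) / len(keywords) if keywords else 0.0,
            # Stand-in for semantic similarity: cosine to the whole document.
            "similarity": cosine(Counter(toks), doc_vector),
        }
        scores.append(sum(WEIGHTS[name] * value for name, value in features.items()))
    return scores


def summarize(text, n_sentences=2):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(text)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, n_sentences=2)` returns the two highest-scoring sentences in their original order. Because every feature is normalised to the range 0 to 1 before weighting, the weights can be tuned independently; replacing `cosine` with a similarity computed over sentence embeddings would be the natural place to bring in richer semantic information.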
