Research on the Value and Language Features of Chinese Language and Literature Texts Based on Text Mining Technology

Q3 Multidisciplinary
Yiyi Ru
DOI: 10.62227/as/74221 (https://doi.org/10.62227/as/74221)
Journal: Archives Des Sciences, vol. 33
Published: 2024-05-15
Publication type: Journal Article
Citations: 0

Abstract

The linguistic characteristics of a literary work reflect the way of thinking embodied in the author’s use of language. Starting from the textual value of Chinese language literature, this paper analyzes its spiritual connotation along two dimensions: reading and education. Using web crawler technology, we obtain text data for three writers, Ba Jin, Yu Zheng and Qiong Yao; preprocess the data through data cleaning, Chinese word segmentation, de-duplication, etc.; and extract the feature values of the text with the TF-IDF algorithm. The text documents are then mapped onto vectors using the vector space model (VSM), and the parameters of the LDA topic model are estimated with the Gibbs sampling algorithm in order to better capture topic changes in the texts. Linguistic features are verified through the lexical and similarity features of the texts. The difference in lexical density between Ba Jin’s Cold Night and Resting Garden is found to be only 2.1 percentage points, and the verb “to say” occurs 1,213 and 735 times, respectively. The average sentence lengths of Yu Zheng and Qiong Yao fluctuate within the range [18.49, 34.27], and Qiong Yao’s works show a higher thematic concentration than Yu Zheng’s. Analyzing the linguistic features of Chinese language literary texts with text mining techniques helps to understand the authors’ language usage and to promote innovative paths of expression in literary texts.
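The TF-IDF weighting and vector space similarity steps named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the texts are already segmented into tokens, uses the plain unsmoothed variant tf × log(N/df), and the function names and toy tokens below are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Relative term frequency times log inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy, already-segmented documents (invented tokens, not data from the paper).
docs = [
    ["say", "night", "night"],
    ["say", "garden"],
    ["say", "night"],
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[2]))
print(cosine(vectors[0], vectors[1]))
```

A term that occurs in every document ("say" here) receives weight zero, so similarity is driven only by the terms that distinguish the texts from one another.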
Source journal
Archives Des Sciences (multidisciplinary journal)
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 0
Review time: 1 month
About the journal: Archives des Sciences is a multidisciplinary, international scientific journal. Articles are peer-reviewed.