Theme Classification of the Complete Song Ci from the Perspective of the Digital Humanities
Lecture Notes on Language and Literature, 2023. DOI: https://doi.org/10.23977/langl.2023.061103
Citation count: 0
Abstract
To explore the themes underlying the Complete Song Ci, we adopted a digital humanities approach to extract themes efficiently from a large corpus of ancient poetry, with the aim of offering new perspectives and ideas for the study of traditional poetic themes. Within the BERTopic classification framework, we fine-tuned a BERT model pre-trained on ancient Chinese using the unsupervised SimCSE learning method. We then derived the topic classification of the Complete Song Ci by quantitative and visual means. The results indicate that the Complete Song Ci divides into 43 sub-themes, some of which are similar to or compatible with one another. After merging the sub-themes according to their cosine similarity values, we identified ten distinct themes, which conform to the Ten Major Themes theory of classical Chinese literature proposed in previous research. This agreement also supports the research value of machine learning methods such as BERTopic for the topic classification of ancient poetic texts.
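The merging of sub-themes by cosine similarity can be illustrated with a minimal sketch. The abstract does not state the merging procedure or any threshold, so everything here is an assumption made for illustration: the single-linkage grouping rule, the `threshold` value, the function names, and the toy three-dimensional vectors with their hypothetical sub-theme labels. None of it reproduces the authors' data or results.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_topics(vectors: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Group topics linked by a pairwise similarity >= threshold.

    Assumed rule (not from the paper): single linkage, implemented
    with a small union-find over the topic names.
    """
    names = list(vectors)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(vectors[first], vectors[second]) >= threshold:
                parent[find(second)] = find(first)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sub-theme vectors, invented for this example only.
toy_vectors = {
    "farewell": [0.9, 0.1, 0.0],
    "homesickness": [0.8, 0.2, 0.1],
    "frontier": [0.0, 0.9, 0.3],
    "war": [0.1, 0.8, 0.4],
    "plum blossom": [0.1, 0.1, 0.9],
}

merged = merge_topics(toy_vectors, threshold=0.9)
for group in merged:
    print(group)
```

With these toy vectors, the five sub-themes collapse into three groups, because only the farewell/homesickness and frontier/war pairs exceed the assumed threshold. Lowering the threshold merges more aggressively, which is the kind of adjustment that would take 43 sub-themes down to a smaller set of themes.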