A study on the similarity of inter-regional transmission trends of oral tales using text mining

The Research of the Korean Classic. Pub Date: 2024-02-28. DOI: 10.20516/classic.2024.64.157

Yu-jin Han




A study on the similarity of inter-regional transmission trends of oral tales using text mining

This paper analyzes how similar the inter-regional transmission trends of oral tales are across nine regions, using two collections: 『Comprehensive Korean Oral Literature』 and 『Complementary Edition of Comprehensive Korean Oral Literature』. The analysis applies text mining in four stages: data collection → regional information pre-processing → regional narrative analysis → visualization.

First, 26,542 tale titles were collected from the digital archive of 〈Comprehensive Korean Oral Literature〉, and regional information that was not organized by province-level administrative district was pre-processed. The data were then divided into nine regions and further classified by year of recording. Next, a corpus built from the titles alone was morphologically analyzed to extract the 100 most frequent nouns in each region. The noun frequencies were normalized so that proportions could be compared accurately between regions, and the cosine similarity between regions was computed from the normalized values to compare how stories are distributed across regions. This comparison covered 384 nouns extracted from 『Comprehensive Korean Oral Literature』 and 435 nouns from 『Complementary Edition of Comprehensive Korean Oral Literature』.

The results are presented as a word cloud for each of the nine regions, the cosine similarity values between regions, and maps visualizing those values. Excluding Jeju, narratives transmitted in the Gyeonggi region show relatively low similarity to those of other regions, which makes Gyeonggi the most heterogeneous region nationally in terms of transmission tendencies in 『Comprehensive Korean Oral Literature』. In the 『Complementary Edition of Comprehensive Korean Oral Literature』, by contrast, Chungcheongbuk-do and Jeollabuk-do exhibit the most heterogeneous transmission tendencies, while Gyeonggi shows relatively higher similarity to other regions.
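The frequency, normalization, and cosine-similarity steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the abstract does not say how frequencies were normalized, so the sketch assumes a simple relative frequency (each count divided by the region's total), and the nouns, counts, and function names are invented for the example. It also assumes the nouns have already been extracted by a morphological analyzer.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: nouns already extracted from tale titles, per region.
# The words and counts are invented for illustration, not taken from the paper.
nouns_by_region = {
    "Gyeonggi": ["tiger", "tiger", "general", "temple"],
    "Jeju": ["sea", "sea", "shaman", "tiger"],
    "Gangwon": ["tiger", "temple", "temple", "general"],
}

TOP_N = 100  # the abstract keeps the 100 most frequent nouns per region


def top_noun_counts(nouns, top_n=TOP_N):
    """Return the top_n most frequent nouns with their raw counts."""
    return dict(Counter(nouns).most_common(top_n))


def normalize(counts):
    """Assumed normalization: divide each count by the region's total count."""
    total = sum(counts.values())
    return {noun: count / total for noun, count in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b.get(noun, 0.0) for noun, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile shares nothing with any other region
    return dot / (norm_a * norm_b)


# One normalized noun profile per region.
profiles = {
    region: normalize(top_noun_counts(nouns))
    for region, nouns in nouns_by_region.items()
}

# Similarity for every unordered pair of regions.
regions = sorted(profiles)
similarity = {
    (r1, r2): cosine_similarity(profiles[r1], profiles[r2])
    for i, r1 in enumerate(regions)
    for r2 in regions[i + 1:]
}

for (r1, r2), value in similarity.items():
    print(f"{r1} - {r2}: {value:.3f}")
```

Nouns missing from a region's top list count as zero, so each pair is effectively compared over the union of their noun lists. Note that cosine similarity ignores vector length, so dividing by the regional total as assumed here does not change the similarity values; a different normalization, such as one applied per noun across regions, would.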
