A learning approach towards metre-based classification of similar Hindi poems using proposed two-level data transformation
Authors: Komal Naaz, Niraj Kumar Singh
Journal: Digital Scholarship in the Humanities
Published: 2023-03-20
DOI: 10.1093/llc/fqad011 (https://doi.org/10.1093/llc/fqad011)
With advances in technology and the digitalization of resources, problems in the humanities are increasingly being addressed computationally. Automatic poetry classification is now a well-defined problem that can be approached in several ways, of which mood-based classification is among the most popular. We propose a learning approach to metre-based classification of Hindi metrical poetry. The state-of-the-art model for metre-based poetry classification is rule-based, whereas the proposed system uses learning models to perform classification. Feature extraction and classification are the two main components of text classification in natural language processing: feature extraction transforms text into machine-readable numbers, which are then passed to classification models. Poems in their natural form are unsuitable for learning-based algorithms. However, transforming the data into a suitable form and selecting a fixed number of features from it (feature extraction) makes classification possible with a machine learning approach, which had not previously been attempted and can serve as a benchmark for this area of research. The article deals with six popular and similar types of Hindi poems. The dataset is collected and processed to form an initial dataset that undergoes two levels of data transformation and feature engineering, resulting in the pre-processed dataset. The pre-processed dataset is then fed to selected machine learning models (Bernoulli Naïve Bayes, k-nearest neighbour, random forest, and support vector machine), producing classification results with a best accuracy of 99%; these results then undergo a post-processing step based on observed misclassifications.
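The abstract does not spell out the two transformation levels, so the following is only a minimal sketch of what such a pipeline might look like, under stated assumptions. Level 1 maps a Devanagari line to a sequence of syllable weights (1 for a light syllable, 2 for a heavy one) using deliberately simplified scansion rules. Level 2 pads or truncates that sequence to a fixed number of features. The function names `word_weights`, `line_weights`, and `to_features`, the rules themselves, and the choice of 0 as the padding value are all hypothetical and are not taken from the article.

```python
# Simplified, hypothetical scansion rules; NOT the authors' published method.
# Code points are written as escapes so combining marks stay unambiguous.
SHORT_VOWELS = set("\u0905\u0907\u0909\u090b")  # independent a, i, u, vocalic r
LONG_VOWELS = set("\u0906\u0908\u090a\u090f\u0910\u0913\u0914")  # aa, ii, uu, e, ai, o, au
LONG_SIGNS = set("\u093e\u0940\u0942\u0947\u0948\u094b\u094c")   # dependent long vowel signs
HEAVY_MARKS = set("\u0902\u0903")  # anusvara and visarga make a syllable heavy
VIRAMA = "\u094d"                  # suppresses the inherent vowel of a consonant


def is_consonant(ch):
    """True for the basic Devanagari consonant range (ka .. ha)."""
    return "\u0915" <= ch <= "\u0939"


def word_weights(word):
    """Level 1 (assumed): one word -> list of syllable weights (1 light, 2 heavy)."""
    weights = []
    for ch in word:
        if is_consonant(ch) or ch in SHORT_VOWELS:
            weights.append(1)            # new syllable, light by default
        elif ch in LONG_VOWELS:
            weights.append(2)            # independent long vowel is heavy
        elif ch in LONG_SIGNS or ch in HEAVY_MARKS:
            if weights:
                weights[-1] = 2          # promote the current syllable to heavy
        elif ch == VIRAMA and weights:
            weights.pop()                # vowel-less consonant is not a syllable
            if weights:
                weights[-1] = 2          # syllable before a conjunct becomes heavy
        # short vowel signs and any other characters leave the weights unchanged
    return weights


def line_weights(line):
    """Scan word by word so a conjunct never alters the previous word."""
    return [w for word in line.split() for w in word_weights(word)]


def to_features(line, n_features):
    """Level 2 (assumed): pad with 0 or truncate to a fixed-length vector."""
    weights = line_weights(line)[:n_features]
    return weights + [0] * (n_features - len(weights))


if __name__ == "__main__":
    sample = "मंगल भवन अमंगल हारी"
    print(line_weights(sample), sum(line_weights(sample)))
    print(to_features(sample, 16))
```

Fixed-length vectors of this kind could then be passed to any of the classifiers named in the abstract. Real Hindi scansion has exceptions that this sketch ignores, for example conjuncts that do not lengthen the preceding syllable.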
About the journal:
DSH or Digital Scholarship in the Humanities is an international, peer reviewed journal which publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field of what is currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.