{"title":"Data Preparation and Quality Challenges for the Personality Recognition in Indian Languages using Machine Learning and Deep Learning Approaches","authors":"Jayshri P. Patil, Jikitsha R. Sheth","doi":"10.36548/jismac.2022.1.004","DOIUrl":null,"url":null,"abstract":"Information about the user and their feelings, thoughts, and emotions are expressed through the status, comments, and updates on social media or other platforms. These user-generated contents are an important source for recognizing a user’s personality. Due to the increase in the amount of various Indian language contents on social media, there is a necessity to recognize personality from Indian languages. The challenges have increased in the collection and generation of datasets due to the lack of resources for Indian languages. In the field of personality recognition, the researchers have utilized machine learning and deep learning techniques to infer users’ personalities. The machine learning and deep learning models require enough labeled data for the training. Unlike traditional machine learning, deep learning techniques automatically generate features and require a significant amount of labeled data. For the personality recognition task from the Indian language, no sufficient annotated dataset is available and data preparation for the personality recognition task in the language has become a critical issue. 
This paper represents the existing gold standard dataset for personality recognition in English and also focuses on the challenges of a large amount of labeled data preparation in the Indian language.","PeriodicalId":10940,"journal":{"name":"Day 2 Tue, March 22, 2022","volume":"29 2 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Day 2 Tue, March 22, 2022","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.36548/jismac.2022.1.004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Users express information about themselves, including their feelings, thoughts, and emotions, through statuses, comments, and updates on social media and other platforms. This user-generated content is an important source for recognizing a user’s personality. As the volume of Indian-language content on social media grows, so does the need to recognize personality from text in Indian languages. Collecting and generating datasets has become more challenging because resources for Indian languages are scarce. Researchers in personality recognition have used machine learning and deep learning techniques to infer users’ personalities, and both kinds of model require sufficient labeled data for training. Unlike traditional machine learning, deep learning techniques generate features automatically and require a significant amount of labeled data. No sufficiently large annotated dataset is available for personality recognition in Indian languages, so data preparation for this task has become a critical issue. This paper presents the existing gold-standard dataset for personality recognition in English and focuses on the challenges of preparing a large amount of labeled data in Indian languages.