Personality Detection on Reddit Using DistilBERT

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) | Pub Date: 2023-10-01 | DOI: 10.29207/resti.v7i5.5236

Alif Rahmat Julianda, Warih Maharani

Abstract

Personality is the unique set of motivations, feelings, and behaviors that a person possesses. Personality detection on social media is a common research topic in computer science. The personality models most often used in this research are the Big Five Indicator (BFI) and the Myers-Briggs Type Indicator (MBTI). Unlike the BFI, which classifies personalities based on an individual’s traits, the MBTI classifies personalities based on the individual’s type, and MBTI performs better than the Big Five model in several scenarios. Many studies detect personality on social media with machine learning methods such as Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM). With the recent popularity of deep learning, language models such as DistilBERT can also be used to classify personality on social media, because DistilBERT can process long sentences and its transformer architecture allows parallelization. This research therefore detects MBTI personality on Reddit using DistilBERT. The evaluation shows that removing stopwords during data preprocessing can reduce the model’s performance, and that DistilBERT performs worse with class imbalance handling than without it. In comparison, DistilBERT outperforms other machine learning classifiers such as Naïve Bayes, SVM, and Logistic Regression in accuracy, precision, recall, and F1-score.
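The four metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not say how precision, recall, and F1 were averaged across MBTI classes, so macro averaging is an assumption here, and the function name `macro_scores` and the toy labels are hypothetical rather than taken from the paper.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme the authors used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class counts of true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # A class with no predictions (or no true samples) scores 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, not data from the paper.
y_true = ["INTP", "INTP", "INTP", "INFJ", "ENTP", "INFJ"]
y_pred = ["INTP", "INTP", "INFJ", "INFJ", "INTP", "INFJ"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

In this toy example the rare class ENTP is never predicted, so accuracy (4/6) is noticeably higher than macro F1. This is one reason class-imbalanced personality datasets are usually judged on more than accuracy alone.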
