Supervised learning and resampling techniques on DISC personality classification using Twitter information in Bahasa Indonesia
Ema Utami, Irwan Oyong, Suwanto Raharjo, Anggit Dwi Hartanto, Sumarni Adi
Applied Computing and Informatics, published 2021-09-07. DOI: 10.1108/aci-03-2021-0054
Purpose
Gathering knowledge about personality traits has long interested academics and researchers in both psychology and computer science. Analyzing profile data from personal social media accounts shortens data collection, because users do not have to fill in any questionnaires. A pure natural language processing (NLP) approach can give decent results, and previous studies have shown that its reliability improves when it is combined with machine learning.

Design/methodology/approach
In this study, cleaning the dataset and extracting relevant potential features, as assessed by psychological experts, are essential steps, because Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. The raw data were derived from a predefined dominance, influence, stability and conscientiousness (DISC) quiz website, returning 316,967 tweets from 1,244 Twitter accounts, filtered to include only personal, Indonesian-language accounts. By combining NLP techniques with machine learning, the authors aim to develop a better approach and a more robust model, especially for the Indonesian language.

Findings
The authors find that employing the SMOTETomek resampling technique together with hyperparameter tuning boosts the model's performance on formalized datasets by 57%, as measured by the F1-score.

Originality/value
Cleaning the dataset and extracting from it the relevant potential features assessed by psychological experts are essential, because Indonesian users tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. The organic data, derived from a predefined DISC quiz website, comprise 1,244 Twitter account records and 316,967 tweets.
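The abstract does not detail the cleaning pipeline. The sketch below shows one common way to normalize informal Indonesian tweets before feature extraction: lowercase the text, strip URLs, mentions and symbols, then map slang and abbreviations to formal words through a lookup table. The `SLANG_TO_FORMAL` dictionary and the `normalize_tweet` function are hypothetical illustrations, not the authors' lexicon or code; a real pipeline would rely on a much larger, expert-curated dictionary.

```python
import re

# Hypothetical, tiny slang/abbreviation lexicon for illustration only;
# the authors' actual normalization dictionary is not given in the abstract.
SLANG_TO_FORMAL = {
    "gak": "tidak",
    "ga": "tidak",
    "tdk": "tidak",
    "yg": "yang",
    "dgn": "dengan",
    "sdh": "sudah",
    "bgt": "sekali",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, '#', emoji, ...


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet, drop URLs/mentions/symbols, and map slang to formal words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove links first, before symbols are stripped
    text = MENTION_RE.sub(" ", text)   # remove @username mentions entirely
    text = NON_ALPHA_RE.sub(" ", text) # '#senin' keeps the word 'senin'
    tokens = [SLANG_TO_FORMAL.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_tweet("@budi aku gak suka yg ini bgt!! https://t.co/xyz #senin"))
```

The order of the regular expressions matters: URLs and mentions are removed before the generic symbol filter, otherwise their fragments would survive as ordinary words.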
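SMOTETomek, named in the findings, combines two steps: SMOTE oversamples the smaller classes by interpolating between a class member and one of its nearest same-class neighbours, and Tomek-link cleaning then removes pairs of opposite-class samples that are each other's nearest neighbour. The standard-library sketch below illustrates the idea for a multi-class problem such as the four DISC types; the function names, the choice of removing both samples of each link, and the toy data are assumptions for illustration. The abstract does not say which implementation the authors used; the imbalanced-learn library offers a production version as `imblearn.combine.SMOTETomek`.

```python
import math
import random
from collections import Counter


def smote(X, y, k=3, seed=0):
    """SMOTE: oversample every smaller class up to the majority-class size.

    Each synthetic sample lies on the segment between a random class member
    and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(target - n):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            partner = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([b + gap * (p - b) for b, p in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


def tomek_clean(X, y):
    """Remove both samples of every Tomek link.

    A Tomek link is a pair of samples with different labels that are each
    other's nearest neighbour (Euclidean distance).
    """
    n = len(X)
    if n < 2:
        return list(X), list(y)
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = {i for i in range(n) if nearest[nearest[i]] == i and y[i] != y[nearest[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, k=3, seed=0):
    """Oversample with SMOTE, then clean class overlap with Tomek links."""
    X_res, y_res = smote(X, y, k=k, seed=seed)
    return tomek_clean(X_res, y_res)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for two imbalanced DISC classes.
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.25, 0.05], [0.15, 0.2],
         [1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
    y = ["D"] * 6 + ["I"] * 3
    X_res, y_res = smote_tomek(X, y)
    print(Counter(y_res))
```

Resampling of this kind is normally applied to the training split only, so that the F1-score is still measured on data the resampler has not altered. The brute-force neighbour search is quadratic in the number of samples and only suits small illustrations.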
Journal description:
Applied Computing and Informatics aims to disseminate leading-edge knowledge in a timely manner to researchers, practitioners and academics interested in the latest developments in applied computing and information systems concepts, strategies, practices, tools and technologies. In particular, the journal encourages research studies that can make significant contributions to the continuous development and improvement of IT practices in the Kingdom of Saudi Arabia and other countries. By doing so, the journal attempts to bridge the gap between the academic and industrial communities and therefore welcomes theoretically grounded, methodologically sound research studies that address various IT-related problems and innovations of an applied nature. The journal will serve as a forum for practitioners, researchers, managers and IT policy makers to share their knowledge and experience in the design, development, implementation, management and evaluation of various IT applications. Contributions may deal with, but are not limited to:
• Internet and E-Commerce Architecture, Infrastructure, Models, Deployment Strategies and Methodologies.
• E-Business and E-Government Adoption.
• Mobile Commerce and Its Applications.
• Applied Telecommunication Networks.
• Software Engineering Approaches, Methodologies, Techniques, and Tools.
• Applied Data Mining and Warehousing.
• Information Strategic Planning and Resource Management.
• Applied Wireless Computing.
• Enterprise Resource Planning Systems.
• IT Education.
• Societal, Cultural, and Ethical Issues of IT.
• Policy, Legal and Global Issues of IT.
• Enterprise Database Technology.