Internet Addiction and Mental Health Prediction Using Ensemble Learning Based on Web Browsing History

Proceedings of the 3rd International Conference on Software Engineering and Information Management
Pub Date: 2020-01-12
DOI: 10.1145/3378936.3378947

B. Purwandari, Wayan Surya Wibawa, Nilam Fitriah, Mellia Christia, D. R. Bintari


Citations: 5

Abstract


Internet Addiction and Mental Health Prediction Using Ensemble Learning Based on Web Browsing History

The widespread prevalence of Web browsing may lead to Internet Addiction Disorder (IAD), which negatively affects Web users' general health. Young people who are very active online are prone to IAD, which harms their academic performance and social lives. The earlier the detection, the better the treatment. This pilot study therefore aimed to predict IAD among young people to encourage early treatment. The sample comprised 30 undergraduate students at Universitas Indonesia (UI). Their Web browsing histories were recorded from their laptops for five weeks. Features were extracted from these histories to classify browsing activity into five types: information retrieval (IR), instant messaging (IM), social networking services (SNS), leisure, and online shopping (OS). The extracted features served as input to a support vector machine (SVM) with a radial basis function (RBF) kernel, which classified each participant's IAD status. The SVM results were then compared with those of two ensemble learning methods, random forest (RF) and gradient boosting (GB). Predicted IAD status was checked against respondents' answers to the Internet Addiction Test (IAT) questionnaire, which measures IAD levels. The same input was also used to classify respondents' general health (GH) status, which was checked against their responses to the 12-item General Health Questionnaire (GHQ-12). With SVM, the prediction accuracies were 66.67% for IAD status and 65.17% for GH status. With RF, the precisions for predicting IAD and GH were 63.33% and 44.33%. With GB, the accuracies were 63.33% and 67.17%. The results indicated that RF decreased prediction accuracies, whereas GB differed only slightly from SVM. For each classifier, IAD status was predicted more accurately than GH status. One way to improve the outcomes is to obtain data from the Internet firewall instead of the Web browsing history on users' laptops. Such data can provide richer and more realistic records of Web access because they are collected from any device connected to the university's computer network. However, this approach requires consent from the participants and from the authority managing the infrastructure. If each class has a balanced number of examples, we plan to add more features and employ other types of ensemble learning for higher accuracy. Furthermore, multiclass prediction could distinguish specific IAD severity levels and the class of mental health status, i.e., anxiety and depression.
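
The feature-extraction step and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how URLs were mapped to the five activity types or how the features were encoded, so the `DOMAIN_TO_CATEGORY` lookup table, the helper names, and the choice of "share of visits per type" as the feature vector are all assumptions made for illustration. The `accuracy` and `precision` helpers show how the two metrics differ, since the abstract reports accuracy for SVM and GB but precision for RF.

```python
from collections import Counter
from urllib.parse import urlparse

# The five activity types named in the abstract.
CATEGORIES = ("IR", "IM", "SNS", "leisure", "OS")

# Hypothetical lookup table (assumption): the paper's actual
# URL-to-category mapping is not given in the abstract.
DOMAIN_TO_CATEGORY = {
    "scholar.google.com": "IR",
    "web.whatsapp.com": "IM",
    "facebook.com": "SNS",
    "youtube.com": "leisure",
    "tokopedia.com": "OS",
}


def categorize(url):
    """Return the activity type for a URL, or None if the domain is unknown."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_TO_CATEGORY.get(host)


def activity_features(urls):
    """Share of categorized visits per activity type, in CATEGORIES order."""
    counts = Counter(c for c in map(categorize, urls) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the questionnaire label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives whose true label is also positive."""
    labels_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not labels_of_predicted_pos:
        return 0.0
    hits = sum(t == positive for t in labels_of_predicted_pos)
    return hits / len(labels_of_predicted_pos)


if __name__ == "__main__":
    history = [
        "https://www.youtube.com/watch?v=abc",
        "https://facebook.com/home",
        "https://www.youtube.com/feed",
        "https://scholar.google.com/scholar?q=svm",
    ]
    print(activity_features(history))

    # Made-up labels (1 = IAD, 0 = no IAD), only to contrast the two metrics.
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0]
    print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```

In a fuller pipeline, one such vector per participant would be passed to a classifier, for example an SVM with an RBF kernel, and the predicted labels would be scored against the IAT or GHQ-12 outcome. Because accuracy and precision measure different things, the RF figures in the abstract are not directly comparable with the SVM and GB figures.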

