Twitter-STMHD: An Extensive User-Level Database of Multiple Mental Health Disorders

International Conference on Web and Social Media | Pub Date: 2022-05-31 | DOI: 10.1609/icwsm.v16i1.19368

Suhavi, A. Singh, Udit Arora, Somyadeep Shrivastava, Aryaveer Singh, R. Shah, P. Kumaraguru


Citation count: 0

Abstract


Social media can track and quantify user behavior, which makes it an appropriate resource for mental health studies. However, previous efforts in the area have been limited by a lack of data and of contextually relevant information. There is a need for large-scale, well-labeled mental health datasets, built with fast, reproducible methods that facilitate their heuristic growth. In this paper, we address this need by building the Twitter - Self-Reported Temporally-Contextual Mental Health Diagnosis Dataset (Twitter-STMHD), a large-scale, user-level dataset grouped into 8 disorder categories and a companion class of control users. The dataset is 60% hand-annotated, which led to the creation of high-precision patterns for self-reported diagnoses; these patterns were used to construct the rest of the dataset. Rather than being a corpus of tweets, the dataset is a collection of user profiles of people suffering from mental health disorders, which provides a holistic view of the problem. By leveraging temporal information, the data for each profile has been collected for the disease prevalence periods (onset of disorder, diagnosis, and progression), along with a fourth period: COVID-19. This is the largest, and the only, dataset that captures the tweeting activity of users suffering from mental health disorders during the COVID-19 period.
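The abstract does not list the self-reported diagnosis patterns themselves, so the following is only a minimal sketch of what pattern-based matching of this kind might look like. The regular expression, the placeholder term list `DISORDER_TERMS`, and the function name `find_self_reported_diagnosis` are all assumptions made for illustration; they are not the authors' patterns or categories.

```python
import re
from typing import Optional

# Hypothetical placeholder terms; the paper's 8 disorder categories
# are not enumerated in the abstract and are NOT reproduced here.
DISORDER_TERMS = ["depression", "anxiety", "ptsd"]

# Illustrative pattern (an assumption, not taken from the paper):
# a first-person statement such as "I was diagnosed with <term>".
DIAGNOSIS_PATTERN = re.compile(
    r"\bi\s+(?:was|am|have\s+been|got)\s+"
    r"(?:just\s+|recently\s+|officially\s+)?"
    r"diagnosed\s+with\s+"
    r"(?P<disorder>" + "|".join(map(re.escape, DISORDER_TERMS)) + r")\b",
    re.IGNORECASE,
)


def find_self_reported_diagnosis(text: str) -> Optional[str]:
    """Return the matched disorder term in lowercase, or None if nothing matches."""
    match = DIAGNOSIS_PATTERN.search(text)
    return match.group("disorder").lower() if match else None


if __name__ == "__main__":
    print(find_self_reported_diagnosis("I was recently diagnosed with PTSD last year"))
    print(find_self_reported_diagnosis("My friend talks about anxiety a lot"))
```

A pattern this narrow favors precision over recall: it ignores third-person mentions and general discussion of a disorder, which is the usual trade-off when a match is used as a label for a whole user profile.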
