An Online Community of Data Enthusiasts Collaborates to Seek, Share, and Make Sense of Data

Impact Factor: 0.4 | JCR Q4 | Information Science & Library Science

Evidence Based Library and Information Practice | Pub Date: 2023-03-15 | DOI: 10.18438/eblip30280

Jordan Patterson


Citation count: 0

Abstract

A Review of:
Stvilia, B., & Gibradze, L. (2022). Seeking and sharing datasets in an online community of data enthusiasts. Library & Information Science Research 44(3). https://doi.org/10.1016/j.lisr.2022.101160

Objective – To understand the major activities, tools, sources, and challenges of online communities focused on datasets.

Design – Content analysis informed by activity theory.

Setting – The r/Datasets subreddit, a web forum for sharing, seeking, and discussing datasets.

Subjects – 1232 “hot” or “top” discussion threads (1232 original posts and 6813 responding comments) first posted between 2010 and 2020.

Methods – The researchers used Reddit’s API to collect their sample of threads. Using a random subset of the sample, they developed a coding scheme for content analysis, which identified major themes in the data. Through this process, they controlled for quality: each researcher coded half the subset independently, then together they evaluated their intercoder reliability and discussed and resolved disagreements. The researchers also employed labelled latent Dirichlet allocation to construct topic models corresponding to the themes of the manual content analysis, which produced profiles of the top 100 terms most likely to appear in each topic. Finally, the researchers extracted URLs from threads in the sample to ascertain the types of information and data sources used by the community. Presenting their findings, the researchers discussed notable themes and proposed a metadata model for describing datasets, the Data Q&A metadata (DQAM) model.

Main Results – The r/Datasets community engages in three distinct activities: asking and answering questions, disseminating information, and community building. The closely related Q&A and dissemination activities shared themes of obtaining and aggregating data, sensemaking, collaborating and crowdsourcing, and data evaluation. Community members frequently discussed tools, competencies, and sources for data work. Major challenges for community members related to data quality, accessibility, ethics, and legality. The researchers proposed a 16-element metadata schema that should meet the needs of data enthusiasts.

Conclusion – The content analysis reveals a dedicated community engaged in an array of data-seeking and data-sharing activities. Data producers should be mindful of how their data can be accessed and used outside of their original professional or scholarly contexts.
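
The URL-extraction step described in the Methods can be illustrated with a minimal Python sketch. This is not the researchers' code: the sample texts, the regular expression, and the helper names (`extract_urls`, `domain_of`, `tally_domains`) are all assumptions made for illustration, and the reviewed study may have extracted and classified sources differently.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical thread texts; the study's real sample came from r/Datasets.
SAMPLE_THREADS = [
    "Census tables are at https://www.census.gov/data.html and mirrored on https://github.com/example/census.",
    "Try https://www.kaggle.com/datasets (see also https://github.com/example/other).",
    "No links here, just a question about data quality.",
]

# Assumed pattern: an http(s) URL runs until whitespace, a quote, or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return all http(s) URLs found in text, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def tally_domains(threads):
    """Count how often each domain is linked across a list of thread texts."""
    counts = Counter()
    for text in threads:
        for url in extract_urls(text):
            counts[domain_of(url)] += 1
    return counts


DOMAIN_COUNTS = tally_domains(SAMPLE_THREADS)
```

Calling `DOMAIN_COUNTS.most_common()` lists the linked domains by frequency, which is one simple way to get a first view of the kinds of sources a community relies on before any manual classification.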
