Bernard J. Jansen , Joni Salminen , Soon-gyo Jung , Hind Almerekhi
{"title":"The illusion of data validity: Why numbers about people are likely wrong","authors":"Bernard J. Jansen , Joni Salminen , Soon-gyo Jung , Hind Almerekhi","doi":"10.1016/j.dim.2022.100020","DOIUrl":"10.1016/j.dim.2022.100020","url":null,"abstract":"<div><p>This reflection article addresses a difficulty faced by scholars and practitioners working with numbers about people, which is that <em>those who study people want numerical data about these people. Unfortunately, time and time again, this numerical data about people is wrong.</em> Addressing the potential causes of this wrongness, we present examples of analyzing people numbers, i.e., numbers derived from digital data by or about people, and discuss the comforting illusion of data validity. We first lay a foundation by highlighting potential inaccuracies in collecting people data, such as selection bias. Then, we discuss inaccuracies in analyzing people data, such as the flaw of averages, followed by a discussion of errors that are made when trying to make sense of people data through techniques such as posterior labeling. Finally, we discuss a root cause of people data often being wrong – the conceptual conundrum of thinking the numbers are <em>counts</em> when they are actually <em>measures</em>. Practical solutions to address this illusion of data validity are proposed. 
The implications for theories derived from people data are also highlighted, namely that these people theories are generally wrong as they are often derived from people numbers that are wrong.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 4","pages":"Article 100020"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001188/pdfft?md5=ea40bf274d8b7c53f1cf69e1a4a2e214&pid=1-s2.0-S2543925122001188-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82717217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
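The abstract lists the flaw of averages among the errors made when analyzing people data. The short Python sketch below is a generic illustration, not an example taken from the article. The session lengths, the 10-minute threshold, and the `engaged` function are all hypothetical. It shows that applying a nonlinear rule to the "average person" gives a different answer than averaging that rule over the actual people.

```python
from statistics import mean, median

# Hypothetical session lengths in minutes for seven users (illustrative, not from the article).
durations = [1, 1, 1, 2, 2, 30, 45]


def engaged(minutes: float) -> int:
    """Hypothetical nonlinear outcome: a session counts as 'engaged' at 10+ minutes."""
    return 1 if minutes >= 10 else 0


avg = mean(durations)                                      # 82 / 7, about 11.7 minutes
outcome_of_average = engaged(avg)                          # the "average user" is engaged
average_of_outcomes = mean(engaged(d) for d in durations)  # yet only 2 of 7 users are

print(round(avg, 1), median(durations), outcome_of_average, round(average_of_outcomes, 2))
```

Here the "average user" would be counted as engaged, yet only 2 of the 7 users actually are, and the median session lasts just 2 minutes.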
Hind Almerekhi , Haewoon Kwak , Joni Salminen , Bernard J. Jansen
{"title":"PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits","authors":"Hind Almerekhi , Haewoon Kwak , Joni Salminen , Bernard J. Jansen","doi":"10.1016/j.dim.2022.100019","DOIUrl":"10.1016/j.dim.2022.100019","url":null,"abstract":"<div><p>Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we identify the turning points (i.e., toxicity triggers) that make conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity in Reddit comments.</p><p>Subsequently, to detect toxicity, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine the definition of toxicity triggers and to build a trigger detection dataset from 991,806 conversation threads in the top 100 communities on Reddit. Then, we extracted a set of sentiment shift, topical shift, and context-based features from the trigger detection dataset and used them to build a dual embedding biLSTM neural network that achieved an AUC score of 0.789. Our analysis of the trigger detection dataset showed that some triggering keywords are common across all communities, such as ‘racist’ and ‘women’. In contrast, other triggering keywords are specific to certain communities, such as ‘overwatch’ in r/Games. 
Implications are that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detections to specific communities.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 4","pages":"Article 100019"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001176/pdfft?md5=a441c48620ba8685678f44afdb856b82&pid=1-s2.0-S2543925122001176-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85301907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
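Both models in this abstract are reported by AUC (0.983 for toxicity detection, 0.789 for trigger detection). As a minimal sketch of what that number measures, the Python function below computes ROC AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half (the Mann–Whitney formulation). It is not the authors' evaluation code; the function name and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). The loop checks all
    positive-negative pairs, which is fine for illustration but slow at scale.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = toxic comment, 0 = non-toxic; scores are model outputs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75: 3 of the 4 positive-negative pairs are ranked correctly
```

Because AUC depends only on how the scores rank examples, it does not require choosing a classification threshold. For large evaluation sets, a sort-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.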
{"title":"Analysis of interoperability, security and usability of digital repositories in Kenyan Institutions of Higher Learning","authors":"Johnson Mulongo Masinde , Otuoma Sanya","doi":"10.1016/j.dim.2022.100011","DOIUrl":"10.1016/j.dim.2022.100011","url":null,"abstract":"<div><p>Kenya has experienced significant growth in the number of institutional repositories in the recent past. The number grew from a paltry two (2) in 2009 to 42 in August 2020. The growth is a positive indicator, as repositories play a crucial role in solving some of the problems experienced in the broader area of scholarly communication. This study sought to establish, from a technical perspective, the current extent to which institutions of higher learning in Kenya have established and implemented digital repositories. To achieve this goal, the study undertook a technical analysis of the institutional repositories implemented by universities accredited by the Commission for University Education in Kenya as at June 2020. The analysis focused on a range of interoperability, security and usability metrics for the analyzed institutional repositories. The study employed an exploratory approach to collecting the data. The collected data were stored in a MySQL database using the phpMyAdmin tool. Data analysis was done through SQL queries, and the result sets were copied to MS Excel to generate graphical visualizations. Of the 49 institutions examined, 34 (69%) had institutional repositories while 15 (31%) did not. All 34 institutions with repositories were using DSpace software. Across the metrics analyzed, the study established that most of the institutional repositories did not implement essential features that improve the interoperability, security and usability of their repository platforms. 
The study recommends either further training for repository managers or outsourcing of the technical process of establishing and maintaining functional institutional repositories. We further recommend more comprehensive studies to cover all the aspects of the FAIR principles of data management in Kenya.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 4","pages":"Article 100011"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001097/pdfft?md5=fd66351c4bd813fb9395e180deb5f70d&pid=1-s2.0-S2543925122001097-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88986941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
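The repository counts in this abstract can be checked directly: 34 of the 49 institutions examined had repositories, which leaves 15 without one. A minimal Python sketch (the variable and function names are illustrative) computes both shares as whole-number percentages.

```python
TOTAL_INSTITUTIONS = 49
WITH_REPOSITORY = 34
without_repository = TOTAL_INSTITUTIONS - WITH_REPOSITORY  # 15


def percent(part: int, whole: int) -> int:
    """Share of `whole` as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(without_repository,
      percent(WITH_REPOSITORY, TOTAL_INSTITUTIONS),
      percent(without_repository, TOTAL_INSTITUTIONS))  # 15 69 31
```

The two shares describe complementary parts of the same 49 institutions, so they should sum to 100%: about 69% with a repository and about 31% without.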
{"title":"Gain-framed product descriptions are more appealing to elderly consumers in live streaming E-commerce: Implications from a controlled experiment","authors":"Zhumo Sun , Shiting Fu , Tingting Jiang","doi":"10.1016/j.dim.2022.100022","DOIUrl":"10.1016/j.dim.2022.100022","url":null,"abstract":"<div><p>Live streaming e-commerce has become increasingly popular among elderly consumers. This new form of online shopping allows the elderly, who might be less effective in making purchase decisions than younger people, to better understand the products sold through the comprehensive descriptions provided by the anchors. This study is interested in investigating the effects of the gain-loss framing of product descriptions on the elderly's purchase intention. A total of 36 participants between the ages of 60 and 70 were invited to watch a number of live streaming videos involving either gain- or loss-framed product descriptions in a controlled experiment. The results show that the gain-framed descriptions of the products engendered significantly higher purchase intention among the participants than the loss-framed ones. In particular, the gain-framed descriptions were effective for the participants with high approach motivation, but not for those with low approach motivation, which suggests the significant moderating effect of approach motivation. This study focused on the elderly customers whose life quality can be greatly enhanced by live streaming e-commerce. 
The findings not only add to the knowledge about the effects of message framing, but also provide useful implications for live-streaming e-commerce practitioners to increase the persuasiveness of their product descriptions.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 4","pages":"Article 100022"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001206/pdfft?md5=8a433638ccccae0818a89fa816f9ac93&pid=1-s2.0-S2543925122001206-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73773429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empowering linked data in cultural heritage institutions: A knowledge management perspective","authors":"Lei Zhang","doi":"10.1016/j.dim.2022.100013","DOIUrl":"10.1016/j.dim.2022.100013","url":null,"abstract":"<div><p>This research explores the barriers and challenges in linked data implementation in cultural heritage institutions, i.e., libraries, archives, and museums. Various data were collected from different sources regarding linked data use cases related to libraries, archives, and museums over the past decade and analyzed from multiple facets. The analysis revealed very few effective knowledge management activities in linked data implementation and suggested that knowledge management and innovation play a crucial role that deserves greater attention in linked data projects and services. The findings will add value to the literature on knowledge management in the context of linked data and the semantic web.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 3","pages":"Article 100013"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001115/pdfft?md5=ec643c8a122444273c5fb106e23ab65f&pid=1-s2.0-S2543925122001115-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75079078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge management and innovation","authors":"Lu An, Alton Y.K. Chua, Md Anwarul Islam","doi":"10.1016/j.dim.2022.100018","DOIUrl":"10.1016/j.dim.2022.100018","url":null,"abstract":"","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 3","pages":"Article 100018"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001164/pdfft?md5=4a5c7ccb600c67f9a5032cd2d910de22&pid=1-s2.0-S2543925122001164-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76928293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of influencing factors of knowledge dissemination and sharing based on the SEIRR model","authors":"Yiwen Zhou","doi":"10.1016/j.dim.2022.100010","DOIUrl":"10.1016/j.dim.2022.100010","url":null,"abstract":"<div><p>Studying the key factors that affect knowledge sharing and knowledge dissemination helps promote knowledge innovation. Based on the probability of acquiring knowledge from the environment, learning willingness, learning ability, and the speed of forgetting knowledge, a knowledge dissemination model, SEIRR (Susceptible, Exposed, Infectious, Recovered with the knowledge, and Recovered without the knowledge), is proposed. Through network simulation analysis, the influence of these four factors on knowledge transmission and sharing is revealed. The results show that increasing the probability of acquiring knowledge from the environment, learning willingness, and learning ability helps improve knowledge dissemination and knowledge sharing. Reducing the speed of forgetting knowledge is also effective. 
Thus, knowledge sharing can be improved by establishing open libraries and databases, promoting groups' learning willingness, cultivating groups' learning ability, and strengthening groups' application of knowledge.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 3","pages":"Article 100010"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001085/pdfft?md5=4438cfab3ba69bdbe62a5f043a0090da&pid=1-s2.0-S2543925122001085-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76204999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
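The abstract names the SEIRR model's five states and four influencing factors but not its transition equations. The Python sketch below is a minimal discrete-time, mean-field version written under stated assumptions. The transition scheme, the extra contact rate `beta` and stop-spreading rate `gamma`, and all parameter and function names are hypothetical, not the author's formulation. It also tracks population fractions rather than running the network simulation the abstract describes.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[float, float, float, float, float]  # (S, E, I, R1, R2) as population fractions


@dataclass
class Params:
    """Hypothetical parameters; each is a per-step probability in [0, 1]."""
    env: float      # probability of acquiring knowledge from the environment (S -> E)
    beta: float     # assumed contact rate with spreaders (S -> E, scaled by I)
    willing: float  # learning willingness
    ability: float  # learning ability
    gamma: float    # assumed rate at which spreaders stop spreading but keep knowledge (I -> R1)
    forget: float   # speed of forgetting knowledge (R1 -> R2)


def simulate(p: Params, steps: int, i0: float = 0.01) -> List[State]:
    """Run the assumed SEIRR transitions; every flow is computed from the previous state."""
    for name in ("env", "beta", "willing", "ability", "gamma", "forget"):
        if not 0.0 <= getattr(p, name) <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if p.env + p.beta > 1.0:
        raise ValueError("env + beta must not exceed 1")
    s, e, i, r1, r2 = 1.0 - i0, 0.0, i0, 0.0, 0.0
    history: List[State] = [(s, e, i, r1, r2)]
    for _ in range(steps):
        exposed = (p.env + p.beta * i) * s   # S -> E
        learned = p.willing * p.ability * e  # E -> I: willing and able learners
        gave_up = (1.0 - p.willing) * e      # E -> R2: unwilling, never acquire it
        stopped = p.gamma * i                # I -> R1
        forgot = p.forget * r1               # R1 -> R2
        s -= exposed
        e += exposed - learned - gave_up
        i += learned - stopped
        r1 += stopped - forgot
        r2 += gave_up + forgot
        history.append((s, e, i, r1, r2))
    return history


def knowledge_share(state: State) -> float:
    """Fraction of the population currently holding the knowledge (I + R1)."""
    _, _, i, r1, _ = state
    return i + r1


base = Params(env=0.05, beta=0.3, willing=0.6, ability=0.7, gamma=0.1, forget=0.05)
slow_forgetting = replace(base, forget=0.0)
for label, params in (("forget=0.05", base), ("forget=0.0", slow_forgetting)):
    final = simulate(params, steps=100)[-1]
    print(label, round(knowledge_share(final), 3))
```

In this assumed scheme, forgetting only moves people from R1 to R2 and nothing flows back, so lowering `forget` while holding the other parameters fixed leaves S, E, and I unchanged and strictly raises the share of knowledge holders. That mirrors the abstract's finding about the speed of forgetting; the sketch makes no claim about the other factors.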
Zhenghao Liu , Zhijian Zhang , Xi Zeng , Huakui Lv
{"title":"Representation and association of Chinese financial equity knowledge driven by multilayer ontology","authors":"Zhenghao Liu , Zhijian Zhang , Xi Zeng , Huakui Lv","doi":"10.1016/j.dim.2022.100009","DOIUrl":"10.1016/j.dim.2022.100009","url":null,"abstract":"<div><p>To address the current situation of complex financial ownership structures and isolated data organization, this study draws on multi-layer hierarchical methods for domain ontology modeling. At the same time, the three dimensions of industry, company, and internal environment were integrated, and a concept cube was designed and constructed based on knowledge extraction and text classification technology, so as to provide a multi-level and fine-grained knowledge representation and association method for financial equity knowledge. The experimental results show that the concept cube structure represents semantic information as a dense low-dimensional representation vector, which greatly enhances semantic relevance and interpretability. 
The multi-layer ontology-driven ownership structure reflects a variety of knowledge association patterns. In the “Intelligent Financial Big Data System” developed by the research team, association queries over three categories of relationships (industry, enterprise, and internal environment) are realized, as well as dynamic analysis and supervision of typical financial management problems.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 3","pages":"Article 100009"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001073/pdfft?md5=b611425c68045abd8898e051c965c03b&pid=1-s2.0-S2543925122001073-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88830961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kun Huang, Xiaoyu Wang, Shichao Luo, Qiuping Su, Lei Li
{"title":"What difficulties did the college students encountered in information seeking during the COVID-19 pandemic?","authors":"Kun Huang, Xiaoyu Wang, Shichao Luo, Qiuping Su, Lei Li","doi":"10.1016/j.dim.2022.100005","DOIUrl":"10.1016/j.dim.2022.100005","url":null,"abstract":"<div><p>To better promote information services and fight the infodemic, this paper investigated the difficulties that Chinese college students encountered in information seeking during the COVID-19 pandemic. We collected data in two stages. In the first stage, in November 2020, we collected data from <em>the Foundation of Information Science</em> course: 54 college students who took the course completed an assignment describing their information needs and difficulties during the pandemic. In the second stage, in March 2021, we conducted an online survey on WenJuanXing through convenience sampling. The participants were asked to answer the same question as in the first stage. We collected 204 valid responses. Then, based on the search task difficulty reason scheme proposed by Liu et al. (2015) (denoted LKC15), we used content analysis to code the responses and analyze the difficulties that Chinese students encountered. LKC15's difficulty reasons are classified into three aspects: user, task, and user-task interaction. The findings indicated that 14 of the 21 difficulty reasons in LKC15 were identified in this study. Moreover, we added 17 new difficulty reasons to revise the scheme. Difficulty reasons related to user-task interaction were mentioned most frequently. Within user-task interaction, the difficulty reasons related to document features were mentioned most frequently, followed by those related to search results. 
Finally, the paper offers some suggestions and discusses directions for future study.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 2","pages":"Article 100005"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9015718/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10432918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The geopolitical and socioeconomic factors of digitization in Vietnam: Technology adoption in the art and cultural sector during the COVID-19 pandemic","authors":"Emma Duester","doi":"10.1016/j.dim.2022.100012","DOIUrl":"https://doi.org/10.1016/j.dim.2022.100012","url":null,"abstract":"<div><p>This paper argues that there is an emergent digital culture in the art and cultural sector in Hanoi, which is producing a paradigm shift in the nature of work for cultural professionals, the way art collections are preserved and displayed, and the nature of international connections. The advent of the ‘fourth industrial revolution’ in Vietnam has brought about advances in digitization. While this transition is crucial in achieving national sustainable development goals, Vietnam remains at a disadvantage on a global scale due to country-specific challenges in digitization, including a lack of human, technical, and financial resources. These challenges are hindering the pace and quality of the digitization process and impeding cultural professionals' ability to utilize digital platforms. In addition, the global digital divide is affecting access, inclusion and representation. This shows that the challenges faced in the digitization process are not only about access to technology but also about deeper-seated issues related to culture, history, and social inequalities. This issue has become especially pertinent during the Covid-19 pandemic and has highlighted digital inequalities in access and inclusion.</p><p>The research draws on 20 semi-structured interviews with cultural professionals across Hanoi. The interviews were carried out during the Covid-19 pandemic and addressed its impact on digitization projects as well as the use of digital technologies for work. The findings show how geopolitical and socioeconomic factors can suppress the ability to adopt new digital technologies and hinder the ability to exploit the opportunities of digitization. 
The Covid-19 pandemic has allowed more time to focus on digitization projects and to utilize digital tools and platforms, especially free platforms such as Facebook. This has become one route towards exploiting the opportunities of digitization for increased exposure, inclusion in the discourse, and creation of digital resources on Vietnamese art and culture.</p></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"6 2","pages":"Article 100012"},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2543925122001103/pdfft?md5=f8657c70ac1f12daa26d054816f4edda&pid=1-s2.0-S2543925122001103-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137115674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}