Frontiers in Big Data | Pub Date: 2023-10-19 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1271639
Luca Clissa, Mario Lassnig, Lorenzo Rinaldi
Title: How big is Big Data? A comprehensive survey of data production, storage, and streaming in science and industry.
Volume 6, article 1271639. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620515/pdf/
Abstract: The contemporary surge in data production is fueled by diverse factors, with contributions from numerous stakeholders across various sectors. Comparing the volumes at play among different big data entities is challenging due to the scarcity of publicly available data. This survey aims to offer a comprehensive perspective on the orders of magnitude involved in yearly data generation by some leading public and private organizations, using an array of online sources for estimation. These estimates are based on meaningful, individual data production metrics and plausible per-unit sizes. The primary objective is to offer insights into the comparative scales of major big data players, their sources, and data production flows, rather than striving for precise measurements or incorporating the latest updates. The results are succinctly conveyed through a visual representation of the relative data generation volumes across these entities.
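The estimation approach described in this abstract (a yearly count of units multiplied by a plausible size per unit, summed over data-production flows) can be illustrated with a short Python sketch. The class and function names, the use of the decimal (SI) petabyte, and the example numbers are hypothetical assumptions for illustration only; they are not figures or code from the survey.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProductionFlow:
    """Hypothetical data-production flow: a yearly unit count and a plausible size per unit."""
    name: str
    units_per_year: float  # e.g., uploads, events, or records produced per year
    bytes_per_unit: float  # assumed average size of one unit, in bytes

    def yearly_bytes(self) -> float:
        # Order-of-magnitude estimate: volume = unit count x per-unit size
        return self.units_per_year * self.bytes_per_unit


PETABYTE = 1e15  # decimal (SI) petabyte; an assumption of this sketch


def total_yearly_petabytes(flows: Iterable[ProductionFlow]) -> float:
    """Sum the estimated yearly volume of several flows and express it in petabytes."""
    return sum(flow.yearly_bytes() for flow in flows) / PETABYTE


# Illustrative numbers only, not figures from the survey:
# one billion hypothetical items per day, each assumed to be about 2 MB.
example = ProductionFlow("hypothetical uploads", units_per_year=1e9 * 365, bytes_per_unit=2e6)
print(f"{total_yearly_petabytes([example]):.0f} PB per year")
```

Because each estimate is only a product of two rough quantities, the result is meaningful as an order of magnitude for comparing entities, which matches the survey's stated goal of comparative scale rather than precise measurement.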
Frontiers in Big Data | Pub Date: 2023-10-17 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1258051
Namita Gupta
Title: Editorial: Smart cities challenges, technologies and trends.
Volume 6, article 1258051. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616893/pdf/
Frontiers in Big Data | Pub Date: 2023-10-16 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1301942
Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bähr, Jürgen Becker, Anne-Sophie Berthold, Richard J Bonventre, Tomás E Müller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Dongning Guo, Kyle J Hazelwood, Christian Herwig, Babar Khan, Sehoon Kim, Thomas Klijnsma, Yaling Liu, Kin Ho Lo, Tri Nguyen, Gianantonio Pezzullo, Seyedramin Rasoulinezhad, Ryan A Rivera, Kate Scholberg, Justin Selig, Sougata Sen, Dmitri Strukov, William Tang, Savannah Thais, Kai Lukas Unger, Ricardo Vilalta, Belina von Krosigk, Shen Wang, Thomas K Warburton
Title: Corrigendum: Applications and techniques for fast machine learning in science.
Volume 6, article 1301942. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614289/pdf/
Abstract: [This corrects the article DOI: 10.3389/fdata.2022.787421.]
Frontiers in Big Data | Pub Date: 2023-10-13 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1303367
Lin-Ching Chang, Anastasia Angelopoulou
Title: Editorial: Women in AI medicine and public health 2022.
Volume 6, article 1303367. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614155/pdf/
Frontiers in Big Data | Pub Date: 2023-10-12 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1249997
Peter Müllner, Elisabeth Lex, Markus Schedl, Dominik Kowald
Title: Differential privacy in collaborative filtering recommender systems: a review.
Volume 6, article 1249997. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601453/pdf/
Abstract: State-of-the-art recommender systems produce high-quality recommendations to support users in finding relevant content. However, by utilizing users' data to generate recommendations, recommender systems threaten users' privacy. To alleviate this threat, differential privacy is often used to protect users' data by adding random noise. This, however, leads to a substantial drop in recommendation quality. Therefore, several approaches aim to improve this trade-off between accuracy and user privacy. In this work, we first give an overview of threats to user privacy in recommender systems, followed by a brief introduction to the differential privacy framework that can protect users' privacy. Subsequently, we review recommendation approaches that apply differential privacy, and we highlight research that improves the trade-off between recommendation quality and user privacy. Finally, we discuss open issues, e.g., the relation between privacy and fairness, and users' different needs for privacy. With this review, we hope to provide other researchers with an overview of the ways in which differential privacy has been applied to state-of-the-art collaborative filtering recommender systems.
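As a minimal sketch of how "adding random noise" can work, the Python code below applies the Laplace mechanism to individual ratings before a model is trained on them (input perturbation), which is only one of several places where differential privacy can enter a collaborative filtering pipeline. It assumes ratings bounded in [1, 5] and treats two datasets as neighbors when they differ in the value of one rating; under that assumption each released rating is epsilon-differentially private. The function names are hypothetical, and this is not a specific method taken from the review.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

RatingKey = Tuple[Hashable, Hashable]  # (user, item)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_ratings(
    ratings: Dict[RatingKey, float],
    epsilon: float,
    r_min: float = 1.0,
    r_max: float = 5.0,
    seed: Optional[int] = None,
) -> Dict[RatingKey, float]:
    """Hypothetical input perturbation: add Laplace noise to every rating before training."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    sensitivity = r_max - r_min    # largest possible change of one bounded rating
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise -> stronger privacy
    noisy = {}
    for key, rating in ratings.items():
        clipped = min(max(rating, r_min), r_max)  # enforce the assumed rating bounds
        perturbed = clipped + laplace_noise(scale, rng)
        # Clipping the noisy value is post-processing and does not weaken the guarantee.
        noisy[key] = min(max(perturbed, r_min), r_max)
    return noisy


ratings = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0, ("u2", "i1"): 5.0}
print(privatize_ratings(ratings, epsilon=1.0, seed=7))
```

Lowering `epsilon` widens the noise scale, which is exactly the accuracy versus privacy trade-off the abstract describes.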
Frontiers in Big Data | Pub Date: 2023-10-11 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1214029
Iustina Ivanova, Mike Wald
Title: Climbing crags recommender system in Arco, Italy: a comparative study.
Volume 6, article 1214029. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10598720/pdf/
Abstract: Outdoor sport climbing is popular in Northern Italy due to its vast number of rock climbing places (such as crags). New climbing crags appear yearly, creating an information overload problem for tourists who plan their sport climbing vacation. Recommender systems have partly addressed this issue by suggesting climbing crags according to the most visited places or the number of suitable climbing routes. Unfortunately, these methods do not consider contextual information. However, in sport climbing, as in other outdoor activities, the possibility of visiting certain places depends on several contextual factors, for instance, a suitable season (winter/summer), parking space availability if traveling by car, or the possibility of climbing with children if traveling with children. To address this limitation, we collected and analyzed the crag visits in Arco (Italy) from an online guidebook. We found that climbing contextual information, similar to users' content preferences, can be modeled by a correlation between recorded visits and crag features. Based on that, we developed and evaluated a novel context-aware climbing crags recommender system, Visit & Climb, which consists of three stages: (1) contextual information and content tastes are learned automatically from the users' logs by computing the correlation between users' visits and crags' features; (2) those learned tastes are then made adjustable in a preference elicitation web interface; (3) the user receives recommendations on the map according to the number of visits made by climbers with similar learned tastes. To measure the quality of this system, we performed an offline evaluation (where we calculated Mean Average Precision, Recall, and Normalized Discounted Cumulative Gain for top-N), a formative study, and an online evaluation (a within-subject design with N = 40 experienced outdoor climbers, who tried three similar systems including Visit & Climb). Offline tests showed that the proposed system suggests crags to climbers as accurately as other classical models for top-N recommendations. Meanwhile, online tests indicated that the system provides a significantly higher level of information sufficiency than other systems in this domain. The overall results demonstrated that the developed system provides recommendations according to the users' requirements, and that incorporating contextual information and crag characteristics into the climbing recommender system leads to increased information sufficiency caused by transparency, which improves satisfaction and use intention.
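The top-N metrics named in the offline evaluation (Mean Average Precision, Recall, and NDCG) can be sketched in Python as follows. This assumes binary relevance (an item counts as relevant if it is among the user's held-out visits) and unique items in each ranked list. Several normalizations of AP@k are in use; this sketch divides by min(|relevant|, k), which may differ from the variant the authors used. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision@rank over the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def ndcg_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the list divided by the DCG of an ideal list."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """MAP@k over users; each run is a (ranked recommendations, relevant items) pair."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)
```

For example, with the ranked list `["a", "b", "c", "d"]`, relevant items `{"a", "c"}`, and k = 3, recall is 1.0 and AP@3 is (1/1 + 2/3) / 2 = 5/6.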
Frontiers in Big Data | Pub Date: 2023-10-06 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1245198
Deepak Kumar, Tessa Grosz, Navid Rekabsaz, Elisabeth Greif, Markus Schedl
Title: Fairness of recommender systems in the recruitment domain: an analysis from technical and legal perspectives.
Volume 6, article 1245198. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10587596/pdf/
Abstract: Recommender systems (RSs) have become an integral part of the hiring process, be it via job advertisement ranking systems (job recommenders) for the potential employee or candidate ranking systems (candidate recommenders) for the employer. As seen in other domains, RSs are prone to harmful biases, unfair algorithmic behavior, and even discrimination in a legal sense. Some cases, such as salary equity with regard to gender (gender pay gap), stereotypical job perceptions along gendered lines, or biases toward other subgroups sharing specific characteristics in candidate recommenders, can have profound ethical and legal implications. In this survey, we discuss the current state of fairness research considering the fairness definitions (e.g., demographic parity and equal opportunity) used in recruitment-related RSs (RRSs). We investigate from a technical perspective the approaches to improve fairness, like synthetic data generation, adversarial training, protected subgroup distributional constraints, and post-hoc re-ranking. Thereafter, from a legal perspective, we first contrast the fairness definitions and the effects of the aforementioned approaches with existing EU and US law requirements for employment and occupation, and second, we ascertain whether and to what extent EU and US law permits such approaches to improve fairness. We finally discuss the advances that RSs have made in terms of fairness in the recruitment domain, compare them with those made in other domains, and outline existing open challenges.
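The two fairness definitions mentioned in this abstract, demographic parity and equal opportunity, can be expressed as gaps between two groups for a binary decision such as "candidate shortlisted". This Python sketch assumes one protected-group label per candidate and a binary "qualified" label standing in for the ground truth; demographic parity compares selection rates, while equal opportunity compares selection rates among qualified candidates only (true-positive rates). The function names are hypothetical and not taken from the survey.

```python
from typing import Hashable, Sequence


def _rate(flags: Sequence[int]) -> float:
    """Share of positive decisions in a group; refuses to guess for an empty group."""
    if not flags:
        raise ValueError("group has no members for this rate")
    return sum(flags) / len(flags)


def demographic_parity_gap(selected: Sequence[int], groups: Sequence[Hashable],
                           group_a: Hashable, group_b: Hashable) -> float:
    """|P(selected | group_a) - P(selected | group_b)|"""
    rate_a = _rate([s for s, g in zip(selected, groups) if g == group_a])
    rate_b = _rate([s for s, g in zip(selected, groups) if g == group_b])
    return abs(rate_a - rate_b)


def equal_opportunity_gap(selected: Sequence[int], qualified: Sequence[int],
                          groups: Sequence[Hashable],
                          group_a: Hashable, group_b: Hashable) -> float:
    """|P(selected | qualified, group_a) - P(selected | qualified, group_b)|"""
    tpr_a = _rate([s for s, q, g in zip(selected, qualified, groups) if g == group_a and q])
    tpr_b = _rate([s for s, q, g in zip(selected, qualified, groups) if g == group_b and q])
    return abs(tpr_a - tpr_b)


# Hypothetical shortlist decisions for six candidates in two groups.
selected = [1, 1, 0, 1, 0, 0]
qualified = [1, 1, 1, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(demographic_parity_gap(selected, groups, "a", "b"))
print(equal_opportunity_gap(selected, qualified, groups, "a", "b"))
```

A gap of zero means the criterion holds exactly; post-hoc re-ranking approaches typically adjust the shortlist to shrink one of these gaps, which can in turn raise the legal questions the survey examines.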
Frontiers in Big Data | Pub Date: 2023-10-05 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1277976
Rawan Elragal, Ahmed Elragal, Abdolrasoul Habibipour
Title: Healthcare analytics-A literature review and proposed research agenda.
Volume 6, article 1277976. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10585099/pdf/
Abstract: This research addresses the demanding need for research in healthcare analytics, by explaining how previous studies have used big data, AI, and machine learning to identify, address, or solve healthcare problems. Healthcare science methods are combined with contemporary data science techniques to examine the literature, identify research gaps, and propose a research agenda for researchers, academic institutions, and governmental healthcare organizations. The study contributes to the body of literature by providing a state-of-the-art review of healthcare analytics as well as proposing a research agenda to advance the knowledge in this area. The results of this research can be beneficial for both healthcare science and data science researchers as well as practitioners in the field.
Frontiers in Big Data | Pub Date: 2023-09-29 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1164885
Dagny Aurich, Aida Horaniet Ibañez
Title: How can data visualization support interdisciplinary research? LuxTIME: studying historical exposomics in Belval.
Volume 6, article 1164885. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10571050/pdf/
Abstract: The Luxembourg Time Machine (LuxTIME) is an interdisciplinary project that studies the historical exposome during the industrialization of the Minett region, located in the south of Luxembourg. Exposome research encompasses all external and internal non-genetic factors influencing the health of the population, such as air pollution, green spaces, noise, work conditions, physical activity, and diet. Due to the wide scope of the interdisciplinary project, the historical study of the exposome in Belval involved the collection of quantitative and qualitative data from the National Archive of Luxembourg, various local archives (e.g., the communes of Esch-sur-Alzette and Sanem), the National Library, the Library of National Statistics STATEC, the National Geoportal of Luxembourg, scientific data from other research centers, and information from newspapers and journals digitized in eluxemburgensia. The data collection and the resulting inventory were performed to create a proof of concept to critically test the potential of a multi-layered research design for the study of the historical exposome in Belval. The guiding navigation tool throughout the project was data visualization. It has facilitated the exploration of the data collected (or just the data) and the metadata. It has also been a valuable tool for mapping knowledge and defining the scope of the project. Furthermore, different data visualization techniques have helped us to reflect on the process of knowledge sharing, to understand how the relevance of certain topics changed throughout the project and why, and to learn about the publication process in different journals and the experience of the participants. Data visualization is used not only as a means to an end but also to embrace the idea of "sandcastles" using a speculative and process-oriented approach to advance knowledge within all research fields involved. LuxTIME has proven to be an ideal case study to explore the possibilities offered by different data visualization concepts and techniques resulting in a "data visualization toolbox" that could be evaluated and extended in other interdisciplinary projects.
Frontiers in Big Data | Pub Date: 2023-09-28 | eCollection Date: 2023-01-01 | DOI: 10.3389/fdata.2023.1278153
Chenwei Yan, Xinyue Fang, Xiaotong Huang, Chenyi Guo, Ji Wu
Title: A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph.
Volume 6, article 1278153. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569599/pdf/
Abstract: The knowledge graph is one of the essential infrastructures of artificial intelligence. Constructing a high-quality domain knowledge graph from multi-source heterogeneous data is a challenge for knowledge engineering. We propose a complete process framework for constructing a knowledge graph that combines structured and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve an existing model to extract triples, and our model reached an F1-score of 72.77%. The numbers of nodes and edges in our constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed at HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction and demonstrates its application in corporate due diligence.
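The abstract reports an F1-score for triple extraction without stating the matching protocol. A common choice, assumed here, is exact match on (head, relation, tail) triples with micro-averaged precision and recall over the whole test set. This Python sketch shows that computation on hypothetical triples; it is not the authors' evaluation code, and their reported 72.77% may rest on a different protocol.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_prf(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match, micro-averaged precision, recall, and F1 over sets of triples."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)  # a triple counts only if all three parts match
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical triples, loosely modeled on register and litigation information.
gold = {
    ("Company A", "registered_in", "Beijing"),
    ("Company A", "defendant_in", "Case 1"),
    ("Company B", "announced", "Merger"),
}
predicted = [
    ("Company A", "registered_in", "Beijing"),
    ("Company A", "defendant_in", "Case 2"),
]
print(triple_prf(predicted, gold))
```

Exact matching is strict: the second prediction gets the head and relation right but still counts as both a false positive and a missed gold triple, which is one reason reported F1 values depend heavily on the chosen protocol.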