RADAR-Pipeline: Scalable Feature Generation for Mobile Health Data

International Journal of Population Data Science. Pub Date: 2024-06-10. DOI: 10.23889/ijpds.v9i4.2421

H. Sankesara, Y. Ranjan, P. Conde, Z. Rashid, Akash Roy Choudhury, A. Folarin



Abstract

Introduction & Background

RADAR-Pipeline is an open-source Python framework designed to simplify and enhance mobile health data analysis. It is built to efficiently read and process the large volumes of data generated through the RADAR-base platform. RADAR-base is a scalable, open-source platform for real-time streaming and analytics that supports research access and customisation requirements. Studies using the RADAR-base platform have collected fine-grained longitudinal data from wearables and phones. These data can potentially yield a multitude of digital biomarkers that tell us a great deal about disease state. Because of the sheer size of the data, reading and processing them can be difficult for researchers. A common task is identifying useful features and the data processing and analysis steps previously used by the community. Up to now, these have been hand-crafted by individual data scientists and often cannot be easily reused by the community without author-specific knowledge.

Furthermore, generating variables from already established research at a larger scale can be challenging, which could hinder replication. Hence, we have designed RADAR-Pipeline to help researchers overcome these challenges. It enables them to create and share their data analysis and visualisation pipelines, fostering collaboration and knowledge sharing within the research community.

Objectives & Approach

The primary objective of RADAR-Pipeline is to offer researchers a user-friendly and powerful platform to develop and share their research. Researchers can build reusable analysis and visualisation pipelines to ensure consistent and reliable results.

It simplifies big data analysis by leveraging Apache Spark to handle large and complex mobile health datasets efficiently. Researchers can also save time and effort by reusing and extending existing pipelines built by others. Finally, RADAR-Pipeline promotes collaboration and recognition by allowing researchers to share their work through the RADAR-base Analytics Catalogue, making their pipelines citable and accessible to the wider research community.

Whilst RADAR-Pipeline has been designed to read data from RADAR-base, it can also read data from any dataset that uses the Hadoop Distributed File System (HDFS) file system namespace.

Relevance to Digital Footprints

Mobile health data is rich and valuable for understanding human behaviour and health. RADAR-Pipeline addresses the challenges associated with analysing large and complex mobile health datasets, enabling researchers to extract valuable insights that can be used to:

(1) Improve public health: By enabling efficient analysis of large-scale mobile health data, RADAR-Pipeline can contribute to research efforts aimed at improving population health outcomes and developing effective interventions.

(2) Personalise healthcare: By facilitating the extraction of individual-level features from mobile health data, RADAR-Pipeline can be integrated with Kafka data streams and machine learning pipelines to process the data in real time, which can then be used to create more effective and targeted real-time interventions.

(3) Promote reproducible research: The framework's emphasis on transparency and reproducibility in research aligns with the conference's focus on the responsible use of digital mobile health data.

Conclusions & Implications

RADAR-Pipeline is a valuable tool for researchers, offering them the means to harness the potential of mobile health data.

By adopting this framework, researchers can achieve efficient and scalable data analysis, thereby streamlining the extraction of insights from digital footprints. This efficiency enables researchers to delve deeper into the data and uncover valuable patterns and trends.

Furthermore, RADAR-Pipeline promotes collaboration and knowledge sharing within the research community. By providing a standardised framework for data analysis, RADAR-Pipeline facilitates collaboration among researchers, leading to the sharing of best practices and the dissemination of knowledge.
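The idea of a reusable feature pipeline described under Objectives & Approach can be illustrated with a minimal, plain-Python sketch. This is not the RADAR-Pipeline API: the names `Sample`, `FeaturePipeline`, `mean_heart_rate` and `max_heart_rate`, as well as the record layout, are hypothetical and chosen only for illustration. The framework itself is described as building on Apache Spark, which this sketch does not use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical record layout: one row per wearable heart-rate sample.
@dataclass(frozen=True)
class Sample:
    participant_id: str
    day: str            # ISO date string, e.g. "2024-01-01"
    heart_rate: float   # beats per minute


# A feature maps all samples of one participant-day to a single number.
Feature = Callable[[List[Sample]], float]


def mean_heart_rate(samples: List[Sample]) -> float:
    """Average heart rate over the given samples."""
    return mean(s.heart_rate for s in samples)


def max_heart_rate(samples: List[Sample]) -> float:
    """Highest heart rate over the given samples."""
    return max(s.heart_rate for s in samples)


class FeaturePipeline:
    """Illustrative pipeline: a named, shareable set of feature functions."""

    def __init__(self, features: Dict[str, Feature]) -> None:
        self.features = dict(features)

    def run(
        self, samples: Iterable[Sample]
    ) -> Dict[Tuple[str, str], Dict[str, float]]:
        # Group samples by (participant, day), then apply every feature
        # to each group.
        groups: Dict[Tuple[str, str], List[Sample]] = {}
        for s in samples:
            groups.setdefault((s.participant_id, s.day), []).append(s)
        return {
            key: {name: fn(rows) for name, fn in self.features.items()}
            for key, rows in groups.items()
        }


# Usage: the same pipeline object can be reused on any dataset
# that follows the assumed record layout.
data = [
    Sample("p1", "2024-01-01", 60.0),
    Sample("p1", "2024-01-01", 80.0),
    Sample("p2", "2024-01-01", 70.0),
]
pipeline = FeaturePipeline({"hr_mean": mean_heart_rate, "hr_max": max_heart_rate})
result = pipeline.run(data)
```

Because the feature definitions are kept separate from the data they run on, a second researcher could reuse or extend the same set of features on a different study, which is the kind of reuse the abstract argues for.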


