Building Secure Platforms for Research on Human Subjects: The Importance of Computer Scientists

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. Pub Date: 2017-06-26. DOI: 10.1145/3078597.3078618

J. Lane


Citations: 0

Abstract



Building Secure Platforms for Research on Human Subjects: The Importance of Computer Scientists

Businesses and government are using new approaches to decision-making. They are exploiting new streams of (mostly) digital personal data, such as daily transaction records, web-browsing data, cell phone location data, and social media activity, and they are applying new analytical models and tools. Social science researchers, who are not trained in the stewardship of these new kinds of data, must now collect, manage, and use them appropriately. There are many technical challenges: disparate datasets must be ingested, their provenance determined, and their metadata documented. Researchers must be able to query datasets to know what data are available and how they can be used. Datasets must be joined in a scientific manner, which means that workflows need to be traced and managed in such a way that the research can be replicated (Lane, 2017). Computer scientists' expertise is of critical value in many of these areas, but of greatest interest to this group are the facilities in which data on human subjects are stored. The data must be securely housed, and privacy and confidentiality must be protected using the best approaches available. Access and use must be documented to meet the needs of data providers. Yet the technology currently used to provide access to sensitive data is largely artisanal and manual. The stewardship restrictions placed on the use of confidential administrative data prevent the use of best practices for research data management. As a result, links between data sources are rarely validated, results often are not replicated, and connected datasets, results, and methods are not accessible to subsequent researchers in the same field. This is where computer scientists' expertise can come into play: in building approaches that enable sensitive data from different sources to be discovered, integrated, and analyzed in a carefully controlled manner, and that allow researchers to share analysis methods, results, and expertise in ways not easily possible today.

