The effects of applying cell-suppression and perturbation to aggregated genetic data

2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE). Pub Date: 2012-11-11. DOI: 10.1109/BIBE.2012.6399777

A. Antoniades, J. Keane, Aristos Aristodimou, Christa Philipou, A. Constantinou, Christos Georgousopoulos, F. Tozzi, K. Kyriacou, A. Hadjisavvas, M. Loizidou, C. Demetriou, C. Pattichis

Citations: 11

Abstract

The key test for confidence in any association discovered within the medical domain is replication testing, that is, the ability of the association to be detected in independent populations. At the same time, increasing the likelihood of discovering statistically significant associations requires increasing the statistical power of any given study. A key way to increase statistical power is to include as many subjects as possible who match a study's inclusion criteria. Thus many have attempted to merge data from multiple independent sources, sites, or studies that share the same inclusion criteria, creating a much larger study with significantly more statistical power. For these approaches to work, though, data from multiple sites need to be made available to a single analysis. This practice is significantly limited by the need to respect legal and ethical requirements that are often complicated, ambiguous, and inconsistent across countries. The common approach to merging data is to share aggregated data rather than subjects' personal data. Aggregated data, however, may still in some cases be reverse engineered. Traditionally, therefore, cells with small values within the aggregated data were suppressed, and some or all of the aggregated data were perturbed, adding noise to inhibit any attempt to identify personal information about a specific person or sub-group in the original data. In this paper we study the effects of cell-suppression and perturbation on the results of the data analysis. Each approach is examined by itself as well as in combination, using the typical settings documented in the literature. The tests are based on a real dataset used to look for associations between phenotypes and genetic markers. This work is part of the Linked2Safety project, which aims to dynamically interconnect distributed patients' data to better enable medical research efforts while respecting patients' anonymity as well as European and national legislation.
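
To make the two techniques named in the abstract concrete, the following is a minimal sketch in Python of how cell-suppression and perturbation might be applied to an aggregated phenotype-by-genotype table, and how their effect on an association statistic could be inspected. Everything in it is an assumption for illustration: the counts are invented, the suppression threshold of 5 and the uniform noise of up to ±2 are placeholder settings rather than the settings used in the paper, the function names are hypothetical, and treating a suppressed cell as zero is only one of several possible conventions.

```python
import random


def suppress_small_cells(table, threshold=5):
    """Cell-suppression: hide counts below `threshold` by replacing them with None.

    The threshold is a hypothetical setting, not one taken from the paper.
    """
    return [[c if c >= threshold else None for c in row] for row in table]


def perturb(table, max_noise=2, rng=None):
    """Perturbation: add uniform integer noise in [-max_noise, max_noise] to each
    visible cell, clamping at zero. Suppressed (None) cells stay suppressed.

    The noise model is an assumption chosen for simplicity.
    """
    rng = rng or random.Random()
    return [
        [None if c is None else max(0, c + rng.randint(-max_noise, max_noise))
         for c in row]
        for row in table
    ]


def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    Suppressed cells are treated as 0 here, which is an assumption.
    """
    filled = [[0 if c is None else c for c in row] for row in table]
    row_totals = [sum(row) for row in filled]
    col_totals = [sum(col) for col in zip(*filled)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(filled):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Invented counts: rows are cases/controls, columns are three genotypes.
    original = [[40, 45, 15], [60, 37, 3]]
    suppressed = suppress_small_cells(original, threshold=5)
    noisy = perturb(original, max_noise=2, rng=random.Random(0))
    both = perturb(suppressed, max_noise=2, rng=random.Random(0))

    for name, tbl in [("original", original), ("suppressed", suppressed),
                      ("perturbed", noisy), ("both", both)]:
        print(name, tbl, round(chi_square(tbl), 3))
```

Comparing the statistic across the four tables gives a rough sense of how each protection step, alone or combined, can shift an analysis result. The actual study may use different tests, thresholds, and noise distributions.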
