IEEE data descriptions: latest articles

Descriptor: Synthetic Genomic Dataset With Diverse Ancestry (SynGen6).
IEEE data descriptions Pub Date : 2024-01-01 Epub Date: 2024-11-26 DOI: 10.1109/ieeedata.2024.3505852
Xinyue Wang, Sitao Min, Jaideep Vaidya
Abstract: Advancements in genomic analysis techniques and data-driven research are driving precision medicine. However, in many cases, these advances are not equitable and do not help all subpopulations, since many existing genomic datasets lack diversity, limiting their applicability for studying populations beyond those of European ancestry. Thus, to advance genomic analysis and to allow for a fair benchmarking of novel proposed approaches, there is a significant demand for balanced and representative datasets. To address this issue, we developed, SynGen6, a synthetic dataset that encompasses six distinct populations, providing balanced representation across various ancestry groups. Using the All of Us dataset as a foundation, we utilized principal component analysis (PCA) and ϵ-local differential privacy (LDP) to generate synthetic samples while preserving genetic diversity and the privacy of individuals. To further enhance the dataset, we simulated phenotype vectors associated with significant single nucleotide polymorphisms (SNPs), mirroring real-world gene-disease associations. We also generated synthetic SNPs to watermark the dataset, enabling verification of cloud-based genomic computations for accuracy. Last, synthetic relatives were created to support research on kinship inference and family-based genomic analyses, resulting in a comprehensive dataset of 34 200 samples and 7120 SNPs across six populations. In this article, we describe the dataset and provide the Python scripts used to generate the dataset, which can be extended to create additional synthetic datasets, aiming to fuel advancements in genomic data analysis.
IEEE data descriptions, vol. 2, pp. 1-7. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007885/pdf/
Citations: 0
Descriptor: Benchmarking Secure Neural Network Evaluation Methods for Protein Sequence Classification (iDASH24).
IEEE data descriptions Pub Date : 2024-01-01 Epub Date: 2024-10-17 DOI: 10.1109/ieeedata.2024.3482283
Arif Harmanci, Luyao Chen, Miran Kim, Xiaoqian Jiang
Abstract: To uniformly test and benchmark the secure evaluation of transformer-based models, we designed the iDASH24 homomorphic encryption track dataset. The dataset comprises a protein family classification model with a transformer architecture and an example dataset that is used to build and test the secure evaluation strategies. This dataset was used in the challenge period of iDASH24 Genomic Privacy Competition, where the teams designed secure evaluation of the classification model using a homomorphic encryption scheme. Combined with the benchmarking results and companion methods, iDASH24 dataset is a unique resource that can be used to benchmark secure evaluation of neural network models.
IEEE data descriptions, vol. 1, pp. 109-112. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660429/pdf/
Citations: 0