[Statistical methods for extremely unbalanced data in genome-wide association study (2)].

Q1 Medicine
N Xie, W J Bi, Z W Zhang, F Shao, Y Y Wei, Y Zhao, R Y Zhang, F Chen
中华流行病学杂志 (Chinese Journal of Epidemiology), 2025, 46(1): 147-153. Journal Article. Published 2025-01-10. DOI: 10.3760/cma.j.cn112338-20240712-00422 (https://doi.org/10.3760/cma.j.cn112338-20240712-00422)
Citations: 0

Abstract

Extremely unbalanced data are datasets in which the independent or dependent variables show severely imbalanced proportions, which can cause classical test statistics to deviate from their theoretical distributions and make type Ⅰ error difficult to control. The increasing availability of genome-wide resources from large population cohorts has highlighted the growing demand for efficient and accurate statistical methods for handling extremely unbalanced data, to advance the development of genetic statistical methods. This paper introduces two correction methods widely used for extremely unbalanced data in current genome-wide association studies, namely Firth correction and saddlepoint approximation, describes their effectiveness in controlling type Ⅰ error as confirmed by simulation experiments, and summarizes the commonly used software for extremely unbalanced genomic data, to provide a theoretical reference and suggestions for the future statistical analysis of such data.
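As an illustration of the first of the two corrections named in the abstract, the following is a minimal sketch of Firth-penalized logistic regression for a single variant, assuming a model with only an intercept and one genotype covariate. The function name `firth_logistic`, its parameters, and the toy data are hypothetical and are not taken from the paper or from any of the software it reviews. The sketch solves the modified score equation, in which each residual y - p is shifted by h(0.5 - p) with h the leverage, using plain Fisher scoring.

```python
import math


def firth_logistic(g, y, max_iter=200, tol=1e-10):
    """Firth-penalized logistic regression of y on an intercept and one covariate g.

    Illustrative sketch only: two parameters, no step-halving, no covariates.
    Returns the penalized estimates (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * gi))) for gi in g]
        w = [pi * (1.0 - pi) for pi in p]

        # 2x2 Fisher information [[a, b], [b, d]] and its determinant
        a = sum(w)
        b = sum(wi * gi for wi, gi in zip(w, g))
        d = sum(wi * gi * gi for wi, gi in zip(w, g))
        det = a * d - b * b

        # Leverages h_i = w_i * x_i' I^{-1} x_i with x_i = (1, g_i)
        h = [wi * (d - 2.0 * b * gi + a * gi * gi) / det for wi, gi in zip(w, g)]

        # Firth-modified score: residuals shifted by h_i * (0.5 - p_i)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * gi for ri, gi in zip(r, g))

        # Fisher scoring step: I^{-1} times the modified score
        s0 = (d * u0 - b * u1) / det
        s1 = (-b * u0 + a * u1) / det
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Toy data (made up): 10 cases and 1000 controls; all 3 carriers are cases,
# so the ordinary maximum likelihood estimate of the genotype effect is infinite.
genotype = [1] * 3 + [0] * 1007
status = [1] * 10 + [0] * 1000
intercept, log_odds_ratio = firth_logistic(genotype, status)
```

In this toy example the unpenalized estimate diverges because no control carries the variant, while the penalized estimate stays finite. For a binary covariate the result coincides with the log odds ratio obtained after adding 0.5 to each cell of the 2×2 table, which gives a convenient check on the sketch.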

Source journal
中华流行病学杂志 (Chinese Journal of Epidemiology), Medicine-Medicine (all)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 8981
Journal introduction: Chinese Journal of Epidemiology, established in 1981, is an advanced academic periodical in epidemiology and related disciplines in China. Following the principle of integrating theory with practice, it mainly reports major progress in epidemiological research. Its columns include commentary, expert forum, original article, field investigation, disease surveillance, laboratory research, clinical epidemiology, basic theory or method, and review. The journal is indexed in more than ten major biomedical databases and index systems worldwide, including Scopus, PubMed/MEDLINE, PubMed Central (PMC), Europe PubMed Central, Embase, Chemical Abstract, Chinese Science and Technology Paper and Citation Database (CSTPCD), Chinese core journal essentials overview, Chinese Science Citation Database (CSCD) core database, Chinese Biological Medical Disc (CBMdisc), and Chinese Medical Citation Index (CMCI). It is one of the core academic journals and carefully selected core journals in preventive and basic medicine in China.