Aitchison’s Compositional Data Analysis 40 Years on: A Reappraisal

IF 3.9 | CAS Zone 1 (Mathematics) | JCR Q1, Statistics & Probability
M. Greenacre, E. Grunsky, J. Bacon-Shone, Ionas Erb, T. Quinn
Statistical Science | Journal Article | DOI: 10.1214/22-sts880 | Published 2022-01-13
Citations: 11

Abstract

The development of John Aitchison's approach to compositional data analysis is traced from his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problems of working with data subject to a fixed-sum constraint, is summarized and reappraised. It is argued that the properties on which this approach was originally built, chiefly subcompositional coherence, need not be satisfied exactly: quasi-coherence, that is, being near enough to coherent for all practical purposes, is sufficient. This opens the field to simpler data transformations, such as power transformations, that permit zero values in the data. The additional property of exact isometry, which was introduced later and was not part of Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated and problematic to interpret because they involve ratios of geometric means. Where this property is regarded as important in certain analytical contexts, for example unsupervised learning, it can be relaxed: regular pairwise logratios, as well as the alternative quasi-coherent transformations, can be shown to be quasi-isometric, meaning close enough to exact isometry for all practical purposes. It is concluded that isometric and related logratio transformations, such as pivot logratios, are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and genomics, which demonstrate the good performance of pairwise logratios, as originally proposed by Aitchison, or of Box-Cox power transforms of the original compositions, for which no zero replacements are necessary.
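The contrast drawn above between Aitchison's pairwise logratios and isometric (pivot) logratios can be illustrated numerically. The sketch below, in Python with NumPy, assumes a made-up four-part composition; the helper names `close`, `pairwise_logratio` and `pivot_logratio` are hypothetical, and the pivot coordinate uses the standard form sqrt((D-1)/D) * log(x_1 / g(x_2, ..., x_D)), where g is the geometric mean of the remaining parts. It is an illustration under these assumptions, not code from the paper.

```python
import numpy as np


def close(x):
    """Rescale a vector of positive parts so that it sums to 1 (closure)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def pairwise_logratio(x, j, k):
    """Pairwise logratio log(x_j / x_k) between parts j and k."""
    return float(np.log(x[j] / x[k]))


def pivot_logratio(x):
    """First pivot (ilr-type) coordinate: a scaled log of part 0 over the
    geometric mean of the remaining D-1 parts."""
    x = np.asarray(x, dtype=float)
    d = x.size
    gm_rest = np.exp(np.mean(np.log(x[1:])))
    return float(np.sqrt((d - 1) / d) * np.log(x[0] / gm_rest))


# Hypothetical four-part composition, closed to sum to 1.
full = close([40.0, 30.0, 20.0, 10.0])
# Subcomposition formed by the first three parts, re-closed to sum to 1.
sub = close(full[:3])

# The pairwise logratio of parts 0 and 1 is the same in both (log(4/3)).
print(pairwise_logratio(full, 0, 1), pairwise_logratio(sub, 0, 1))
# The pivot coordinate for part 0 differs, because the geometric mean
# in its denominator now runs over a different set of parts.
print(pivot_logratio(full), pivot_logratio(sub))
```

Because logratios are scale invariant, re-closing the subcomposition leaves log(x_0/x_1) unchanged, which is the subcompositional coherence property. The pivot coordinate is also scale invariant, but its value and meaning are tied to whichever parts enter its geometric mean.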
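The abstract's point that power transformations, unlike logratios, accept zero values can be sketched in the same way. The following Python/NumPy example assumes a hypothetical closed composition containing one zero part and a hypothetical helper `box_cox` implementing the standard Box-Cox form (x^λ - 1)/λ, with log x as its λ → 0 limit. It is an illustration under these assumptions, not the transformation settings used in the paper's case studies.

```python
import numpy as np


def box_cox(x, lam):
    """Box-Cox power transform (x**lam - 1) / lam; returns log(x) when lam == 0."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (np.power(x, lam) - 1.0) / lam


# Hypothetical closed composition (sums to 1) with one zero part.
comp = np.array([0.5, 0.3, 0.2, 0.0])

with np.errstate(divide="ignore"):
    print(np.log(comp))       # last entry is -inf, so logratios need a zero replacement
print(box_cox(comp, 0.25))    # all entries finite; the zero part maps to -1/lam = -4

# For two strictly positive parts, the difference of their Box-Cox values
# approaches the pairwise logratio log(x_0 / x_1) as lam shrinks.
target = np.log(comp[0] / comp[1])
for lam in (0.5, 0.25, 0.1, 0.01):
    t = box_cox(comp, lam)
    print(lam, t[0] - t[1], target)
```

Because the power transform is not scale invariant, the sketch applies it to parts already closed to sum to 1. A smaller λ brings the transformed differences closer to logratios, at the cost of pushing a zero part further out, to -1/λ.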
Source journal: Statistical Science (Mathematics: Statistics & Probability)
CiteScore: 6.50
Self-citation rate: 1.80%
Articles published: 40
Review time: >12 weeks
Journal description: The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.