A novel distance that reduces information loss in continuous characters with few observations

IF 2 | Zone 4 (Earth Sciences) | JCR Q1 (Earth and Planetary Sciences)
Gerardo A. Lo Valvo, Oscar E. R. Lehmann, Diego Balseiro
DOI: https://doi.org/10.26879/1250
Journal: Palaeontologia Electronica
Publication date: 2023-01-01
Publication type: Journal Article
Citations: 0

Abstract

A novel distance that reduces information loss in continuous characters with few observations
The calculation of pairwise distances is a fundamental step in many statistical analyses in biology and paleontology. The most commonly used distances work with a single observation per object and character, but there are scenarios where multiple observations are available per object. In these situations, the information for the character spans an interval, and pairs of objects can have overlapping intervals, which further complicates the distance calculation. Some coefficients can deal with this wealth of information but are either too coarse to provide detailed results or too computationally demanding for even moderately large data sets. Here, we present the Distance Between Intervals (DBI) as a novel semi-metric distance that can accommodate both singular and multiple observations per object by analyzing them as intervals. The DBI ranges from 0 to 1 when there is an overlap between the objects and from 1 to infinity when there is no overlap between them. It is easy to calculate and can be applied to a wide variety of data types. Both simulated and empirical test cases show that the DBI correctly ranks pairs of objects by their level of overlap and non-overlap, while other distances struggle to do it. Therefore, the DBI can provide a finer level of definition than other available distances for empirical data sets, while generally agreeing with the broad results they provide. An implementation of DBI is provided for the R programming language.
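The interval view described in the abstract can be illustrated with a minimal Python sketch. It only shows how several observations per object might be collapsed into an interval and how overlap or separation between two intervals could be measured. It does not implement the DBI itself, whose formula is not given on this page, and it is not the authors' R implementation. The function names (`to_interval`, `overlap_length`, `gap_length`) and the sample measurements are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def to_interval(observations: List[float]) -> Interval:
    """Collapse one object's observations of a character into (min, max)."""
    if not observations:
        raise ValueError("at least one observation is required")
    return (min(observations), max(observations))


def overlap_length(a: Interval, b: Interval) -> float:
    """Length of the range shared by two closed intervals; 0.0 if none."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def gap_length(a: Interval, b: Interval) -> float:
    """Separation between two intervals; 0.0 if they touch or overlap."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


# Hypothetical measurements of one continuous character for three objects.
measurements: Dict[str, List[float]] = {
    "A": [10.0, 12.0, 15.0],  # several observations
    "B": [14.0, 18.0],        # two observations
    "C": [20.0],              # a single observation gives a zero-width interval
}

intervals = {name: to_interval(obs) for name, obs in measurements.items()}

for first, second in [("A", "B"), ("B", "C"), ("A", "C")]:
    shared = overlap_length(intervals[first], intervals[second])
    gap = gap_length(intervals[first], intervals[second])
    print(f"{first}-{second}: overlap={shared}, gap={gap}")
```

A single observation is treated here as an interval whose two endpoints coincide, which is one way to handle singular and multiple observations in the same representation. How the DBI maps overlap and separation onto its 0 to 1 and 1 to infinity ranges is defined in the paper, not in this sketch.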
Source journal
Palaeontologia Electronica (Earth Sciences: Paleontology)
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 20
Review time: >12 weeks
Journal description: Founded in 1997, Palaeontologia Electronica (PE) is the longest running open-access, peer-reviewed electronic journal and covers all aspects of palaeontology. PE uses an external double-blind peer review system for all manuscripts. Copyright of scientific papers is held by one of the three sponsoring professional societies at the author's choice. Reviews, commentaries, and other material are placed in the public domain. PE papers comply with regulations for taxonomic nomenclature established in the International Code of Zoological Nomenclature and the International Code of Nomenclature for Algae, Fungi, and Plants.