Cillian Hourican, Jie Li, Pashupati P Mishra, Terho Lehtimäki, Binisha H Mishra, Mika Kähönen, Olli T Raitakari, Reijo Laaksonen, Liisa Keltikangas-Järvinen, Markus Juonala, Rick Quax
Entropy, volume 26, issue 11. Published 2024-11-11. DOI: 10.3390/e26110968 (https://doi.org/10.3390/e26110968). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11592859/pdf/
Efficient Search Algorithms for Identifying Synergistic Associations in High-Dimensional Datasets.
Interest in multivariate interactions and emergent higher-order dependencies has grown markedly in recent years. This is particularly evident in the identification of synergistic sets: combinations of elements whose joint interactions give rise to information that is not present in any individual subset of those elements. The scalability of frameworks such as partial information decomposition (PID), and of those based on multivariate extensions of mutual information such as O-information, is limited by the combinatorial explosion in the number of sets that must be assessed. To address this challenge, we propose a novel approach that uses stochastic search strategies to identify synergistic triplets within datasets. The methodology extends to larger sets and to various synergy measures. By employing stochastic search, our approach avoids exhaustive enumeration, offering a scalable and efficient means of uncovering intricate dependencies. We illustrate the flexibility of the method by applying it to two epidemiological datasets: the Young Finns Study and the UK Biobank Nuclear Magnetic Resonance (NMR) data. We also present a heuristic that reduces the number of synergistic sets to analyse in large datasets by excluding sets with overlapping information, and we illustrate the risks of performing feature selection before assessing synergistic information in the system.
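The idea of replacing exhaustive enumeration with stochastic search can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes jointly Gaussian variables (so that O-information can be computed from a covariance matrix), uses simulated annealing as one possible stochastic search strategy, and runs on synthetic data. All function names and parameters (`gaussian_o_information`, `anneal_triplets`, `make_demo_data`, `t_start`, `cooling`) are hypothetical. Negative O-information is taken to indicate a synergy-dominated set.

```python
import math
import random

import numpy as np


def gaussian_o_information(cov, idx):
    """O-information (in nats) of the variables in idx, assuming Gaussianity.

    Omega = (n - 2) * H(X) + sum_i [H(X_i) - H(X_without_i)].
    For Gaussians H = 0.5 * logdet(cov) + const, and the constants cancel.
    """
    idx = list(idx)
    n = len(idx)
    sub = cov[np.ix_(idx, idx)]

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    omega = (n - 2) * logdet(sub)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        omega += math.log(sub[i, i]) - logdet(sub[np.ix_(rest, rest)])
    return 0.5 * omega


def anneal_triplets(cov, n_steps=2000, t_start=0.1, cooling=0.999, seed=0):
    """Simulated annealing over triplets, minimising O-information."""
    rng = random.Random(seed)
    n_vars = cov.shape[0]
    cache = {}  # each distinct triplet is scored at most once

    def score(triplet):
        key = tuple(sorted(triplet))
        if key not in cache:
            cache[key] = gaussian_o_information(cov, key)
        return cache[key]

    current = rng.sample(range(n_vars), 3)
    current_score = score(current)
    best, best_score = tuple(sorted(current)), current_score
    temperature = t_start
    for _ in range(n_steps):
        # Propose a neighbour: swap one member for a variable outside the set.
        candidate = list(current)
        outside = [v for v in range(n_vars) if v not in current]
        candidate[rng.randrange(3)] = rng.choice(outside)
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = tuple(sorted(current)), current_score
        temperature *= cooling
    return best, best_score, len(cache)


def make_demo_data(n_samples=2000, n_vars=8, seed=0):
    """Synthetic data: variable 2 is a noisy sum of independent variables 0 and 1."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_samples, n_vars))
    data[:, 2] = data[:, 0] + data[:, 1] + 0.5 * rng.standard_normal(n_samples)
    return data


data = make_demo_data()
cov = np.cov(data, rowvar=False)
best, best_score, n_evaluated = anneal_triplets(cov)
print(best, round(best_score, 3), n_evaluated)
```

In this toy setting there are only 56 triplets, so exhaustive enumeration would be trivial; the sketch is meant to show the shape of the search loop. For non-Gaussian data, or for a different synergy measure, only the scoring function would need to change.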
About the journal:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Its aim is to encourage scientists to publish their theoretical and experimental work in as much detail as possible. There is no restriction on the length of papers. Where computations or experiments are involved, full details must be provided so that the results can be reproduced.