Johannes Hirn, Verónica Sanz, José Enrique García, Marta Goberna, Alicia Montesinos-Navarro, José Antonio Navarro-Cano, Ricardo Sánchez-Martín, Alfonso Valiente-Banuet, Miguel Verdú
Transfer learning of species co-occurrence patterns between plant communities
Ecological Informatics, vol. 83, Article 102826. Published 2024-09-19. DOI: 10.1016/j.ecoinf.2024.102826
Abstract
Aim
The use of neural networks (NNs) is spreading to all areas of life, and ecology is no exception. However, because NNs are data-hungry, many small but valuable datasets can be left out. Here we show how transfer learning can rescue small datasets that may be invaluable for understanding patterns of species co-occurrence.
Location
Semi-arid plant communities in Spain and México.
Time period
2016–2022.
Major taxa studied
Angiosperms.
Methods
Using a large sample of plant species co-occurrences in vegetation patches from a semi-arid area of eastern Spain, we fit a generative artificial intelligence (AI) model that accurately reproduces which species co-occur with which in these patches. We then train the same type of model on two communities for which only smaller datasets are available: another semi-arid community in eastern Spain and a tropical community in Mexico.
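The abstract does not specify the architecture of the generative model. As a deliberately simple, hypothetical stand-in (not the authors' network), the Python/NumPy sketch below fits a dependency network: each species' presence in a patch is predicted by a logistic function of which other species are present, trained by gradient descent on the pseudo-likelihood of a patch-by-species presence/absence matrix. The function names `fit_cooccurrence_model` and `pseudo_log_likelihood`, the hyperparameters and the toy data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_cooccurrence_model(X, lr=0.5, epochs=500, l2=1e-3):
    """Fit p(x_j = 1 | other species) = sigmoid(b_j + sum_{k != j} W[k, j] * x_k).

    X: (n_patches, n_species) binary presence/absence matrix (hypothetical input format).
    Returns the interaction matrix W (zero diagonal) and per-species biases b.
    """
    X = np.asarray(X, dtype=float)
    n, s = X.shape
    mask = 1.0 - np.eye(s)                       # a species may not predict its own presence
    W, b = np.zeros((s, s)), np.zeros(s)
    for _ in range(epochs):
        G = sigmoid(X @ (W * mask) + b) - X      # d(loss)/d(logits), per patch and species
        W -= lr * ((X.T @ G) / n * mask + l2 * W)  # L2-regularised gradient step
        b -= lr * G.mean(axis=0)
    return W * mask, b

def pseudo_log_likelihood(X, W, b):
    """Mean log p(x_ij | rest of patch i) over all cells; higher is better."""
    X = np.asarray(X, dtype=float)
    z = X @ (W * (1.0 - np.eye(X.shape[1]))) + b
    # log sigmoid(z) = -logaddexp(0, -z); log(1 - sigmoid(z)) = -logaddexp(0, z)
    return float(np.mean(-X * np.logaddexp(0, -z) - (1 - X) * np.logaddexp(0, z)))

# Toy data (hypothetical): species 1 mostly co-occurs with species 0; species 2 is independent.
rng = np.random.default_rng(0)
a = rng.random(400) < 0.5
X = np.column_stack([a, a ^ (rng.random(400) < 0.1), rng.random(400) < 0.3]).astype(float)
W, b = fit_cooccurrence_model(X)
print(np.round(W, 2))   # W[0, 1] and W[1, 0] should come out strongly positive
```

Because every conditional p(x_j | other species) is explicit, new patches could be sampled from such a model by Gibbs sampling, which is what makes it generative in a loose sense.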
Results
When we transfer the knowledge learnt from the large dataset directly to the two smaller ones, predictions improve for the community more similar to our reference community. For the more dissimilar community, improving the accuracy of the transfer requires further tuning of the model on the local data. The transferred knowledge relates primarily to species frequency and, to a lesser extent, to their phylogenetic relationships, which are known determinants of species interaction patterns.
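The contrast between direct transfer and further tuning on local data can be illustrated with a simple toy model. The sketch below is a hypothetical stand-in, not the authors' pipeline: it pretrains a logistic dependency network on a large simulated source dataset, then either reuses the source parameters unchanged (direct transfer) or uses them as the starting point for a few gradient steps on a small target dataset (fine-tuning), with a model trained on the small dataset alone as a baseline. It assumes the species of the two datasets are already aligned by column index; the simulated communities, `train`, `mean_pll`, `simulate` and all hyperparameters are illustrative.

```python
import numpy as np

def _logits(X, W, b):
    return X @ (W * (1.0 - np.eye(X.shape[1]))) + b   # zero diagonal: no self-prediction

def mean_pll(X, W, b):
    """Mean per-cell pseudo-log-likelihood of a binary patch-by-species matrix X."""
    z = _logits(X, W, b)
    return float(np.mean(-X * np.logaddexp(0, -z) - (1 - X) * np.logaddexp(0, z)))

def train(X, W=None, b=None, lr=0.5, epochs=300, l2=1e-3):
    """Gradient descent on the pseudo-likelihood; passing W, b warm-starts (fine-tunes)."""
    n, s = X.shape
    mask = 1.0 - np.eye(s)
    W = np.zeros((s, s)) if W is None else W.copy()
    b = np.zeros(s) if b is None else b.copy()
    for _ in range(epochs):
        G = 1.0 / (1.0 + np.exp(-_logits(X, W, b))) - X
        W -= lr * ((X.T @ G) / n * mask + l2 * W)
        b -= lr * G.mean(axis=0)
    return W * mask, b

def simulate(n, rng, flip):
    """Hypothetical community: species 1 tracks species 0 (with `flip` noise); species 2 is independent."""
    a = rng.random(n) < 0.5
    return np.column_stack([a, a ^ (rng.random(n) < flip), rng.random(n) < 0.3]).astype(float)

rng = np.random.default_rng(1)
source = simulate(2000, rng, flip=0.1)       # large reference dataset
target_train = simulate(30, rng, flip=0.2)   # small local dataset
target_test = simulate(500, rng, flip=0.2)   # held-out local patches

W_src, b_src = train(source)                                     # pretrain on the source
candidates = {
    "direct transfer": (W_src, b_src),                           # no local data used
    "fine-tuned": train(target_train, W_src, b_src, epochs=50),  # warm start from source
    "small data only": train(target_train),                      # baseline without transfer
}
for name, (W, b) in candidates.items():
    print(f"{name}: {mean_pll(target_test, W, b):.3f}")
```

Which variant scores best on the held-out target patches depends on how closely the simulated target resembles the source, so no particular ranking is implied by this toy setup.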
Main conclusions
This AI-based approach can be applied to communities that are either similar or fairly dissimilar to the reference community, opening the door to systematic transfer learning for accurate predictions on small datasets. Interestingly, the transfer operates by matching unrelated species between the origin and target datasets. This implies that knowledge can be transferred to arbitrary datasets, which can even be combined to augment one another irrespective of the species involved, potentially allowing such models to be applied to a wide range of plant communities in different climates.
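The abstract reports that transfer works by matching unrelated species between the origin and target datasets, and that the transferred knowledge is mostly about species frequency. One simple way to build such a matching, assumed here purely for illustration and not necessarily the authors' procedure, is to pair species by frequency rank and copy the source parameters onto their matched target species before any fine-tuning. The helpers `match_by_frequency` and `transfer_params`, and the parameter layout (an interaction matrix `W` plus per-species biases `b`), are hypothetical.

```python
import numpy as np

def match_by_frequency(source_X, target_X):
    """Map each target species to the source species with the same frequency rank.

    Both inputs are binary patch-by-species matrices. Returns an int array with one
    entry per target species; -1 marks target species left unmatched.
    """
    src_rank = np.argsort(-source_X.mean(axis=0), kind="stable")  # most frequent first
    tgt_rank = np.argsort(-target_X.mean(axis=0), kind="stable")  # ties keep column order
    k = min(len(src_rank), len(tgt_rank))
    mapping = np.full(target_X.shape[1], -1)
    mapping[tgt_rank[:k]] = src_rank[:k]
    return mapping

def transfer_params(W_src, b_src, mapping):
    """Initialise target parameters by copying those of the matched source species."""
    s = len(mapping)
    W, b = np.zeros((s, s)), np.zeros(s)
    idx = np.flatnonzero(mapping >= 0)       # matched target species
    src = mapping[idx]                       # their source counterparts
    W[np.ix_(idx, idx)] = W_src[np.ix_(src, src)]
    b[idx] = b_src[src]
    return W, b                              # unmatched species start from zero

source_X = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0]])
target_X = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0], [0, 1, 0, 1]])
mapping = match_by_frequency(source_X, target_X)   # → [1, 0, 2, -1]
W_src = np.arange(9.0).reshape(3, 3) * (1 - np.eye(3))
W_tgt, b_tgt = transfer_params(W_src, np.array([10.0, 20.0, 30.0]), mapping)
```

In this toy case the fourth target species has no source counterpart and starts from zero, so its parameters would have to be learnt entirely during fine-tuning on the local data.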
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. Its scope reflects the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.