{"title":"Probabilistic crop type mapping for ex-ante modelling and spatial disaggregation","authors":"Josef Baumert, Thomas Heckelei, Hugo Storm","doi":"10.1016/j.ecoinf.2024.102836","DOIUrl":"10.1016/j.ecoinf.2024.102836","url":null,"abstract":"<div><div>Agricultural land use and management fundamentally impact the condition of natural resources like waterbodies, soils, and biodiversity. Modelling the anthropogenic effects on those resources over time requires detailed knowledge of the temporal and spatial distribution of crops. However, currently available crop type maps for Europe lack either the required spatial resolution or the temporal and spatial coverage. We develop and apply a probabilistic, spatially explicit crop type mapping approach that is suitable for ex-post and ex-ante modelling. The approach makes it possible to quantify epistemic and aleatoric uncertainty related to estimated crop shares by providing an ensemble of maps. We implement the method for the EU-28 for the years 2010–2020, distinguishing between 28 different crop types at 1 km resolution. Based on a model of the data generating process that conceptually links field-, grid cell- and region-level crop acreages, our approach considers soil, climate, and topography information, as well as administrative data. The validation with ground-truthing data for France indicates that the generated crop type maps are plausible. The provided uncertainty intervals capture differences in uncertainty across space and time and correctly identify grid cells and crops where estimations are less precise. The generated maps constitute a unique data product of high practical value, e.g., for agri-environmental modelling applications. 
We see additional potential in using the approach to disaggregate the regional or national predictions of socio-economic ex-ante prediction models.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102836"},"PeriodicalIF":5.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142419593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of nomadic pastoralists' livelihood vulnerability to the changing climate in the Third Pole region: Case study in the Altai Mountains of western Mongolia","authors":"Altansukh Ochir, Woo-Kyun Lee, Sonam Wangyel Wang, Otgonbayar Demberel, Undarmaa Enkhsaikhan, Byambadash Turbat, Munkhnasan Lamchin, Bayarmaa Munkhbat, Oyunchimeg Namsrai","doi":"10.1016/j.ecoinf.2024.102835","DOIUrl":"10.1016/j.ecoinf.2024.102835","url":null,"abstract":"<div><div>The High Mountains of Asia, often called the “Third Pole” because they constitute the third largest reserve of water after the North and South Poles, are an important landscape worldwide. Western Mongolia forms part of the northeastern extent of the Third Pole, characterized by high mountain ranges and river catchment areas. The ecosystems in these high mountains, including the nomads that inhabit them, are fragile and vulnerable to environmental changes. In this study, we conducted household interviews with nomads in the Tsambagarav (TsGM) and the Munkhkhairkhan (MKhM) Mountains and used a sustainable livelihood approach to assess the livelihood vulnerability index (LVI) of the nomads. The results showed that the overall LVI was 0.41 for TsGM and 0.44 for MKhM, with corresponding Intergovernmental Panel on Climate Change-LVI values of 0.01 for TsGM and −0.02 for MKhM. Based on the findings, we recommend that decision-makers focus on several key areas: effectively managing pasture land; implementing policies for sustainable yields; establishing an insurance-based compensation system, a post-disaster communication system, and a mobile-economy informative early warning system; and lowering the loan interest rate. Among these recommendations, developing a mobile-economy informative early warning system is an innovative idea to mitigate climate change disasters. 
These actions can contribute to a long-term sustainable livelihood in the fast-changing climate.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102835"},"PeriodicalIF":5.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142419594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating multi-scale optimization and variable selection in species distribution modeling","authors":"Samuel A. Cushman , Zaneta M. Kaszta , Patrick Burns , Christopher R. Hakkenberg , Patrick Jantz , David W. Macdonald , Jedediah F. Brodie , Mairin C.M. Deith , Scott Goetz","doi":"10.1016/j.ecoinf.2024.102832","DOIUrl":"10.1016/j.ecoinf.2024.102832","url":null,"abstract":"<div><div>Species distribution modeling (SDM) is a fundamental tool in theoretical and applied ecology. However, relatively little is known about the performance of different approaches for scale optimization, model selection, and algorithmic prediction in the context of nonlinear, multiscale and interactive relationships between environmental variables and species occurrence. Modelers often struggle to optimize a tradeoff between ecological relevance, model robustness, complexity, and overfitting. In this paper, we investigated several methods designed to optimize spatial scale and variable selection in SDMs, in each case evaluating model fitness, parsimony and predictive performance. We used a simulation approach to produce a large pool of alternative underlying habitat relationships that reflect a broad range of realistic habitat associations. We also compared several different modeling algorithms, including logistic regression with a generalized linear model (GLM), Lasso and Elastic-Net Regularized GLMs (GLMNet), and random forest (RF), as well as alternative variable and scale selection methods. We found that GLM methods employing all-subsets dredge routines for variable selection were consistently the best predictors based on all criteria of our model performance assessment and across all attributes of the simulated underlying relationship, including nonlinearity and interaction. We had expected machine learning approaches, such as random forest, to perform better in these more complex forms of species-environment relationships. 
GLM using dredge variable selection was also the method that included the fewest spurious covariates and included the most correct predictors as a proportion of all predictors. We found that univariate scaling was the most robust method of variable and scale selection, along with Minimal Redundancy Maximal Relevancy (MRMR) which performed equivalently. The simulation experiment presented here provides a robust assessment of simulated multi-species distribution model performance, complexity and fidelity. By simulating a large range of potential habitat relationships with varying spatial scale, effect sizes, linearity, and interactions, we comprehensively evaluated model performance across gradients of complexity of the underlying relationships and violations of classical statistical assumptions. This study provides a valuable assessment and a broader example of the power and utility of controlled simulation experiments in habitat relationships and other ecological spatial predictive modeling.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102832"},"PeriodicalIF":5.8,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing daytime and nighttime MODIS land surface temperature in desert areas using multi-channel singular spectrum analysis","authors":"Fahime Arabi Aliabad, Mohammad Zare, Hamidreza Ghafarian Malamiri, Amanehalsadat Pouriyeh, Himan Shahabi, Ebrahim Ghaderpour, Paolo Mazzanti","doi":"10.1016/j.ecoinf.2024.102830","DOIUrl":"10.1016/j.ecoinf.2024.102830","url":null,"abstract":"<div><div>The availability of continuous spatiotemporal land surface temperature (LST) with high resolution is critical for many disciplines including hydrology, meteorology, ecology, and geology. Like other remote sensing data, satellite-based LST is also affected by cloud cover. In this research, over 5000 daytime and nighttime MODIS–LST images acquired during 2014–2020 over the Yazd–Ardakan plain in Yazd, Iran, are used. The multi-channel singular spectrum analysis (MSSA) model is employed to reconstruct missing values due to dust, clouds, and sensor defects. The selection of eigenvalues is based on the Monte Carlo test and the spectral analysis of eigenvalues. It is found that enlarging the window size has no effect on the number of significant components of the signal, which account for most of the variance of the data. However, the data variance changes for all three components. Employing two images per day, window sizes 60, 180, 360, and 720 are examined for reconstructing one year of LST, where these selections are based on monthly, seasonal, semi-annual, and annual LST cycles, respectively. The results show that window size 60 had the least computational cost and the highest accuracy, with an RMSE (root mean square error) of 2.6 °C for the entire study region and 1.4 °C for a selected pixel. The gap-filling performance of MSSA is also compared with that of the harmonic analysis of time series (HANTS) model, showing the superiority of MSSA with an improved RMSE of about 2.7 °C for the study region. 
In addition, daytime and nighttime LST series for different land covers are compared. Lastly, the maximum, minimum, and average LST for each day and night as well as average and standard deviation of LST images in the seven-year-long time series are also computed.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102830"},"PeriodicalIF":5.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Articulating environmental sustainability dynamics with space-time cube","authors":"Dezhi Wang , Zhenxiu Cao , Minghui Wu , Bo Wan , Sifeng Wu , Quanfa Zhang","doi":"10.1016/j.ecoinf.2024.102833","DOIUrl":"10.1016/j.ecoinf.2024.102833","url":null,"abstract":"<div><div>Conceptually, environmental sustainability involves maintaining crucial environmental functions while considering both present and future development. However, existing methods for expressing environmental sustainability are mainly derived from a steady state with minimal spatial explicitness. Furthermore, the environmental impact of certain events may exhibit a lag, particularly in basins. Here, we propose a framework that employs a space-time cube to articulate environmental sustainability. This cube can visualize the environment's evolution over time, identify hot and cold spots in space, and concurrently determine underlying influencing factors via spatial regression analysis. Unlike traditional methods, the space-time cube incorporates not only spatial dimensions but also temporal dimensions. We applied this framework to China's upper Han River basin, using the Remote Sensing Ecological Index (RSEI) as an indicator of environmental sustainability. It enabled us to chart the basin's ecological trajectory with spatial and temporal explicitness from 1990 to 2020. Our findings reveal that climate change (represented by temperature and precipitation changes) and human activities (represented by nighttime light) were the main factors driving changes in environmental sustainability from 2000 to 2020 in the basin. 
Therefore, our proposed spatial-temporal integration framework proves to be an efficient tool in articulating environmental sustainability.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102833"},"PeriodicalIF":5.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142326679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal analysis of carbon emissions in the Yangtze River Delta Urban Agglomeration: Insights from nighttime light data (1992–2019)","authors":"Jing Gao , Shenglong Zhao , Lucang Wang , Xiaoping Wang","doi":"10.1016/j.ecoinf.2024.102831","DOIUrl":"10.1016/j.ecoinf.2024.102831","url":null,"abstract":"<div><div>Continuous evaluation and monitoring of long-term energy usage and carbon emissions are essential for developing, implementing, and assessing regional carbon reduction efforts. This study presents a spatiotemporal analysis of carbon emission trends in the Yangtze River Delta Urban Agglomeration (YRDUA) from 1992 to 2019. Researchers used nighttime light data from the Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) and the National Polar-orbiting Partnership's Visible Infrared Imaging Radiometer Suite (NPP/VIIRS) to assess the evolution of carbon emission patterns. Advanced spatial analysis methods, including geographic autocorrelation, geographical panel modeling, and spatial Markov chains, were applied to explore the spatial impacts, processes, and regional context of these trends. The results show: (1) Carbon emissions in the YRDUA increased by 262.56 %, with high-emission spheres and axial expansion. High-high emission clusters emerged in metropolitan areas, while low-low clusters formed in peripheral mountain regions. (2) Carbon emission types were stable (66.5 %), but 17.6 % showed higher emissions transitioning to lower, particularly in northeast Jiangsu. (3) The broader regional background had a stronger influence on the spatial impacts of carbon emissions than nearest neighbor effects, enhancing both outlier convergence and “club convergence” among similar regions. (4) Spatiotemporal patterns were shaped by the lock-in effect in low-carbon areas and spillover effects in high-carbon areas, with economic scale and industrial structure as key drivers. 
This study provides crucial insights for regional carbon reduction strategies in the YRDUA.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102831"},"PeriodicalIF":5.8,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142358616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"River quality management: Integrating uncertainty, failure probability, and assimilation capacity","authors":"Mohsen Dehghani Darmian, Britta Schmalz","doi":"10.1016/j.ecoinf.2024.102829","DOIUrl":"10.1016/j.ecoinf.2024.102829","url":null,"abstract":"<div><div>Managing river water quality is challenging due to uncertainties in hydraulic and hydrologic parameters. This study integrates the symmetric exponential function (SEF) approach for solving the advection-dispersion equation with the Monte Carlo method in MATLAB. This combination allows us to explore the river's assimilation capacity and the failure probability (<span><math><msub><mi>P</mi><mi>f</mi></msub></math></span>) of maintaining desired water quality standards. Here, <span><math><msub><mi>P</mi><mi>f</mi></msub></math></span> represents the likelihood of pollutant concentrations exceeding acceptable limits under varying river conditions. A key contribution of this study is the introduction of a novel equation, developed using the Genetic Programming (GP) soft computing tool, to calculate assimilation capacity considering the failure probability of water quality provision. This equation provides a valuable tool for risk assessment in water resource management by quantifying pollutant assimilation dynamics. Its robustness is validated through high Coefficient of Determination (R<sup>2</sup>) and Overall Index (OI) values near 1, along with low Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The study identifies critical river characteristics, such as flow velocity and pollutant load, significantly influencing the reliability index (<span><math><mi>β</mi></math></span>). By outlining how adjustments in these parameters can achieve a target reliability index (<span><math><mi>β</mi><mo>=</mo><mn>4.526</mn></math></span>), our study offers a practical approach to safeguarding river ecosystems. 
For example, increasing flow velocity by 76 % can shift the river from a safe state (<span><math><msub><mi>P</mi><mi>f</mi></msub><mo>=</mo><mn>3</mn><mo>×</mo><msup><mn>10</mn><mrow><mo>−</mo><mn>5</mn></mrow></msup></math></span>) to a hazardous state (<span><math><msub><mi>P</mi><mi>f</mi></msub><mo>=</mo><mn>1</mn></math></span>), while a 44 % decrease in velocity allows for 57 % more pollutant assimilation. These findings highlight the importance of flow control as a cost-effective strategy for mitigating high pollutant concentrations and ensuring sustainable water quality management. By integrating numerical approaches with reliability sampling methods and soft computing techniques, this study enhances understanding of river system dynamics and supports informed decision-making for protecting water resources.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102829"},"PeriodicalIF":5.8,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic fingerprints in nature: A self-supervised learning approach for ecosystem activity monitoring","authors":"Dario Dematties , Samir Rajani , Rajesh Sankaran , Sean Shahkarami , Bhupendra Raut , Scott Collis , Pete Beckman , Nicola Ferrier","doi":"10.1016/j.ecoinf.2024.102823","DOIUrl":"10.1016/j.ecoinf.2024.102823","url":null,"abstract":"<div><div>According to the World Health Organization, <em>healthy communities rely on well-functioning ecosystems</em>. Clean air, fresh water, and nutritious food are inextricably linked to ecosystem health. Changes in biological activity convey important information about ecosystem dynamics, and understanding such changes is crucial for the survival of our species. Scientific edge cyberinfrastructures collect distributed data and process it in situ, often using machine learning algorithms. Most current machine learning algorithms deployed on edge cyberinfrastructures, however, are trained on data that does not accurately represent the real stream of data collected at the edge. In this work we explore the applicability of two new self-supervised learning algorithms for characterizing an insufficiently curated, imbalanced, and unlabeled dataset collected by using a set of nine microphones at different locations at the Morton Arboretum, an internationally recognized tree-focused botanical garden and research center in Lisle, IL. Our implementations showed completely autonomous characterization capabilities, such as the separation of spectrograms by recording location, month, week, and hour of the day. The models also showed the ability to discriminate spectrograms by biological and atmospheric activity, including rain, insects, and bird activity, in a completely unsupervised fashion. We validated our findings using a supervised deep learning approach and with a dataset labeled by experts, confirming competitive performance in several features. 
Toward explainability of our self-supervised learning approach, we used acoustic indices and false color spectrograms, showing that the topology and orientation of the clouds of points in the output space over a 24-h period are strongly linked to the unfolding of biological activity. Our findings show that self-supervised learning has the potential to learn from and process data collected at the edge, characterizing it with minimal human intervention. We believe that further research is crucial to extending this approach for complete autonomous characterization of raw data collected on distributed sensors at the edge.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102823"},"PeriodicalIF":5.8,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003650/pdfft?md5=879940a92e3b5b36fc5955d07c153779&pid=1-s2.0-S1574954124003650-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142312681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing ensemble models for carbon sequestration and storage estimation in forests using remote sensing data","authors":"Mehdi Fasihi , Beatrice Portelli , Luca Cadez , Antonio Tomao , Alex Falcon , Giorgio Alberti , Giuseppe Serra","doi":"10.1016/j.ecoinf.2024.102828","DOIUrl":"10.1016/j.ecoinf.2024.102828","url":null,"abstract":"<div><div>Forests play a crucial role in storing much of the world's carbon (C). Accurately estimating C sequestration is essential for addressing and mitigating the impacts of global warming. While many studies have used machine learning models to estimate carbon storage (CS) in forests based on remote sensing data, this research further examines C sequestration (i.e., the annual carbon uptake by trees; CSE). The objectives of this study are two-fold: firstly, to identify the best models for estimating CSE and CS by testing various methods, and secondly, to examine the effect of climatic data and the canopy height model (CHM) on the estimation of CSE. To achieve the first objective, we will compare the performance of fourteen models, including twelve machine learning models, one deep learning model, and an ensemble model that combines the top four independent models. For the second objective, we study the effect of four input configurations: the first is a baseline configuration based solely on attributes extracted from satellite images (Sentinel-2) and geomorphology; the second combines satellite features with climatic data; the third uses a CHM derived from LiDAR instead of climatic data; and the fourth combines all available features: satellite images, climatic data, and CHM. The results show that adding climatic data does not improve the estimation of CSE and CS. However, adding CHM features significantly improves the models' performance for both targets. 
The implemented ensemble model demonstrated the best performance across all configurations.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102828"},"PeriodicalIF":5.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003704/pdfft?md5=eb92d5fb2830af94093c7200733c38bc&pid=1-s2.0-S1574954124003704-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142315035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning of species co-occurrence patterns between plant communities","authors":"Johannes Hirn , Verónica Sanz , José Enrique García , Marta Goberna , Alicia Montesinos-Navarro , José Antonio Navarro-Cano , Ricardo Sánchez-Martín , Alfonso Valiente-Banuet , Miguel Verdú","doi":"10.1016/j.ecoinf.2024.102826","DOIUrl":"10.1016/j.ecoinf.2024.102826","url":null,"abstract":"<div><h3>Aim</h3><div>The use of neural networks (NNs) is spreading to all areas of life, and Ecology is no exception. However, the data-hungry nature of NNs can leave out many small, valuable datasets. Here we show how to apply transfer learning to rescue small datasets that can be invaluable in understanding patterns of species co-occurrence.</div></div><div><h3>Location</h3><div>Semiarid plant communities in Spain and México.</div></div><div><h3>Time period</h3><div>2016–2022.</div></div><div><h3>Major taxa studied</h3><div>Angiosperms.</div></div><div><h3>Methods</h3><div>Based on a large sample of plant species co-occurrence in vegetation patches in a semi-arid area of eastern Spain, we fit a generative artificial intelligence (AI) model that correctly reproduces which species live with which in these patches. Subsequently, we train the same type of model on two communities for which we only have smaller datasets (another semi-arid community in eastern Spain, and a tropical community in Mexico).</div></div><div><h3>Results</h3><div>When we transfer the knowledge learnt from the large dataset directly to the other two, the predictions improve for the community more similar to our reference one. As for the more dissimilar community, improving the accuracy of the transfer requires a further tuning of the model to the local data. 
In particular, the knowledge transferred relates primarily to species frequency and, to a lesser extent, to their phylogenetic relationships, which are known to be determinants of species interaction patterns.</div></div><div><h3>Main conclusions</h3><div>This AI-based approach can be performed for communities similar or not so similar to the reference community, opening the door to systematic transfer learning for accurate predictions on small datasets. Interestingly, this transfer operates by matching unrelated species between the origin and target datasets, implying that arbitrary datasets can then be transferred to, or even combined in order to augment each other, irrespective of the species involved, potentially allowing such models to be applied to a wide range of plant communities in different climates.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":"83 ","pages":"Article 102826"},"PeriodicalIF":5.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}