Dario Dematties , Samir Rajani , Rajesh Sankaran , Sean Shahkarami , Bhupendra Raut , Scott Collis , Pete Beckman , Nicola Ferrier
{"title":"Acoustic fingerprints in nature: A self-supervised learning approach for ecosystem activity monitoring","authors":"Dario Dematties , Samir Rajani , Rajesh Sankaran , Sean Shahkarami , Bhupendra Raut , Scott Collis , Pete Beckman , Nicola Ferrier","doi":"10.1016/j.ecoinf.2024.102823","DOIUrl":"10.1016/j.ecoinf.2024.102823","url":null,"abstract":"<div><div>According to the World Health Organization, <em>healthy communities rely on well-functioning ecosystems</em>. Clean air, fresh water, and nutritious food are inextricably linked to ecosystem health. Changes in biological activity convey important information about ecosystem dynamics, and understanding such changes is crucial for the survival of our species. Scientific edge cyberinfrastructures collect distributed data and process it in situ, often using machine learning algorithms. Most current machine learning algorithms deployed on edge cyberinfrastructures, however, are trained on data that does not accurately represent the real stream of data collected at the edge. In this work we explore the applicability of two new self-supervised learning algorithms for characterizing an insufficiently curated, imbalanced, and unlabeled dataset collected by using a set of nine microphones at different locations at the Morton Arboretum, an internationally recognized tree-focused botanical garden and research center in Lisle, IL. Our implementations showed completely autonomous characterization capabilities, such as the separation of spectrograms by recording location, month, week, and hour of the day. The models also showed the ability to discriminate spectrograms by biological and atmospheric activity, including rain, insects, and bird activity, in a completely unsupervised fashion. We validated our findings using a supervised deep learning approach and with a dataset labeled by experts, confirming competitive performance in several features. 
Toward explainability of our self-supervised learning approach, we used acoustic indices and false color spectrograms, showing that the topology and orientation of the clouds of points in the output space over a 24-h period are strongly linked to the unfolding of biological activity. Our findings show that self-supervised learning has the potential to learn from and process data collected at the edge, characterizing it with minimal human intervention. We believe that further research is crucial to extending this approach for complete autonomous characterization of raw data collected on distributed sensors at the edge.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003650/pdfft?md5=879940a92e3b5b36fc5955d07c153779&pid=1-s2.0-S1574954124003650-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142312681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mehdi Fasihi , Beatrice Portelli , Luca Cadez , Antonio Tomao , Alex Falcon , Giorgio Alberti , Giuseppe Serra
{"title":"Assessing ensemble models for carbon sequestration and storage estimation in forests using remote sensing data","authors":"Mehdi Fasihi , Beatrice Portelli , Luca Cadez , Antonio Tomao , Alex Falcon , Giorgio Alberti , Giuseppe Serra","doi":"10.1016/j.ecoinf.2024.102828","DOIUrl":"10.1016/j.ecoinf.2024.102828","url":null,"abstract":"<div><div>Forests play a crucial role in storing much of the world's carbon (C). Accurately estimating C sequestration is essential for addressing and mitigating the impacts of global warming. While many studies have used machine learning models to estimate carbon storage (CS) in forests based on remote sensing data, this research further examines C sequestration (i.e., the annual carbon uptake by trees; CSE). The objectives of this study are two-fold: firstly, to identify the best models for estimating CSE and CS by testing various methods, and secondly, to examine the effect of climatic data and the canopy height model (CHM) on the estimation of CSE. To achieve the first objective, we compare the performance of fourteen models, including twelve machine learning models, one deep learning model, and an ensemble model that combines the top four independent models. For the second objective, we study the effect of four input configurations: the first is a baseline configuration based solely on attributes extracted from satellite images (Sentinel-2) and geomorphology; the second combines satellite features with climatic data; the third uses a CHM derived from LiDAR instead of climatic data; and the fourth combines all available features: satellite images, climatic data, and CHM. The results show that adding climatic data does not improve the estimation of CSE and CS. However, adding CHM features significantly improves the models' performance for both targets. 
The implemented ensemble model demonstrated the best performance across all configurations.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003704/pdfft?md5=eb92d5fb2830af94093c7200733c38bc&pid=1-s2.0-S1574954124003704-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142315035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johannes Hirn , Verónica Sanz , José Enrique García , Marta Goberna , Alicia Montesinos-Navarro , José Antonio Navarro-Cano , Ricardo Sánchez-Martín , Alfonso Valiente-Banuet , Miguel Verdú
{"title":"Transfer learning of species co-occurrence patterns between plant communities","authors":"Johannes Hirn , Verónica Sanz , José Enrique García , Marta Goberna , Alicia Montesinos-Navarro , José Antonio Navarro-Cano , Ricardo Sánchez-Martín , Alfonso Valiente-Banuet , Miguel Verdú","doi":"10.1016/j.ecoinf.2024.102826","DOIUrl":"10.1016/j.ecoinf.2024.102826","url":null,"abstract":"<div><h3>Aim</h3><div>The use of neural networks (NNs) is spreading to all areas of life, and Ecology is no exception. However, the data-hungry nature of NNs can leave out many small, valuable datasets. Here we show how to apply transfer learning to rescue small datasets that can be invaluable in understanding patterns of species co-occurrence.</div></div><div><h3>Location</h3><div>Semiarid plant communities in Spain and México.</div></div><div><h3>Time period</h3><div>2016–2022.</div></div><div><h3>Major taxa studied</h3><div>Angiosperms.</div></div><div><h3>Methods</h3><div>Based on a large sample of plant species co-occurrence in vegetation patches in a semi-arid area of eastern Spain, we fit a generative artificial intelligence (AI) model that correctly reproduces which species live with which in these patches. Subsequently, we train the same type of model on two communities for which we only have smaller datasets (another semi-arid community in eastern Spain, and a tropical community in Mexico).</div></div><div><h3>Results</h3><div>When we transfer the knowledge learnt from the large dataset directly to the other two, the predictions improve for the community more similar to our reference one. As for the more dissimilar community, improving the accuracy of the transfer requires a further tuning of the model to the local data. 
In particular, the knowledge transferred relates primarily to species frequency and, to a lesser extent, to their phylogenetic relationships, which are known to be determinants of species interaction patterns.</div></div><div><h3>Main conclusions</h3><div>This AI-based approach can be performed for communities similar or not so similar to the reference community, opening the door to systematic transfer learning for accurate predictions on small datasets. Interestingly, this transfer operates by matching unrelated species between the origin and target datasets, implying that arbitrary datasets can then be transferred to, or even combined in order to augment each other, irrespective of the species involved, potentially allowing such models to be applied to a wide range of plant communities in different climates.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic network to model the global spreading of respiratory diseases: From SARS-CoV-2 to pathogen X pandemic","authors":"Leonardo López , Xavier Rodó","doi":"10.1016/j.ecoinf.2024.102827","DOIUrl":"10.1016/j.ecoinf.2024.102827","url":null,"abstract":"<div><div>The recent COVID-19 pandemic has underscored the vulnerability of global health systems. Emerging in November 2019 in Hubei, China, COVID-19 has had far-reaching consequences, affecting every corner of the globe. The impact has been particularly severe, causing widespread collapse of public health systems and contraction of the world economy. The imposition of stringent sanitary restrictions by the majority of countries, in response to SARS-CoV-2, disrupted various economic sectors on a massive scale. The existing gap between developed and underdeveloped countries further complicates the global scenario, raising uncertainties. This concern is amplified when considering the potential threat of other infectious diseases with dynamics akin to SARS-CoV-2, such as a new recombining H5N1 flu strain. Such a strain, if easily transmissible among humans, could lead to another pandemic. In this study, we introduce a stochastic network model designed to assess control strategies on a global scale. This model enables us to project how new variants, evading immunity, might respond to either a coordinated global response from governments or a complete lack of coordination. Our connectivity model between countries is based on a network of contacts derived from actual commercial air connectivity data. The disease dynamics within each country are simulated using a population-based approach with differential equations. 
The epidemiological model is fine-tuned using real SARS-CoV-2 data reported by various countries from 2019 to 2023.</div></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142326678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monica Dimson , Kyle C. Cavanaugh , Erica von Allmen , David A. Burney , Kapua Kawelo , Jane Beachy , Thomas W. Gillespie
{"title":"Monitoring native, non-native, and restored tropical dry forest with Landsat: A case study from the Hawaiian Islands","authors":"Monica Dimson , Kyle C. Cavanaugh , Erica von Allmen , David A. Burney , Kapua Kawelo , Jane Beachy , Thomas W. Gillespie","doi":"10.1016/j.ecoinf.2024.102821","DOIUrl":"10.1016/j.ecoinf.2024.102821","url":null,"abstract":"<div><p>Tropical dry forests are highly threatened at a global scale. Long-term monitoring of remaining stands is needed to assess forest health, efficacy of management practices, and potential impacts of climate change. Using a multi-seasonal Landsat time series, we examined Normalized Difference Vegetation Index (NDVI) patterns in native dry forest, non-native vegetation types, and dry forest restoration sites from 1999 to 2022 in the Hawaiian Islands. We calculated trends in median NDVI and robust coefficient of variation of NDVI for dry and wet seasons, and used Breaks for Additive Seasonal and Trend analysis to detect trend departures. To assess the impact of regional drying trends, NDVI trends were compared to the seasonal long-term precipitation anomaly and cumulative precipitation anomaly. We found that native dry forest was less green than non-native forest, particularly during the dry season, and that median NDVI increased in both native and non-native dry forests over the study period despite negative precipitation anomaly trends. This result differs from coarser-scale studies in Hawaii, but is supported by trends in other dry forest regions. Greening was also observed in restoration study sites, especially larger sites where native species establishment and recruitment has been reported. Non-native grassland NDVI exhibited a strong positive link to precipitation anomalies, suggesting that drier climate scenarios may exacerbate the invasive grass-wildfire cycle that threatens native dry forest. 
These results demonstrate that Landsat time series may be used to detect seasonal variation in dry forest plots and to support restoration site monitoring in a highly fragmented ecosystem.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003637/pdfft?md5=27e562428781b1279ae61aeb6096c8bc&pid=1-s2.0-S1574954124003637-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
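The record above tracks median NDVI per season. As a minimal sketch of that quantity, assuming the standard definition NDVI = (NIR - Red) / (NIR + Red) and per-observation surface reflectances: the function names, the input layout, and the sample values below are hypothetical illustrations, not the authors' processing chain.

```python
from statistics import median


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance.

    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def seasonal_median_ndvi(observations):
    """Median NDVI per season.

    `observations` is an iterable of (season, nir, red) tuples; this layout
    is an assumption made for the example.
    """
    by_season = {}
    for season, nir, red in observations:
        value = ndvi(nir, red)
        if value is not None:
            by_season.setdefault(season, []).append(value)
    return {season: median(values) for season, values in by_season.items()}


# Hypothetical reflectance values, for illustration only.
samples = [
    ("dry", 0.75, 0.25),
    ("dry", 0.50, 0.50),
    ("dry", 0.625, 0.375),
    ("wet", 0.875, 0.125),
]
print(seasonal_median_ndvi(samples))
```

The median is used here because it is less sensitive than the mean to a few cloud-contaminated or otherwise anomalous observations within a season.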
Yingxuan He , Wei Chen , Zhou Huang , Qingpeng Wang
{"title":"MoMFormer: Mixture of modality transformer model for vegetation extraction under shadow conditions","authors":"Yingxuan He , Wei Chen , Zhou Huang , Qingpeng Wang","doi":"10.1016/j.ecoinf.2024.102818","DOIUrl":"10.1016/j.ecoinf.2024.102818","url":null,"abstract":"<div><p>Accurate estimation of fractional vegetation coverage (FVC) is essential for assessing the ecological environment and acquiring ecological information. However, under natural lighting conditions, shadows in vegetation scenes can easily cause confusion between shadowed vegetation and shadowed soil, leading to misclassification and omission errors. This issue limits the precision of both vegetation extraction and FVC estimation. To address this challenge, this study introduces a novel deep learning model, the Mixture of Modality Transformer (MoMFormer), which is specifically designed to mitigate shadow interference in vegetation extraction. Our model uses the Swin-transformer V2 as a feature extractor, effectively capturing vegetation features from a dual-modality (regular-exposure RGB and high dynamic range HDR) dataset. A dynamic aggregation module (DAM) is integrated to adaptively blend the most relevant vegetation features. We conducted extensive experiments on a self-annotated dataset featuring diverse vegetation–soil scenes and compared our model with several state-of-the-art (SOTA) methods. The results demonstrate that MoMFormer achieves an accuracy of 89.43 % on the HDR-RGB dual-modality dataset, with an FVC accuracy of 87.57 %, outperforming other algorithms and demonstrating high vegetation extraction accuracy and adaptability under natural lighting conditions. This research offers new insights into accurate vegetation information extraction in naturally lit environments with shadows, providing robust technical support for high-precision validation of vegetation coverage products and algorithms based on multimodal data. 
The code and datasets used in this study are publicly available at <span><span>https://github.com/hhhxiaohe/MoMFormer</span></span>.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003601/pdfft?md5=f86e3b9567567c1cac9fdc7b86af1f24&pid=1-s2.0-S1574954124003601-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dávid D. Kovács , Pablo Reyes-Muñoz , Katja Berger , Viktor Ixion Mészáros , Gabriel Caballero , Jochem Verrelst
{"title":"Multi-decadal temporal reconstruction of Sentinel-3 OLCI-based vegetation products with multi-output Gaussian process regression","authors":"Dávid D. Kovács , Pablo Reyes-Muñoz , Katja Berger , Viktor Ixion Mészáros , Gabriel Caballero , Jochem Verrelst","doi":"10.1016/j.ecoinf.2024.102816","DOIUrl":"10.1016/j.ecoinf.2024.102816","url":null,"abstract":"<div><p>Operational Earth observation missions, like the Sentinel-3 (S3) satellites, aim to provide imagery for long-term environmental assessment to monitor and analyze vegetation changes and dynamics. However, the S3 archive extends back only to 2016. Although S3 provides continuity of previous missions, key vegetation products (VPs) including leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FAPAR), fractional vegetation cover (FVC), and leaf chlorophyll content (LCC), can be reliably produced from Ocean and Land Colour Instrument (OLCI) data only since the sensors' launch. To overcome this limitation, our study proposes a reconstruction workflow that extends the data record back beyond the start of data acquisition. By using multi-output Gaussian process regression (MOGPR) fusion, we explored guiding predictor VPs from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor for the reconstruction of multi-decadal (spanning two decades, 2002–2022) temporal profiles of four OLCI-derived VPs (S3-MOGPR), reaching back before S3's launch. We first evaluated three MODIS-derived inputs as predictor variables: LAI, FAPAR, and the Normalised Difference Vegetation Index (NDVI) over nine sites with distinct land covers from the Ground-Based Observations for Validation (GBOV) service. Each predictor produced a distinct time series for the four reconstructed S3 VPs. To determine which predictor variable most accurately reconstructs data streams of the targeted variable, all S3-MOGPR VPs were compared to satellite-based products from the Copernicus Global Land Service (CGLS). 
MOGPR models were trained for 2019 and compared to reference data. Since MODIS LAI demonstrated the best reconstruction performance of all predictors, S3-MOGPR VPs were fully reconstructed from 2022 back to 2002 using guiding MODIS LAI and evaluated with in-situ data. The most consistent reconstructed product was FVC (<span><math><mi>R</mi><mo>=</mo><mn>0.96</mn></math></span>, NRMSE = 0.17) over mixed forests compared to CGLS estimates. FVC also yielded the highest validation statistics (<span><math><mi>R</mi><mo>=</mo><mn>0.93</mn></math></span>, <span><math><mi>ρ</mi><mo>=</mo><mn>0.92</mn></math></span>, NRMSE = 0.14) over croplands. The highest correlation coefficients were achieved by the predictor variable LAI reconstructing FVC, with mean <span><math><mi>R</mi></math></span> and <span><math><mi>ρ</mi></math></span> among all sites of 0.91 and 0.88, respectively, and NRMSE = 0.11. In the absence of both satellite and ground-based LCC reference measurements, the reconstructed LCC profiles were compared to the OLCI and MERIS Terrestrial Chlorophyll Index (OTCI, MTCI). The correlation metrics provided strong evidence of the reconstructed LCC product's integrity, with the highest correlation over deciduous broadleaf, mixed forests and croplands (<span><math><mi>R</mi><mo>></mo><mn>0.9</mn></math></span>). 
The lowest correlations for all reconstructe","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003583/pdfft?md5=e9b712c255026d945be9ad65c09438f4&pid=1-s2.0-S1574954124003583-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whitney M. Woelmer , R. Quinn Thomas , Freya Olsson , Bethel G. Steele , Kathleen C. Weathers , Cayelan C. Carey
{"title":"Process-based forecasts of lake water temperature and dissolved oxygen outperform null models, with variability over time and depth","authors":"Whitney M. Woelmer , R. Quinn Thomas , Freya Olsson , Bethel G. Steele , Kathleen C. Weathers , Cayelan C. Carey","doi":"10.1016/j.ecoinf.2024.102825","DOIUrl":"10.1016/j.ecoinf.2024.102825","url":null,"abstract":"<div><p>Near-term iterative ecological forecasting has great potential for providing new insights into our ability to predict multiple ecological variables. However, true, out-of-sample probabilistic forecasts remain rare, and variability in forecast performance has largely been unexamined in process-based forecasts which predict multiple ecosystem variables. To explore how forecast performance varies for water temperature and dissolved oxygen, two freshwater variables important for lake ecosystem functioning, we produced probabilistic forecasts at multiple depths over two open-water seasons in Lake Sunapee, NH, USA. Our forecasting system, FLARE (Forecasting Lake And Reservoir Ecosystems), uses a 1-D coupled hydrodynamic-biogeochemical process model, which we assessed relative to both climatology and persistence null models to quantify how much information process-based FLARE forecasts provide over null models across varying environmental conditions. We found that FLARE water temperature forecasts were always more skillful than FLARE oxygen forecasts. Specifically, temperature forecasts outperformed both null models up to 11 days into the future, as compared to only two days for oxygen. Across different years, we observed variable forecast skill, with performance generally decreasing with depth for both variables. 
Overall, all temperature forecasts and surface (but not deep) oxygen forecasts were more skillful than at least one null model for >80 % of the forecasted period, indicating that our process-based model was able to reproduce the dynamics of these two variables with greater reliability than the null models. However, process-based oxygen forecasts from deeper waters were less skillful than both null models during a majority of the forecasted period, which suggests that deep-water oxygen dynamics are dominated by autocorrelation and seasonal change, which are inherently captured by the null forecasts. Our results highlight that forecast performance varies among lake water quality metrics and that process-based forecasts can provide important information in conjunction with null models in varying environmental conditions. Altogether, these process-based forecasts can be used to develop quantitative tools which inform our understanding of future ecosystem change.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003674/pdfft?md5=9a53fafcb216d3f908b82767ac100cd5&pid=1-s2.0-S1574954124003674-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142241916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some limitations of the concordance correlation coefficient to characterise model accuracy","authors":"Alexandre M.J.-C. Wadoux , Budiman Minasny","doi":"10.1016/j.ecoinf.2024.102820","DOIUrl":"10.1016/j.ecoinf.2024.102820","url":null,"abstract":"<div><p>Perusal of the environmental modelling literature reveals that Lin's concordance correlation coefficient is a popular validation statistic to characterise model or map quality. In this communication, we illustrate with synthetic examples three undesirable statistical properties of this coefficient. We argue that ignorance of these properties has led to frequent misuse of this coefficient in modelling and mapping studies. The stand-alone use of the concordance correlation coefficient is insufficient because i) it does not indicate the relative contributions of bias and correlation, ii) the values cannot be compared across different datasets or studies and iii) it is prone to the same problems as other linear correlation statistics. The concordance coefficient was, in fact, originally intended for reproducibility studies over repeated trials of the same variable, not for characterising model accuracy. 
For the validation of models and maps, we recommend calculating statistics that, combined with the concordance correlation coefficient, represent various aspects of the model or map quality, which can be visualised together in a single figure with a Taylor or solar diagram.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003625/pdfft?md5=598076128189827bbb1d60591fdbe37f&pid=1-s2.0-S1574954124003625-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142232943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
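Limitation (i) in the record above can be made concrete with a small sketch. It assumes the usual population-moment form of Lin's coefficient, CCC = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), which factors into Pearson's r times a bias-correction factor C_b. The function name `ccc_components` and the synthetic data are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import fmean


def ccc_components(observed, predicted):
    """Return (ccc, pearson_r, bias_factor) using population (1/n) moments.

    ccc = pearson_r * bias_factor, so the same ccc can arise from
    different mixes of correlation and bias.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    if var_o == 0 or var_p == 0:
        raise ValueError("both sequences must vary")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    ccc = 2 * cov / denom
    pearson_r = cov / sqrt(var_o * var_p)
    bias_factor = 2 * sqrt(var_o * var_p) / denom
    return ccc, pearson_r, bias_factor


# Synthetic example: predictions perfectly correlated but shifted by +2.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [o + 2.0 for o in observed]
ccc, pearson_r, bias_factor = ccc_components(observed, shifted)
print(ccc, pearson_r, bias_factor)
```

For this shifted series the correlation is perfect (r = 1) while the coefficient drops to 0.5 purely because of bias. A reader given only the value 0.5 cannot tell whether it reflects bias, scatter, or both, which is why reporting the components alongside the coefficient is useful.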
Jonathan Donhauser , Anna Doménech-Pascual , Xingguo Han , Karen Jordaan , Jean-Baptiste Ramond , Aline Frossard , Anna M. Romaní , Anders Priemé
{"title":"Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod","authors":"Jonathan Donhauser , Anna Doménech-Pascual , Xingguo Han , Karen Jordaan , Jean-Baptiste Ramond , Aline Frossard , Anna M. Romaní , Anders Priemé","doi":"10.1016/j.ecoinf.2024.102817","DOIUrl":"10.1016/j.ecoinf.2024.102817","url":null,"abstract":"<div><p>We present a comprehensive, customizable workflow for inferring prokaryotic phenotypic traits from marker gene sequences and modelling the relationships between these traits and environmental factors, thus overcoming the limited ecological interpretability of marker gene sequencing data. We created the trait sequence database <em>ampliconTraits</em>, constructed by cross-mapping species from a phenotypic trait database to the SILVA sequence database and formatted to enable seamless classification of environmental sequences using the SINAPS algorithm. The R package <em>MicEnvMod</em> enables modelling of trait – environment relationships, combining the strengths of different model types and integrating an approach to evaluate the models' predictive performance in a single framework. Traits could be accurately predicted even for sequences with low sequence identity (80 %) with the reference sequences, indicating that our approach is suitable to classify a wide range of environmental sequences. Validating our approach in a large trans-continental soil dataset, we showed that trait distributions were robust to classification settings such as the bootstrap cutoff for classification and the number of discrete intervals for continuous traits. Using functions from <em>MicEnvMod,</em> we revealed precipitation seasonality and land cover as the most important predictors of genome size. 
We found Pearson correlation coefficients between observed and predicted values of up to 0.70 using repeated split sampling cross validation, corroborating the predictive ability of our models beyond the training data. Predicting genome size across the Iberian Peninsula, we found the largest genomes in the northern part. Potential limitations of our trait inference approach include dependence on the phylogenetic conservation of traits and limited database coverage of environmental prokaryotes. Overall, our approach enables robust inference of ecologically interpretable traits combined with environmental modelling, allowing traits to be harnessed as bioindicators of soil ecosystem functioning.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574954124003595/pdfft?md5=a975351ee65c86e764ade9d9b4d869ae&pid=1-s2.0-S1574954124003595-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142173774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}