Ryan M. McGranaghan , Ellie Young , Cameron Powers , Swapnali Yadav , Edlira Vakaj
{"title":"The cultural-social nucleus of an open community: A multi-level community knowledge graph and NASA application","authors":"Ryan M. McGranaghan , Ellie Young , Cameron Powers , Swapnali Yadav , Edlira Vakaj","doi":"10.1016/j.acags.2023.100142","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100142","url":null,"abstract":"<div><p>The challenges faced by science, engineering, and society are increasingly complex, requiring broad, cross-disciplinary teams to contribute to collective knowledge, cooperation, and sensemaking efforts. However, existing approaches to collaboration and knowledge sharing are largely manual, inadequate to meet the needs of teams that are not closely connected through personal ties or which lack the time to respond to dynamic requests for contextual information sharing. Nonetheless, in the current remote-first, complexity-driven, time-constrained workplace, such teams are both more common and more necessary. For example, the NASA Center for HelioAnalytics (CfHA) is a growing and cross-disciplinary community that is dedicated to aiding the application of emerging data science techniques and technologies, including AI/ML, to increase the speed, rigor, and depth of space physics scientific discovery. The members of that community possess innumerable skills and competencies and are involved in hundreds of projects, including proposals, committees, papers, presentations, conferences, groups, and missions. Traditional structures for information and knowledge representation do not permit the community to search and discover activities that are ongoing across the Center, nor to understand where skills and knowledge exist. The approaches that do exist are burdensome and result in inefficient use of resources, reinvention of solutions, and missed important connections. 
The challenge faced by the CfHA is common across modern groups, and it must be solved if we are to respond to the grand challenges that face our society, such as complex scientific phenomena, global pandemics, and climate change. We present a solution to the problem: a community knowledge graph (KG) that helps an organization better understand the resources (people, capabilities, affiliations, assets, content, data, models) available across its membership base, and thus supports a more cohesive community and more capable teams, enables robust and responsible application of new technologies, and provides the foundation for all members of the community to co-evolve the shared information space. We call the underlying ontology the Community Action and Understanding via Semantic Enrichment (CAUSE) ontology. We demonstrate the efficacy of KGs that can be instantiated from the ontology together with data from a given community (shown here for the CfHA). Finally, we discuss the implications, including the importance of the community KG for open science.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100142"},"PeriodicalIF":3.4,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590197423000319/pdfft?md5=4019b0e03e4f84f5bfcd8583a36134a7&pid=1-s2.0-S2590197423000319-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92043917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
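The community knowledge graph idea can be illustrated, in much reduced form, as a set of subject-predicate-object triples queried for members with a given skill. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the `ex:` identifiers, the predicates `ex:hasSkill` and `ex:contributesTo`, and the `TripleStore` and `find_collaborators` helpers are hypothetical and are not taken from the CAUSE ontology.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        # Index: (predicate, object) -> subjects, for "who has X?" lookups.
        self._subjects_by_po = defaultdict(set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self._subjects_by_po[(p, o)].add(s)

    def subjects(self, p, o):
        """All subjects s such that (s, p, o) is in the graph."""
        return set(self._subjects_by_po.get((p, o), set()))

    def objects(self, s, p):
        """All objects o such that (s, p, o) is in the graph."""
        return {o for (s2, p2, o) in self.triples if s2 == s and p2 == p}


def find_collaborators(kg, skill, project):
    """Members who have `skill` but do not yet contribute to `project` (hypothetical query)."""
    return kg.subjects("ex:hasSkill", skill) - kg.subjects("ex:contributesTo", project)


# Hypothetical community data.
kg = TripleStore()
kg.add("ex:alice", "ex:hasSkill", "ex:MachineLearning")
kg.add("ex:bob", "ex:hasSkill", "ex:MachineLearning")
kg.add("ex:bob", "ex:contributesTo", "ex:ProposalX")
kg.add("ex:carol", "ex:hasSkill", "ex:Heliophysics")

print(find_collaborators(kg, "ex:MachineLearning", "ex:ProposalX"))  # {'ex:alice'}
```

In practice the same question would more likely be asked of an RDF store with a SPARQL query; the in-memory index here only keeps the example self-contained.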
{"title":"The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies","authors":"Tim McCormick, Rachel E. Heaven","doi":"10.1016/j.acags.2023.100140","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100140","url":null,"abstract":"<div><p>Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, reusable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999 the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products, including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has since been extended. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. 
The SKOS informal ontology built on it is stored in a triplestore, and the triples are regenerated nightly from the relational database in which the ontology is maintained. Bulk downloads and version history are available on GitHub. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK. A comparison of the RCS with the other published lithology schemes shows that all are broadly similar, but each has characteristics that reveal the interests and requirements of the group that developed it, in terms of its level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100140"},"PeriodicalIF":3.4,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49758675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
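As a rough illustration of what a SKOS representation of a mono-hierarchical classification looks like, the sketch below emits Turtle for a toy three-level hierarchy and computes its depth. The base URI `http://example.org/rcs/`, the concept identifiers, and the helper names are hypothetical; they are not the BGS RCS URIs or the BGS publication pipeline.

```python
def to_skos_turtle(parent_of, labels, base="http://example.org/rcs/"):
    """Serialise concepts as SKOS Turtle.

    parent_of maps each concept id to its single parent id (None for top concepts),
    so every concept gets at most one skos:broader: a mono-hierarchy by construction.
    Ids must be valid Turtle local names and labels must not contain double quotes.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix rcs: <{base}> .",
        "",
        "rcs:scheme a skos:ConceptScheme .",
    ]
    for cid in sorted(parent_of):
        parent = parent_of[cid]
        lines.append(f"rcs:{cid} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{labels[cid]}"@en ;')
        if parent is None:
            lines.append("    skos:topConceptOf rcs:scheme ;")
        else:
            lines.append(f"    skos:broader rcs:{parent} ;")
        lines.append("    skos:inScheme rcs:scheme .")
    return "\n".join(lines)


def depth(cid, parent_of):
    """Levels from the top concept down to `cid` (a top concept has depth 1)."""
    levels = 1
    while parent_of[cid] is not None:
        cid = parent_of[cid]
        levels += 1
    return levels


# Hypothetical toy hierarchy.
parent_of = {"rock": None, "igneous": "rock", "basalt": "igneous"}
labels = {"rock": "Rock", "igneous": "Igneous rock", "basalt": "Basalt"}
print(to_skos_turtle(parent_of, labels))
print(max(depth(c, parent_of) for c in parent_of))  # 3
```

Storing one parent per concept in a dictionary enforces the mono-hierarchy; a poly-hierarchical scheme would need a list of parents per concept instead.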
George Valakas, Matina Seferli, Konstantinos Modis
{"title":"Co-simulation of hydrofacies and piezometric data in the West Thessaly basin, Greece: A geostatistical application using the GeoSim R package","authors":"George Valakas, Matina Seferli, Konstantinos Modis","doi":"10.1016/j.acags.2023.100139","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100139","url":null,"abstract":"<div><p>In the present study, we co-simulate hydrofacies and piezometric data in order to construct geostatistical realizations of underground geology in an area of the West Thessaly basin. This basin is of great importance in Greece for sustainable water management and from an environmental perspective. Through Plurigaussian modeling, the hydrofacies are first transformed into Gaussian Random Fields. Then, a Linear Coregionalization Model is established to account for the dependencies between hydrofacies and the normal scores of piezometric data. Co-simulation improves the facies transition probabilities in comparison with those obtained by Plurigaussian simulation alone. For this study, we use GeoSim, an R package developed by our team that implements Plurigaussian simulation and co-simulation.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100139"},"PeriodicalIF":3.4,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
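The abstract refers to the normal scores of the piezometric data. One common way to obtain normal scores is a rank-based transform, sketched below with the standard library's `statistics.NormalDist`. The plotting position (rank - 0.5) / n, the tie handling, and the example heads are assumptions; the paper's exact transform is not stated here.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal score transform with plotting position (rank - 0.5) / n.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


heads = [12.4, 9.1, 15.0, 10.7, 11.2]  # hypothetical piezometric heads (m)
print([round(s, 3) for s in normal_scores(heads)])
```

The transform preserves the ordering of the data while giving them a standard normal marginal distribution, which is what a Gaussian coregionalization model expects.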
Camyla Innocente dos Santos , Tomas Carlotto , Leonardo Vilela Steiner , Pedro Luiz Borges Chaffe
{"title":"Development of the Synthetic Unit Hydrograph Tool – SUnHyT","authors":"Camyla Innocente dos Santos , Tomas Carlotto , Leonardo Vilela Steiner , Pedro Luiz Borges Chaffe","doi":"10.1016/j.acags.2023.100138","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100138","url":null,"abstract":"<div><p>Unit hydrographs (UH) are widely used in scientific research and engineering projects to simulate rainfall-runoff processes. There are four main approaches for calculating UH: the traditional, the conceptual, the probabilistic, and the geomorphological approaches. Most software designed to facilitate the estimation of UH implements only one UH approach, which limits its usefulness for testing scientific hypotheses. This paper presents the Synthetic Unit Hydrograph Tool (SUnHyT), which provides nine different UH models from the four main approaches used in UH applications. SUnHyT is an open-source application that can be used intuitively through a graphical user interface. We tested the tool in a case study that highlights the need for alternative UH approaches when the traditional approach does not perform well. SUnHyT allows the estimation of design hydrographs in gauged and ungauged catchments and can be useful for hydrologists, water managers, and decision-makers.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100138"},"PeriodicalIF":3.4,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
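The operation shared by all UH approaches is the convolution of excess rainfall with UH ordinates. The sketch below assumes a Nash cascade (gamma-shaped) instantaneous unit hydrograph as one example of a conceptual UH; the parameter values, function names, and midpoint discretization are hypothetical and are not taken from SUnHyT. The ordinates here are dimensionless volume fractions per interval, so converting the result to discharge would additionally require the catchment area and time step.

```python
import math


def nash_iuh(t, n, k):
    """Nash cascade IUH (gamma density) with n reservoirs and storage constant k.

    u(t) = (t/k)**(n-1) * exp(-t/k) / (k * Gamma(n)), in units of 1/time.
    Returns 0 for t <= 0 (the t = 0 value is set to 0 for simplicity).
    """
    if t <= 0:
        return 0.0
    return (t / k) ** (n - 1) * math.exp(-t / k) / (k * math.gamma(n))


def discrete_uh(n, k, dt, steps):
    """Fraction of unit runoff volume leaving in each interval (midpoint rule)."""
    return [nash_iuh((j + 0.5) * dt, n, k) * dt for j in range(steps)]


def convolve(excess_rain, uh):
    """Discrete convolution: Q[m] = sum over i + j = m of P[i] * U[j]."""
    q = [0.0] * (len(excess_rain) + len(uh) - 1)
    for i, p in enumerate(excess_rain):
        for j, u in enumerate(uh):
            q[i + j] += p * u
    return q


uh = discrete_uh(n=3, k=2.0, dt=0.5, steps=200)  # hypothetical parameters (hours)
print(round(sum(uh), 3))  # close to 1: the ordinates conserve unit volume
runoff = convolve([10.0, 5.0], uh)  # hypothetical excess rainfall depths per interval
```

Because convolution is linear, the total runoff volume equals the total excess rainfall times the sum of the ordinates, which is a quick sanity check on any UH model.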
{"title":"Parallel investigations of remote sensing and ground-truth Lake Chad's level data using statistical and machine learning methods","authors":"Kim-Ndor Djimadoumngar","doi":"10.1016/j.acags.2023.100135","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100135","url":null,"abstract":"<div><p>Lake Chad has faced a critical situation since the 1960s due to the effects of climate change and anthropogenic activities. Statistical analyses of remote sensing climate variables (i.e., evapotranspiration, specific humidity, soil temperature, air temperature, precipitation, soil moisture) and of remote sensing and ground-truth lake levels over the period 1993–2012 reveal that the remote sensing data have a skewed distribution, whereas the ground-truth data have a symmetrical distribution. Linear Regression (LR), Support Vector Regression (SVR), Regression Tree (RT), Random Forest Regression (RF), and Deep Learning (DL) methods show that (i) RF and LR, with the highest R<sup>2</sup> and EVS and the lowest MAE, MSE, RMSE, and CV<sub>MSE</sub> values, appear to be the best models for further investigating remote sensing and ground-truth lake level data, and (ii) the models based on remote sensing data outperform those based on ground-truth data in terms of MAE, MSE, RMSE, and CV<sub>MSE</sub>. The most useful variables for predicting lake level are precipitation and air temperature. The data analysis methodology reported here is of fundamental importance for an integrated and forward-looking water management system that connects climate change, vulnerability, and human activities in the Lake Chad human-environment system. 
Corroboration studies will be needed once more ground-truth data become available.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100135"},"PeriodicalIF":3.4,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43420767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
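For reference, the regression scores named in the Lake Chad abstract can be computed with their usual definitions, as in the stdlib sketch below. CV_MSE, which we take to be an MSE averaged over cross-validation folds, is not shown, and the example values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def regression_metrics(y_true, y_pred):
    """R2, explained variance score (EVS), MAE, MSE and RMSE."""
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean(r * r for r in resid)
    y_bar = fmean(y_true)
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        # EVS ignores a constant bias: it uses the variance, not the mean square, of residuals.
        "EVS": 1.0 - pvariance(resid) / pvariance(y_true),
        "MAE": fmean(abs(r) for r in resid),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical lake levels (m) with predictions that are 1 m too high everywhere.
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
# R2 = -0.5 but EVS = 1.0: the errors are a pure offset.
```

R<sup>2</sup> and EVS coincide only when the residuals have zero mean, which is why reporting both, as the study does, can reveal systematic bias.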
Min Wen , Qinjun Qiu , Shiyu Zheng , Kai Ma , Shuai Zheng , Zhong Xie , Liufeng Tao
{"title":"Construction and application of a multilevel geohazard domain ontology: A case study of landslide geohazards","authors":"Min Wen , Qinjun Qiu , Shiyu Zheng , Kai Ma , Shuai Zheng , Zhong Xie , Liufeng Tao","doi":"10.1016/j.acags.2023.100134","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100134","url":null,"abstract":"<div><p>The occurrence of geohazards entails sudden, unpredictable, and cascading effects, with numerous conceptual frameworks and intricate spatiotemporal relationships existing between hazard events. Presently, the absence of a unified mechanism for describing and expressing geohazard knowledge poses substantial challenges for sharing and reusing domain-specific knowledge pertaining to geohazards. Therefore, it is imperative to construct a cohesive descriptive model that facilitates the sharing and reuse of geohazard knowledge. In this study, we propose a multilayered ontology construction method tailored specifically to the domain of landslide geological hazards. Based on a comparison of existing methods, we establish a hierarchical structure and expression framework for the geological hazard ontology. Notably, our approach integrates the conceptual and semantic layers in the relationship description at each level, enabling associations among hazard data to be represented across multiple tiers. We define essential concepts and attributes related to landslide geological hazards, along with their respective interrelationships. To achieve effective knowledge sharing and reuse, we model the ontology of the landslide geological disaster domain using the Web Ontology Language (OWL). This modeling approach serves as a powerful tool that facilitates the sharing and reuse of disaster-related knowledge. Finally, we verify the method's validity and reliability by employing illustrative case studies. The results demonstrate that the proposed approach requires an acceptable amount of manual effort. 
Additionally, the foundational domain ontology significantly enhances information retrieval performance, thereby yielding satisfactory outcomes.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"20 ","pages":"Article 100134"},"PeriodicalIF":3.4,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48064538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
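A key benefit of an OWL class hierarchy for retrieval is that a query for a general class also returns instances of its subclasses. The sketch below mimics that `rdfs:subClassOf` entailment with a plain transitive closure; the class names and individuals are hypothetical and are not the classes defined in the paper. In SPARQL 1.1, the equivalent query would typically use a property path such as `?event a/rdfs:subClassOf* ex:Landslide`.

```python
def superclasses(cls, direct_parents):
    """All transitive superclasses of `cls` (cycle-safe depth-first search)."""
    seen = set()
    stack = list(direct_parents.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(direct_parents.get(parent, ()))
    return seen


def instances_of(target, asserted_type, direct_parents):
    """Individuals whose asserted class is `target` or any subclass of it."""
    return {
        ind
        for ind, cls in asserted_type.items()
        if cls == target or target in superclasses(cls, direct_parents)
    }


# Hypothetical class hierarchy (class -> direct superclasses) and individuals.
direct_parents = {
    "Landslide": ["GeologicalHazard"],
    "RotationalSlide": ["Landslide"],
    "Earthquake": ["GeologicalHazard"],
}
asserted_type = {"event_001": "RotationalSlide", "event_002": "Earthquake"}

print(instances_of("Landslide", asserted_type, direct_parents))  # {'event_001'}
```

A full OWL reasoner would also handle property restrictions and equivalences; this closure covers only the class-hierarchy part that drives most retrieval gains.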
{"title":"Geoweaver_cwl: Transforming geoweaver AI workflows to common workflow language to extend interoperability","authors":"Amruta Kale , Ziheng Sun , Chao Fan , Xiaogang Ma","doi":"10.1016/j.acags.2023.100126","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100126","url":null,"abstract":"<div><p>Recently, workflow management platforms have been gaining attention in the artificial intelligence (AI) community. Traditionally, researchers managed their workflows themselves in a manual and tedious way that relied heavily on their memory. Due to the complexity and unpredictability of AI models, they often struggled to track and manage all the data, steps, and history of the workflow. AI workflows are time-consuming, redundant, and error-prone, especially when big data is involved. A common strategy to make these workflows more manageable is to use a workflow management system, and we recommend Geoweaver, an open-source workflow management system that enables users to create, modify, and reuse AI workflows all in one place. To make our work in Geoweaver reusable by other workflow management systems, we created an add-on, <strong><em>geoweaver_cwl</em></strong>, a Python package that automatically converts Geoweaver AI workflows into the Common Workflow Language (CWL) format. It allows researchers to easily share, exchange, modify, and reuse workflows, and to build new workflows from existing ones, in other CWL-compliant software. A user study was conducted with existing workflows created in Geoweaver to collect suggestions and fill in the gaps between our package and Geoweaver. The evaluation confirms that <strong><em>geoweaver_cwl</em></strong> can lead to a well-managed AI process while revealing opportunities for further extensions. 
The <strong><em>geoweaver_cwl</em></strong> package is publicly released online at https://pypi.org/project/geoweaver-cwl/0.0.1/.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"19 ","pages":"Article 100126"},"PeriodicalIF":3.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46784253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
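To illustrate what a conversion into CWL involves, the sketch below turns an ordered chain of scripted steps into a CWL v1.2 `Workflow` with inline `CommandLineTool` definitions, wiring each step to the previous step's output through `<step>/<output>` references. It does not reproduce the `geoweaver_cwl` package or its API; the step names, scripts, and the `.out` output convention are hypothetical. A runnable conversion would also need to stage the scripts into each tool's working directory (for example with `InitialWorkDirRequirement`).

```python
import json


def chain_to_cwl(steps):
    """Build a CWL v1.2 Workflow dict from an ordered list of (name, script) pairs.

    Hypothetical conventions: every script receives the previous step's output
    file as its first argument and writes '<name>.out' in its working directory.
    """
    if not steps:
        raise ValueError("need at least one step")
    wf_steps, previous = {}, None
    for name, script in steps:
        tool = {
            "class": "CommandLineTool",
            "baseCommand": ["python", script],
            "inputs": {},
            "outputs": {
                "result": {"type": "File", "outputBinding": {"glob": f"{name}.out"}}
            },
        }
        step = {"run": tool, "in": {}, "out": ["result"]}
        if previous is not None:
            # Wire this step to the upstream step's output: "<step>/<output>".
            tool["inputs"]["upstream"] = {"type": "File", "inputBinding": {"position": 1}}
            step["in"]["upstream"] = f"{previous}/result"
        wf_steps[name] = step
        previous = name
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": {},
        "outputs": {"final": {"type": "File", "outputSource": f"{previous}/result"}},
        "steps": wf_steps,
    }


wf = chain_to_cwl([("prepare", "prepare.py"), ("train", "train.py")])
print(json.dumps(wf, indent=2))
```

The document is printed as JSON; since CWL documents are YAML and JSON is essentially a subset of YAML 1.2, CWL runners can generally read it directly.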
{"title":"A practical approach for discriminating tectonic settings of basaltic rocks using machine learning","authors":"Kentaro Nakamura","doi":"10.1016/j.acags.2023.100132","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100132","url":null,"abstract":"<div><p>Elucidating the tectonic setting of unknown rock samples has long attracted the interest of not only igneous petrologists but also a wide range of geoscientists. Recently, attempts have been made to use machine learning to discriminate the tectonic setting of igneous rocks. However, few studies have designed methods that are applicable to altered rocks. This study proposes a novel approach that uses ratios of elements less susceptible to weathering, alteration, and metamorphism as feature values for analyzing altered basalts. The method was evaluated using six well-established machine learning algorithms: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP). The results show that KNN achieved the highest balanced accuracy, 83.9%, in classifying the eight tectonic settings, closely followed by SVM at 83.7%. In addition, oceanic and arc/continental basalts could be discriminated with an accuracy of more than ∼90% using KNN. 
This study suggests that, with appropriately designed feature values, machine learning can discriminate tectonic settings more accurately and reliably than previously used discrimination diagrams.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"19 ","pages":"Article 100132"},"PeriodicalIF":3.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46128134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
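Two ingredients of the tectonic-setting approach, ratio-based features and an evaluation metric that is insensitive to class imbalance, can be sketched without a machine learning library. The log transform of the ratios, the element pairs, the toy training points, and the setting labels below are illustrative assumptions, not the feature set or data of the study; only the use of element ratios, KNN, and balanced accuracy comes from the abstract.

```python
import math
from collections import Counter


def log_ratio_features(sample, pairs):
    """Log-ratios of concentrations for (numerator, denominator) element pairs."""
    return [math.log(sample[num] / sample[den]) for num, den in pairs]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare settings count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


pairs = [("Zr", "Y"), ("Nb", "Y")]  # hypothetical element pairs
x = log_ratio_features({"Zr": 100.0, "Y": 30.0, "Nb": 3.0}, pairs)
train_X = [[1.0, -2.5], [1.1, -2.3], [2.0, -0.5], [2.2, -0.4]]  # toy points
train_y = ["oceanic", "oceanic", "arc", "arc"]
print(knn_predict(train_X, train_y, x))  # nearest neighbours are mostly "oceanic"
print(balanced_accuracy(["arc", "arc", "arc", "oceanic"], ["arc"] * 4))  # 0.5
```

In the last line, plain accuracy would be 0.75 even though the rare class is never predicted, whereas balanced accuracy drops to 0.5; this is why balanced accuracy suits imbalanced tectonic-setting datasets.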
{"title":"GeoSim: An R-package for plurigaussian simulation and Co-simulation between categorical and continuous variables","authors":"George Valakas, Konstantinos Modis","doi":"10.1016/j.acags.2023.100130","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100130","url":null,"abstract":"<div><p>Plurigaussian simulation is widely used to model geological facies in the geosciences and is predominantly applied in mineral deposit and petroleum reservoir exploration. The GeoSim package builds geostatistical models of categorical regionalized variables via conditional or unconditional Plurigaussian simulation and co-simulation. Co-simulation between Gaussian Random Fields representing the geological facies and other numerical variables accounting for auxiliary hydrological or geophysical quantities is also available in this package, through the definition of a linear coregionalization model. The algorithm is not restricted in the number of simulated facies or the number of truncated Gaussians, and the computationally intensive parts of the code are compiled in C++, taking advantage of the integration between R and C++. In this work, we introduce the GeoSim package and demonstrate its capabilities. We present a 3D application focused on a lignite mine in Greece, where we investigate the Plurigaussian simulation and co-simulation of five geological facies (categorical variables) and the lower calorific value (continuous variable). 
The findings of our study highlight the significant benefits of Plurigaussian simulation and co-simulation for capturing geological spatial heterogeneity.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"19 ","pages":"Article 100130"},"PeriodicalIF":3.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49727523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
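The central idea of Plurigaussian simulation is a truncation rule that maps the values of two or more Gaussian fields to facies, with thresholds chosen to honor target facies proportions. The sketch below shows a hypothetical three-facies rule for two independent standard normal variables. It uses spatially uncorrelated draws only to check the proportions, does not model variograms, conditioning, or the linear coregionalization model, and is not GeoSim's R interface.

```python
import random
from statistics import NormalDist


def thresholds_from_proportions(p_a, p_b):
    """Thresholds giving target proportions of facies A and B (C takes the rest).

    Assumes Y1 and Y2 are independent standard normal variables and p_a + p_b < 1.
    """
    nd = NormalDist()
    t1 = nd.inv_cdf(p_a)                # P(Y1 < t1) = p_a
    t2 = nd.inv_cdf(p_b / (1.0 - p_a))  # P(Y1 >= t1) * P(Y2 < t2) = p_b
    return t1, t2


def truncation_rule(y1, y2, t1, t2):
    """Hypothetical rule: A if Y1 < t1; otherwise B if Y2 < t2; otherwise C."""
    if y1 < t1:
        return "A"
    return "B" if y2 < t2 else "C"


def facies_proportions(n, t1, t2, seed=0):
    """Monte Carlo check with spatially uncorrelated draws (no variogram)."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0, "C": 0}
    for _ in range(n):
        counts[truncation_rule(rng.gauss(0, 1), rng.gauss(0, 1), t1, t2)] += 1
    return {f: c / n for f, c in counts.items()}


t1, t2 = thresholds_from_proportions(0.3, 0.5)
print(facies_proportions(20000, t1, t2))  # roughly A 0.3, B 0.5, C 0.2
```

The geometry of the rule controls which facies may touch: here A can border both B and C, while B and C are separated only along the Y2 threshold, which is how truncation rules encode facies transition constraints.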
{"title":"3D hydrostratigraphic and hydraulic conductivity modelling using supervised machine learning","authors":"Tewodros Tilahun , Jesse Korus","doi":"10.1016/j.acags.2023.100122","DOIUrl":"https://doi.org/10.1016/j.acags.2023.100122","url":null,"abstract":"<div><p>Accurately modeling highly heterogeneous aquifers is one of the big challenges in hydrogeology. There is a pressing need to develop new methods that transform high-resolution data into hydrogeological parameters representative of such aquifers. We use random forest-based machine learning to predict the distribution of hydrostratigraphic units and hydraulic conductivity (K) at a regional scale. We use lithologic logs from >2000 boreholes and resistivity-depth models from 2717 km of Airborne Electromagnetics (AEM). Eighty unique lithologic categories are lumped into five hydrostratigraphic units. K data are derived from descriptions of grain size and texture. The input data are resampled into a 200 × 200 × 1 m grid and split into 70% training and 30% validation. K prediction had a training F1 score of 95% and 87% testing accuracy. After hyperparameter tuning, these scores improved to 99.6% and 92%, respectively. Hydrostratigraphic unit prediction showed a training F1 score of 97% and 91% testing accuracy, improving to 100% and 95% after hyperparameter tuning. This method produces a high-resolution 3D model of K and hydrostratigraphic units that fills gaps between widely spaced boreholes. 
It is applicable in any setting where boreholes and AEM are available and can be used to build robust groundwater models for heterogeneous aquifers.</p></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"19 ","pages":"Article 100122"},"PeriodicalIF":3.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49498133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
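The evaluation described in the hydrostratigraphy abstract, a 70/30 split followed by F1 scores, can be sketched as follows. The abstract does not say how per-class F1 values are averaged, so unweighted (macro) averaging here is an assumption, as are the seed and the helper names.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=42):
    """Shuffle reproducibly, then cut into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Every class in the union appears at least once, so the denominator is > 0.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


train, val = train_validation_split(range(10))
print(len(train), len(val))  # 7 3
print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

The form 2TP / (2TP + FP + FN) is algebraically equal to the harmonic mean of precision and recall, and it avoids a separate division by zero when a class is never predicted.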