The Challenges of Big Data in Expanding Geoscience: Embracing New Initiatives to Untangle our World
Dène Tarkyth
Geoscience Canada, journal article, published 2019-12-18
DOI: 10.12789/geocanj.2019.46.152 (https://doi.org/10.12789/geocanj.2019.46.152)
Impact factor: 1.8; JCR: Q3 (Geosciences, Multidisciplinary)
Citations: 0
Abstract
It was my pleasure to serve as the president of this organization through 2018 and part of 2019. Such an experience cannot help but remind me of the effort that comes from GAC staff and our many volunteers, but it also brought home the challenges that all of us face in organizing our time and activities in this so-called Information Age. We live in a world where both space and time are increasingly compressed, and all of us at times struggle to manage the demands of our work and our lives beyond the office walls. So I will start this address by asking you all to imagine that you had one extra day a week given to you: some time that you could spend on fun science and investigating exciting questions, or just catching up on work and life. Would we not all welcome such a gift? But then look back over the last few weeks, months or even years and think about how much time you spent searching for information, skimming papers to find sample locations, compiling and cleaning up data, georeferencing maps... just some of the many basic things that need to get done before you can get to the fun part of your job as a geoscientist. There are estimates that geologists now spend 80% of their time searching for, formatting and organizing information and data, and I do not find these hard to believe. A recent article highlighted the approach taken by Cameco, one of Canada’s leading mining companies, to change how they manage data in order to save 20% of their geologists’ time – one day a week – so that they would not have to spend countless hours looking for data and could do geology instead (Heffernan 2015). There are many efforts to amalgamate and process data in ways that make these tasks easier and more amenable to automation.
A young student geologist at Princeton University, Julia Wilcots, undertook a summer project with a senior researcher at the University of Wisconsin to examine the distribution of stromatolites through geological time by searching descriptive literature. Anyone who has worked in the Precambrian, or indeed in sedimentary rocks of any Eon or Era, can well imagine the immensity of that search. However, through the use of computer search techniques and the ‘GeoDeepDive’ database, she was quickly able to identify over 10,000 papers that mentioned stromatolites (in the text, but not necessarily in the title) and extract the associated rock unit names from 10% of them. By linking these results to the ‘Macrostrat’ database, she was then able to come up with an estimate of the percentage of shallow marine rocks that contain stromatolites within different geological time periods. A more senior researcher at the University involved with the project estimated that doing this same search would have taken him sixteen months of tedium. The overall conclusions of the study – that the distribution of stromatolites is most closely linked to the abundance of dolomitic carbonate rocks (Peters et al. 2017) – are important, but the methodology also demonstrates the ability of new techniques to unravel seemingly infinite tangles of data. What other questions could we address and what other problems could we solve as Earth Scientists if we were routinely able to query efficiently organized data with such rapidity? As a science, geology continues to evolve towards a bigger view: from rocks alone, to facies, to entire sedimentary systems, to geodynamic environments, and to the Earth System as a whole. We increasingly recognize the interconnected nature of all geoscience data, and the need for a ‘Big Context’ to make sense of ‘Big Data’.
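To make the shape of such a workflow concrete, the following is a minimal Python sketch of the three steps described above: find text that mentions stromatolites, pull out the rock unit names mentioned alongside them, and link those names to a table of units to estimate the fraction of shallow marine units with stromatolites in each period. It is an illustration only, not the code or method used by Peters et al. (2017). The documents, unit names, periods, and function names are all hypothetical, and the sketch works on in-memory toy data rather than calling any real literature or rock-unit database.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a rock-unit table: name -> (period, environment).
# These records are invented for illustration, not real database entries.
UNITS = {
    "Alpha Formation": ("Cambrian", "shallow marine"),
    "Beta Dolomite": ("Cambrian", "shallow marine"),
    "Gamma Limestone": ("Ordovician", "shallow marine"),
    "Delta Shale": ("Ordovician", "deep marine"),
}

# Hypothetical stand-in for the full text of published papers.
DOCS = [
    "Stromatolites are abundant in the Beta Dolomite. The Delta Shale is barren.",
    "The Alpha Formation consists of cross-bedded sandstone.",
    "Domal stromatolitic carbonate occurs within the Gamma Limestone.",
]


def units_with_stromatolites(docs, unit_names):
    """Return unit names that share a sentence with a stromatolite mention."""
    found = set()
    for text in docs:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if re.search(r"stromatolit", sentence, re.IGNORECASE):
                found.update(name for name in unit_names if name in sentence)
    return found


def fraction_by_period(units, hits, environment="shallow marine"):
    """Fraction of units of one environment, per period, that appear in hits."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for name, (period, env) in units.items():
        if env != environment:
            continue
        totals[period] += 1
        if name in hits:
            positives[period] += 1
    return {period: positives[period] / totals[period] for period in totals}


hits = units_with_stromatolites(DOCS, UNITS)
result = fraction_by_period(UNITS, hits)
print(result)
```

A real study would need far more care than this sketch suggests, for example in handling unit-name synonyms, negated mentions ("no stromatolites were observed"), and uneven sampling of the literature.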
This address seeks to emphasize the great potential of the data explosion that confronts us but sometimes confounds us, and also to specifically highlight some of the new and exciting tools and techniques that can help us exploit it. I seek to provide but a glimpse of an ever-expanding branch of our science, which will feature more and more in our professional lives in the 21st century.
About the journal:
Established in 1974, Geoscience Canada is the main technical publication of the Geological Association of Canada (GAC). We are a quarterly journal that emphasizes diversity of material, and also the presentation of informative technical articles that can be understood not only by specialist research workers, but by non-specialists in other branches of the Earth Sciences. We aim to be a journal that you want to read, and which will leave you better informed, rather than more confused.