Data Science Journal | Pub Date: 2017-01-01 | Epub Date: 2017-04-04 | DOI: 10.5334/dsj-2017-015
Lalit Wanchoo, Nathan James, Hampapuram K Ramapriyan
{"title":"NASA EOSDIS Data Identifiers: Approach and System.","authors":"Lalit Wanchoo, Nathan James, Hampapuram K Ramapriyan","doi":"10.5334/dsj-2017-015","DOIUrl":"10.5334/dsj-2017-015","url":null,"abstract":"NASA's Earth Science Data and Information System (ESDIS) Project began investigating the use of Digital Object Identifiers (DOIs) in 2010 with the goal of assigning DOIs to various data products. These Earth science research data products produced using Earth observations and models are archived and distributed by twelve Distributed Active Archive Centers (DAACs) located across the United States. Each data center serves a different Earth science discipline user community and, accordingly, has a unique approach and process for generating and archiving a variety of data products. These varied approaches present a challenge for developing a DOI solution. To address this challenge, the ESDIS Project has developed processes, guidelines, and several models for creating and assigning DOIs. Initially the DOI assignment and registration process was started as a prototype but now it is fully operational. In February 2012, the ESDIS Project started using the California Digital Library (CDL) EZID for registering DOIs. The DOI assignments were initially labor-intensive. The system is now automated, and the assignments are progressing rapidly. As of February 28, 2017, over 50% of the data products at the DAACs had been assigned DOIs. Citations using the DOIs increased from about 100 to over 370 between 2015 and 2016.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6839702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43633157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wang Xuezhi, Zhao Jianghua, Zhou Yuanchun, Liao Jianhui
{"title":"The Geospatial Data Cloud: An Implementation of Applying Cloud Computing in Geosciences","authors":"Wang Xuezhi, Zhao Jianghua, Zhou Yuanchun, Liao Jianhui","doi":"10.2481/DSJ.14-042","DOIUrl":"https://doi.org/10.2481/DSJ.14-042","url":null,"abstract":"The rapid growth in the volume of remote sensing data and its increasing computational requirements bring huge challenges for researchers as traditional systems cannot adequately satisfy the huge demand for service. Cloud computing has the advantage of high scalability and reliability, which can provide firm technical support. This paper proposes a highly scalable geospatial cloud platform named the Geospatial Data Cloud, which is constructed based on cloud computing. The architecture of the platform is first introduced, and then two subsystems, the cloud-based data management platform and the cloud-based data processing platform, are described. ––– This paper was presented at the First Scientific Data Conference on Scientific Research, Big Data, and Data Science, organized by CODATA-China and held in Beijing on 24-25 February, 2014.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"13 1","pages":"254-264"},"PeriodicalIF":0.0,"publicationDate":"2014-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data-Driven Method for Selecting Optimal Models Based on Graphical Visualisation of Differences in Sequentially Fitted ROC Model Parameters","authors":"Kassim S. Mwitondi, R. Moustafa, A. Hadi","doi":"10.2481/dsj.WDS-045","DOIUrl":"https://doi.org/10.2481/dsj.WDS-045","url":null,"abstract":"Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models in the Pima Indians Diabetes and the Bupa Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequential fitted parameters are then extracted, and their respective probability density estimations are used to track their variability using an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity, and minimizes over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"12 1","pages":"WDS247-WDS253"},"PeriodicalIF":0.0,"publicationDate":"2013-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing Data Flow and Modelling Potentials for Sustainable Development","authors":"Kassim S. Mwitondi, J. Bugrien","doi":"10.2481/dsj.009-027","DOIUrl":"https://doi.org/10.2481/dsj.009-027","url":null,"abstract":"Tackling the global challenges relating to health, poverty, business, and the environment is heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination, and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised, information society remains digitally divided. On the African continent in particular, this division has resulted in a gap between the knowledge generation and its transformation into tangible products and services. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving the people's quality of life. Its main strategy is based on a generic data sharing model providing access to data utilising and generating entities in a multi-disciplinary environment. It highlights the great potentials in using unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. The paper's conclusions include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies among the private sector, academic, and research institutions within and among countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement in the people's overall quality of life. To avoid running high implementation costs, selected open source tools are recommended for developing and sustaining the system.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"11 1","pages":"140-152"},"PeriodicalIF":0.0,"publicationDate":"2012-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Access to Digital Information: Opportunities and Challenges Identified During the Electronic Geophysical Year","authors":"W.K. (Bill) Peterson","doi":"10.2481/DSJ.IGY-002","DOIUrl":"https://doi.org/10.2481/DSJ.IGY-002","url":null,"abstract":"The vision of the Electronic Geophysical Year (eGY) is that we can achieve a major step forward in geoscience capability, knowledge, and usage throughout the world for the benefit of humanity by accelerating the adoption of modern and visionary practices such as virtual observatories for managing and sharing data and information. eGY has found that the biggest challenges to implementing the vision are educating program managers and senior scientists on the need for modern data management techniques and providing incentives for practitioners of the new field of geoinformatics.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrections made by the Editors","authors":"Chen-Yuan Liu, Jhen-Cheng Wang, S. Luo","doi":"10.2481/539","DOIUrl":"https://doi.org/10.2481/539","url":null,"abstract":"Wrong:Authors, Authors' affiliations, and Equations (4) and (5) Right:See the corrected PDF.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hu Lianglin, Hou Yufang, Liao Jianhui, Yin Ling, Shi Wenwen
{"title":"BrainBank Metadata Specification for the Human Brain Project and Neuroinformatics","authors":"Hu Lianglin, Hou Yufang, Liao Jianhui, Yin Ling, Shi Wenwen","doi":"10.2481/DSJ.6.S375","DOIUrl":"https://doi.org/10.2481/DSJ.6.S375","url":null,"abstract":"Many databases and platforms for human brain data have been established in China over the years, and metadata plays an important role in understanding and using them. The BrainBank Metadata Specification for the Human Brain Project and Neuroinformatics provides a structure for describing the context and content information of BrainBank databases and services. It includes six parts: identification, method, data schema, distribution of the database, metadata extension, and metadata reference. The application of the BrainBank Metadata Specification will promote conservation and management of BrainBank databases and platforms. It will also greatly facilitate the retrieval, evaluation, acquisition, and application of the data.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"6 1","pages":"375-378"},"PeriodicalIF":0.0,"publicationDate":"2007-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69171055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diane Smith Rumble, J. Rumble, Huadong Guo, Yun Xiao
{"title":"Correction made by the Authors","authors":"Diane Smith Rumble, J. Rumble, Huadong Guo, Yun Xiao","doi":"10.2481/538","DOIUrl":"https://doi.org/10.2481/538","url":null,"abstract":"Wrong:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedigs and not original research contributions. Right:All papers have been reviewed and edited, but the reader should be clear that these contributions are papers contributed as conference proceedings.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69170760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-data Mining for Understanding Leadership Behavior","authors":"N. Matsumura, Yoshihiro Sasaki","doi":"10.2481/dsj.6.S61","DOIUrl":"https://doi.org/10.2481/dsj.6.S61","url":null,"abstract":"We propose an approach for understanding leadership behavior in dot-jp, a non-profit organization, by analyzing heterogeneous multi-data composed of questionnaires and mailing list archives. Attitudes toward leaders were obtained from the questionnaires, and human networks were extracted from the mailing list archives. By integrating the results, we discovered that leaders must receive messages from other people as well as send messages to construct reliable relationships.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"27 1","pages":"81-94"},"PeriodicalIF":0.0,"publicationDate":"2007-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74920114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discussion Visualization on a Bulletin Board System","authors":"W. Sunayama","doi":"10.2481/dsj.6.S51","DOIUrl":"https://doi.org/10.2481/dsj.6.S51","url":null,"abstract":"It is important for a collaborative community to decide its next action. The leader of a collaborative community must choose an action that increases rewards and reduces risks. When a leader cannot make this decision, action will be determined through community member discussion. However, this decision cannot be made in blind discussions, so systematic discussion is necessary to choose effective action in a limited time. In this paper, we propose a bulletin board system framework in which effective discussion is established through visualized discussion logs.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":"64 1","pages":"95-109"},"PeriodicalIF":0.0,"publicationDate":"2007-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89742435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}