{"title":"Modelling the Coverage of Dipterocarp Trees in Central Visayas, Philippines","authors":"Adrian Jose Sabado, Geoffrey A. Solano, M. Nicolas, R. Batista-Navarro, Roselyn Gabud, V. Hilomen","doi":"10.1109/eScience.2017.91","DOIUrl":"https://doi.org/10.1109/eScience.2017.91","url":null,"abstract":"With the rapid decline of dipterocarp coverage in the Philippines, restoration efforts would benefit from a suitability map indicating areas where different dipterocarp species could thrive. Such maps could also be used to assess whether particular locations should be designated as Protected Areas, in which exploitation would be limited, if not completely prohibited. We obtained data from the Department of Environment and Natural Resources of the Philippines, consisting of dipterocarp-related information gathered over localities in the country's Central Visayas region. Six climate parameters were then chosen and obtained from the Philippine Atmospheric, Geophysical and Astronomical Services Administration. We employed a maximum entropy-based niche modelling approach to generate a suitability map. The model we produced obtained an area under the curve (AUC) score of 0.955, with minimum temperature being the variable with the highest contribution. The map indicates that dipterocarp trees are more likely to appear in the lower regions of Central Visayas.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131347316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Exact Protein Structure Alignment with Graphics Processors","authors":"Yishui Wu, Shuang Qiu, Qiong Luo","doi":"10.1109/eScience.2017.17","DOIUrl":"https://doi.org/10.1109/eScience.2017.17","url":null,"abstract":"Among structural alignment tools, DALIX is capable of calculating an optimal structural alignment based on the DALI score in most cases. It outperforms DALI, one of the most popular structural alignment algorithms, in alignment quality. However, the high time complexity of DALIX hinders its application to large protein or complex structure alignments. In this paper, we parallelize the major steps of DALIX on the GPU (Graphics Processing Unit) to speed up its processing. Specifically, to better utilize the massive GPU thread parallelism, we design a two-level parallel algorithm for the dynamic programming, which is the most time-consuming component in the tool. We compact the decision table in the dynamic programming so that it fits into the shared memory used for inter-thread communication, further improving performance. Results show that our GPU-DALIX achieves a speedup ranging from 5.5x to 20x over the sequential version of DALIX on a set of real-world protein alignments. In particular, GPU-DALIX provides significant performance improvement when the protein size is large or the structure is complex.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116762492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Small File I/O Performance for Massive Digital Archives","authors":"Hwajung Kim, H. Yeom","doi":"10.1109/eScience.2017.39","DOIUrl":"https://doi.org/10.1109/eScience.2017.39","url":null,"abstract":"With the growth of online services, a large number of files have been generated by users or by the services themselves. To serve users with different network environments and devices, online services usually keep different versions of the same file at various sizes. Users with high-speed networks and top-of-the-line displays can be supplied a large, high-precision file, while users with mobile devices typically receive a smaller file with less precision. In some cases, a large file is divided into small files to make it easier to transmit over wide area networks. As a result, the underlying filesystem must efficiently maintain a large number of small files, and serving such a huge number of files to applications is one of the new challenges for existing filesystems. In this paper, we propose techniques to efficiently manage a large number of files in digital archives using the data characteristics and access patterns of the application. Based on our knowledge of the upper-layer applications, we modified both the in-memory and on-disk inode structures of an existing filesystem and were able to dramatically reduce the number of storage I/O operations needed to serve the same files. Our experimental results show that the proposed methods significantly reduce the number of storage I/O operations for both reading and writing files, especially small ones. Moreover, using several synthetic benchmarks and microbenchmarks, we demonstrate that the proposed techniques reduce application-level latency as well as improve file operation throughput.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124479970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geoscience Cyberinfrastructure in the Cloud: Data-Proximate Computing to Address Big Data and Open Science Challenges","authors":"M. Ramamurthy","doi":"10.1109/eScience.2017.63","DOIUrl":"https://doi.org/10.1109/eScience.2017.63","url":null,"abstract":"Data are not only the lifeblood of the geosciences; they have become the currency of the modern world, in both science and society. Rapid advances in computing, communications, and observational technologies, along with concomitant advances in high-resolution modeling and ensemble and coupled-systems predictions of the Earth system, are revolutionizing nearly every aspect of the geosciences. Modern data volumes from high-resolution ensemble prediction systems and next-generation remote-sensing systems such as hyperspectral satellite sensors and phased-array radars are staggering. The advent and maturity of cloud computing technologies and tools have opened new avenues for addressing both big data and Open Science challenges to accelerate scientific discovery. There is broad consensus that as data volumes grow rapidly, it is particularly important to reduce data movement and bring processing and computation to the data. Data providers also need to give scientists an ecosystem that includes the data, tools, workflows, and other end-to-end applications and services needed to perform analysis, integration, interpretation, and synthesis, all in the same environment or platform. Instead of moving data to processing systems near users, as is the tradition, one will need to bring processing, computing, analysis, and visualization to the data: so-called data-proximate workbench capabilities, also known as server-side processing.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121290230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mathematical Programming- and Simulation-Based Framework to Evaluate Cyberinfrastructure Design Choices","authors":"Zhengchun Liu, R. Kettimuthu, S. Leyffer, Prashant Palkar, Ian T Foster","doi":"10.1109/eScience.2017.27","DOIUrl":"https://doi.org/10.1109/eScience.2017.27","url":null,"abstract":"Modern scientific experimental facilities such as x-ray light sources increasingly require on-demand access to large-scale computing for data analysis, for example to detect experimental errors or to select the next experiment. As the number of such facilities, the number of instruments at each facility, and the scale of computational demands all grow, the question arises as to how to meet these demands most efficiently and cost-effectively. A single computer per instrument is unlikely to be cost-effective because of low utilization and high operating costs. A single national compute facility, on the other hand, introduces a single point of failure and perhaps excessive communication costs. We introduce here methods for evaluating these and other potential design points, such as per-facility computer systems and a distributed multisite \"superfacility.\" We use the U.S. Department of Energy light sources as a use case and build a mixed-integer programming model and a customizable superfacility simulator to enable joint optimization of design choices and associated operational decisions. The methodology and tools provide new insights into design choices for on-demand computing facilities for real-time analysis of scientific experiment data. The simulator can also be used to support facility operations, for example by simulating the impact of events such as outages.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122710537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensuring the Quality of Research Objects in the Earth Science Domain","authors":"Andres Garcia-Silva, Raúl Palma, José Manuél Gómez-Pérez","doi":"10.1109/eScience.2017.62","DOIUrl":"https://doi.org/10.1109/eScience.2017.62","url":null,"abstract":"Research objects were designed in data-intensive science under the premises of interoperability and machine-readability to describe scientific processes and findings, including all the resources used in the research endeavour. In this poster we present our work with Earth Science communities, which have embraced the research object model for long-term preservation and reuse of knowledge, to design checklists: the main tool available to assess the quality of research objects, and the basis upon which quality metrics such as completeness, stability, and reliability are calculated.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122992269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity Projection: A Geometric Measure for Comparison of Biological Sequences","authors":"Lawrence Buckingham, Timothy Chappell, J. Hogan, S. Geva","doi":"10.1109/eScience.2017.46","DOIUrl":"https://doi.org/10.1109/eScience.2017.46","url":null,"abstract":"Sequence comparison is a fundamental task in computational biology, traditionally dominated by alignment-based methods such as the Smith-Waterman and Needleman-Wunsch algorithms, or by alignment-based heuristics such as BLAST, the ubiquitous Basic Local Alignment Search Tool. For more than a decade researchers have examined a range of alignment-free alternatives to these approaches, citing concerns over scalability in the era of Next Generation Sequencing, the emergence of petascale sequence archives, and a lack of robustness of alignment methods in the face of structural sequence rearrangements. While some of these approaches have proven successful for particular tasks, many continue to exhibit a marked decline in sensitivity as closely related sequence sets diverge. Avoiding the alignment step allows the methods to scale to the challenges of modern sequence collections, but only at the cost of noticeably inferior search. In this paper we re-examine the problem of similarity measures for alignment-free sequence comparison, and introduce a new method which we term Similarity Projection. Similarity Projection offers markedly enhanced sensitivity, comparable to alignment-based methods, while retaining the scalability characteristic of alignment-free approaches. As before, we rely on collections of k-mers: overlapping substrings of the molecular sequence of length k, collected without reference to position. Here, however, similarity relies on variants of the Hausdorff set distance, allowing scoring that better reflects those components which match while lessening the impact of those which do not. Formally, the algorithm generates a large mutual similarity matrix between sequence pairs based on their component fragments; successive reduction steps yield a final score over the sequences. However, only a small fraction of these underlying comparisons need be performed, and by use of an approximate scheme based on vector quantization, we are able to achieve an order of magnitude improvement in execution time over the naive approach. We evaluate the approach on two large protein collections obtained from UniProtKB, showing that Similarity Projection achieves accuracy rivalling, and at times clearly exceeding, that of BLAST, while exhibiting markedly superior execution speed.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128383461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mobile Application for Plant Recognition through Deep Learning","authors":"Min Gao, Yang Lin, R. Sinnott","doi":"10.1109/eScience.2017.15","DOIUrl":"https://doi.org/10.1109/eScience.2017.15","url":null,"abstract":"It is a simple task for humans to visually identify objects. However, computer-based image recognition remains challenging. In this paper we describe an approach for image recognition with specific focus on automated recognition of plants and flowers. The approach utilizes deep learning capabilities and, unlike other approaches that focus on static images for feature classification, uses video data, which compensates for the information that would otherwise be lost when comparing a static image with many other images of plants and flowers. We describe the steps taken in data collection, data cleaning and data purification, and the deep learning algorithms that were subsequently applied. We describe the mobile (iOS) application that was designed, and finally we present overall results showing that, in the work undertaken thus far, the approach is able to identify 122/125 selected plants and 47/50 genera with degrees of confidence up to 95%. We also describe the performance speed-up achieved through the use of Cloud-based resources.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114814523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In Pursuit of the Wisest: Building Cost-Effective Teams of Experts","authors":"Y. Najaflou, K. Bubendorfer","doi":"10.1109/eScience.2017.28","DOIUrl":"https://doi.org/10.1109/eScience.2017.28","url":null,"abstract":"Scientific collaboration networks are social networks in which vertices represent scientists and edges typically represent co-authorship. Such networks not only permit research into understanding the characteristics of scientific collaboration, but can also provide a basis for building collaborative research platforms to support research groups with functionality such as information sharing, data repositories, attribution, and communication. Collaboration networks are highly clustered, mapping closely to the real-world relationships of individual researchers. However, just as eScience and big data constitute a well-recognised disruptive change to the way basic research is carried out in many research fields, there is an equivalent and largely unexplored change in the collaborative relationships between researchers, which are becoming not only larger in scale but also more distributed and interdisciplinary. One element in this, which we suggest will play a pivotal role in the future, is the formation of teams for large eScience and big data projects. This paper presents an innovative algorithm for expert team formation called Chemistry Oriented Team Formation (ChemoTF), based on two new metrics: Chemistry Level and Expertise Level. Chemistry Level measures the scale of communication required by the task, while Expertise Level measures the overall expertise among potential teams filtered by Chemistry Level. This approach is tested using a large expertise corpus containing 472,365 individual authors. The ChemoTF algorithm is able to build teams at a median of 90% of the expected cost, achieving 99% fit while remaining tractable for teams of up to 16 individuals, resulting in the formation of more communicative and cost-effective teams with a higher expertise level.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115622361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Fully Automated Diagnostic System for Orthodontic Treatment in Dentistry","authors":"S. Murata, Chonho Lee, C. Tanikawa, S. Date","doi":"10.1109/eScience.2017.12","DOIUrl":"https://doi.org/10.1109/eScience.2017.12","url":null,"abstract":"Deep learning has emerged as a successful approach for diagnostic imaging. Along with the increasing demands for dental healthcare, the automation of diagnostic imaging is increasingly desired in the field of orthodontics for many reasons (e.g., remote assessment, cost reduction, etc.). However, orthodontic diagnoses generally require dental and medical scientists to diagnose a patient from a comprehensive perspective, looking at the mouth and face from different angles and assessing various features. This assessment process takes a great deal of time even for a single patient, and tends to generate variation in the diagnosis among dental and medical scientists. In this paper, the authors propose a deep learning model to automate diagnostic imaging, which provides an objective morphological assessment of facial features for orthodontic treatment. The automated diagnostic imaging system dramatically reduces the time needed for the assessment process. It also helps provide an objective diagnosis, which is important for dental and medical scientists as well as their patients, because the diagnosis directly affects the treatment plan, treatment priorities, and even insurance coverage. The proposed deep learning model outperforms a conventional convolutional neural network model in its assessment accuracy. Additionally, the authors present a work-in-progress data science platform with a secure data staging mechanism, which supports the computation needed to train the proposed deep learning model. The platform is expected to allow users (e.g., dental and medical scientists) to securely share data and flexibly conduct data analytics by running advanced machine learning algorithms (e.g., deep learning) on high-performance computing resources (e.g., a GPU cluster).","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129467816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}