{"title":"Covirt:协同核的轻量级故障隔离和资源保护","authors":"Nicholas Gordon, J. Lange","doi":"10.1109/IPDPS49936.2021.00039","DOIUrl":null,"url":null,"abstract":"The challenges of the exascale era have generated a number of advancements in HPC systems software, with co-kernel architectures emerging as one such novel approach for HPC operating system and runtime (OS/R) design. Cokernels function by running multiple specialized, lightweight OS kernels natively on the same host as a general purpose OS/R. These specialized kernels are able to provide optimized OS/R environments for HPC applications while still retaining access to the full feature set of the co-running general purpose OS/R. While co-kernels are able to effectively optimize for performance, they generally lack effective mechanisms for cross OS/R fault isolation and resource protection. In this paper we present Covirt, a lightweight OS/R protection layer that leverages the hardware virtualization features found on modern CPUs. Covirt interposes a minimal hypervisor layer between a co-kernel OS/R and hardware to prevent OS level faults from impacting other OS/Rs running on the same system. Covirt is different from other virtualization-based approaches due to the level of integration necessary between the co-kernel instances, requiring the support of higher level semantic interfaces between the different OS/Rs. Covirt features a split architecture consisting of a hypervisor and controller module that continuously monitors changes to the underlying resource partitioning and translates those events to hypervisor configuration changes. We have implemented a prototype of Covirt in the context of the Hobbes exascale OS/R stack, specifically targeting the Pisces co-kernel framework and Kitten Lightweight Kernel. Our evaluation shows that Covirt is able to add fault isolation for memory and interrupt processing with minimal performance overheads.","PeriodicalId":372234,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"169 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Covirt: Lightweight Fault Isolation and Resource Protection for Co-Kernels\",\"authors\":\"Nicholas Gordon, J. Lange\",\"doi\":\"10.1109/IPDPS49936.2021.00039\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The challenges of the exascale era have generated a number of advancements in HPC systems software, with co-kernel architectures emerging as one such novel approach for HPC operating system and runtime (OS/R) design. Cokernels function by running multiple specialized, lightweight OS kernels natively on the same host as a general purpose OS/R. These specialized kernels are able to provide optimized OS/R environments for HPC applications while still retaining access to the full feature set of the co-running general purpose OS/R. While co-kernels are able to effectively optimize for performance, they generally lack effective mechanisms for cross OS/R fault isolation and resource protection. In this paper we present Covirt, a lightweight OS/R protection layer that leverages the hardware virtualization features found on modern CPUs. Covirt interposes a minimal hypervisor layer between a co-kernel OS/R and hardware to prevent OS level faults from impacting other OS/Rs running on the same system. Covirt is different from other virtualization-based approaches due to the level of integration necessary between the co-kernel instances, requiring the support of higher level semantic interfaces between the different OS/Rs. Covirt features a split architecture consisting of a hypervisor and controller module that continuously monitors changes to the underlying resource partitioning and translates those events to hypervisor configuration changes. We have implemented a prototype of Covirt in the context of the Hobbes exascale OS/R stack, specifically targeting the Pisces co-kernel framework and Kitten Lightweight Kernel. Our evaluation shows that Covirt is able to add fault isolation for memory and interrupt processing with minimal performance overheads.\",\"PeriodicalId\":372234,\"journal\":{\"name\":\"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"169 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS49936.2021.00039\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS49936.2021.00039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Covirt: Lightweight Fault Isolation and Resource Protection for Co-Kernels
The challenges of the exascale era have generated a number of advancements in HPC systems software, with co-kernel architectures emerging as one such novel approach for HPC operating system and runtime (OS/R) design. Cokernels function by running multiple specialized, lightweight OS kernels natively on the same host as a general purpose OS/R. These specialized kernels are able to provide optimized OS/R environments for HPC applications while still retaining access to the full feature set of the co-running general purpose OS/R. While co-kernels are able to effectively optimize for performance, they generally lack effective mechanisms for cross OS/R fault isolation and resource protection. In this paper we present Covirt, a lightweight OS/R protection layer that leverages the hardware virtualization features found on modern CPUs. Covirt interposes a minimal hypervisor layer between a co-kernel OS/R and hardware to prevent OS level faults from impacting other OS/Rs running on the same system. Covirt is different from other virtualization-based approaches due to the level of integration necessary between the co-kernel instances, requiring the support of higher level semantic interfaces between the different OS/Rs. Covirt features a split architecture consisting of a hypervisor and controller module that continuously monitors changes to the underlying resource partitioning and translates those events to hypervisor configuration changes. We have implemented a prototype of Covirt in the context of the Hobbes exascale OS/R stack, specifically targeting the Pisces co-kernel framework and Kitten Lightweight Kernel. Our evaluation shows that Covirt is able to add fault isolation for memory and interrupt processing with minimal performance overheads.