{"title":"Predictive Disk Provisioning for Adjustable Cloud Storage Solutions","authors":"Xuerong Wan, S. Bohacek","doi":"10.1109/JCC59055.2023.00017","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00017","url":null,"abstract":"Cloud service providers such as AWS and Azure have recently begun to offer storage solutions that allow disk performance to be adjusted “on-the-fly”. Such offerings allow the user to make use of short-term predictions of storage requirements. For example, instead of provisioning a single storage solution that is never under-provisioned, but frequently over-provisioned, one can configure the storage system to support higher performance during peak times; and cheaper, lower performance during periods with less demand. This paper explores the possibility of using a prediction system that utilizes past storage demands to predict the storage requirements over the next hour. We sought a single predictor that could perform well for all types of demand. The predictors were developed using approximately 200 years of storage performance requirements collected from high-performance storage systems in hundreds of companies. We have found that over-provisioning can be greatly reduced, but only at the expense of under-provisioning with a non-zero probability. However, the probability of being under-provisioned can be as low as 0.01%, which is similar to the target service level of cloud vendors. 
In addition, we have developed novel methods to search for effective predictors that perform well both on average and for rare events.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115613899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OLM2: Automatic Optimal Strategy Generating for Large-Scale Model Training with Limited-Memory","authors":"Zhilin Yang, Yu Tang, Linbo Qiao, Xi Yang, Zhen Huang","doi":"10.1109/JCC59055.2023.00006","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00006","url":null,"abstract":"The scale of model parameters and the amount of training data are increasing exponentially, and the exponential growth in model parameters requires more GPU memory. Recomputation and swapping are two main memory optimization methods that have been extensively studied, and there are also optimization strategies that combine the two methods. However, most of them are based on heuristic search strategies, which do not explore the complete solution space and cannot guarantee the optimality of the solutions. An optimal search strategy with tensor-level recomputation and swapping is desirable in large-scale model training. In this paper, we propose an optimal strategy searching algorithm combining tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem, which converts the memory constraints into mixed integer programming, to find the optimal memory optimization strategy. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation consumption without exceeding the available memory limitation. Experimental results show that our method achieves about a 60% reduction in memory requirements during the training process. Furthermore, our method reduces the overall training time more than existing algorithms. 
Compared to Checkmate, our approach achieves about 0.3–0.9% reduction in computation cost per iteration.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121953413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Memory Defect Detection Method based on Sparse-Value-Flow Graph","authors":"Rulin Xu, Xiaoguang Mao, Luohui Chen, Yue Yu","doi":"10.1109/JCC59055.2023.00014","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00014","url":null,"abstract":"Memory vulnerability detection aims to identify software defects that can compromise memory safety. However, existing methods often struggle to achieve both high precision and efficiency. This paper presents a high-precision memory vulnerability detection approach based on value flow analysis and parallel computing. We first construct a static semantic representation called SVFG to enable precise detection of memory vulnerabilities such as null pointer dereference and use-after-free. We then perform dependency-aware path feasibility analysis using an SMT solver to reduce false positives. Finally, we develop a task-level parallel framework to accelerate the constraint solving process and improve efficiency. We evaluate our approach on the Juliet test set of over 2,000 test cases and 7 open-source projects. Experimental results show that our dependency-aware analysis can achieve false positive rates of 0.5%-2.05%, outperforming traditional approaches and existing tools. Our task-level parallel framework can achieve up to 3.25x speedup with 4 computing nodes. Our study demonstrates that combining value flow analysis and parallel computing is a promising way to enable highly precise and efficient detection of memory vulnerabilities. For future work, we plan to integrate pointer analysis to support more complex code, and optimize the granularity of parallelism to improve scalability. 
Overall, this paper presents a static analysis based method to address the inherent trade-off between precision and efficiency in memory vulnerability detection.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114108816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure and Efficient Runtime Environment for Smart Contracts on JointCloud","authors":"Yuhao Xue, Dong Du, Lei Zhang, Yubin Xia","doi":"10.1109/JCC59055.2023.00020","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00020","url":null,"abstract":"Many cloud providers, including Amazon, Google, Microsoft, and Alibaba Cloud, offer support for blockchain cloud services that rely on a runtime environment, such as the Ethereum Virtual Machine (EVM), to execute smart contracts and ensure consistency between participants. However, existing runtime systems suffer from two main limitations. Firstly, traditional runtime systems like EVM cannot guarantee privacy protection as all the data uploaded to the blockchain is visible to all participants. This restricts the use of blockchain to limited scenarios. Secondly, each computation on the runtime system must be synchronized to all nodes in the network, resulting in a significant increase in computational overhead, which makes more complex applications challenging to implement. One approach to address these limitations is to utilize Trusted Execution Environments (TEE) for blockchain runtime, which can provide privacy protection and mitigate redundant synchronization operations. However, using TEE for blockchain may significantly increase cloud costs. To overcome these challenges, this paper proposes PL-EVM, a new runtime environment for smart contracts that utilizes JointCloud. PL-EVM achieves high-security guarantees by using TEE to protect privacy-sensitive data and incorporates dynamic migration and splitting mechanisms to achieve high efficiency and low costs. 
Our evaluation results show that PL-EVM can improve performance and reduce costs by 4% to 32.22%.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122032054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyCU: Hybrid Consistent Update for Software Defined Network","authors":"Xudong Mou, Jie Sun, Yingying Zhong, Tianyu Wo","doi":"10.1109/JCC59055.2023.00018","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00018","url":null,"abstract":"Software Defined Network (SDN) enables network operators to achieve the customization of network services, which tends to be more dynamic and fine-grained. However, the distributed nature of rule updating in SDN brings consistency problems, i.e., packets travel according to different versions of rules. It leads to the issues of blackholes, loops, congestion, and deadlock in the data plane, which may further affect the service quality of the application plane. With the emergence of new computing paradigms such as edge computing and fog computing, the heterogeneity of network devices and links, as well as the diversity of network application requirements have become increasingly prominent. Traditional update methods ignore these key factors when modeling, so they cannot cope with the increasingly complex network environment, resulting in delays or packet loss rates that do not meet service requirements. This paper proposes HyCU, which takes device performance as a constraint and optimizes updates based on flow service requirements. 
We conduct experiments under different scenarios and constraints over two real-world topologies with real-time running flows, demonstrating the effectiveness of HyCU.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114739283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study on the structure evolution of deep learning models: taking SAR image processing a case study","authors":"Huanxi Liu, Xiang He, Dawei Feng, Han Bao","doi":"10.1109/JCC59055.2023.00008","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00008","url":null,"abstract":"With the continuous improvement in model performance, deep learning models have been widely deployed and have achieved promising outcomes in various fields in recent years. However, due to the escalating volumes of training data and the complexity of application problems, it is becoming increasingly challenging to design a neural network with better performance by hand. Analysing the evolution of typical neural network structures provides an important reference for designing a network structure. In this paper, we select the open source models in SAR image processing for an empirical analysis of the evolution of neural network structures. We analyse the evolution of 239 open source deep learning models from the aspects of framework, computing unit, model computation amount and the combined use of various computing units. Results reveal that preference and co-occurrence exist in computing units, while the average number of convolution, activation and normalization layers increases significantly over time. 
Model complexity shows an overall upward trend, and the characteristics of SAR image are more and more taken into consideration during the model structure design.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124972089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scout: An Efficient Federated Learning Client Selection Algorithm Driven by Heterogeneous Data and Resource","authors":"Ruilin Zhang, Zhenan Xu, Hao Yin","doi":"10.1109/JCC59055.2023.00012","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00012","url":null,"abstract":"Federated Learning is a novel distributed machine learning paradigm that leverages the computing power of numerous decentralized data sources for jointly training machine learning models while ensuring user privacy. In the most commonly used cross-device scenarios, the client cluster typically covers a vast number of heterogeneous end devices. Due to physical limitations such as bandwidth, only a few clients can participate in each round of training. The core issue of client selection is to determine an appropriate client set for each training round. However, existing selection algorithms, especially the widely adopted random selection, suffer from a number of issues that prevent them from achieving a good balance between training efficiency and speed. Therefore, we propose Scout, which utilizes the heterogeneity features of clients’ data and resources to jointly model the utility function, and enhances the utilization of correlation among clients and the diversity among selected clients to achieve better training efficiency and speed. Furthermore, Scout maintains scalability and fairness. 
Our experiments demonstrate that in large-scale heterogeneous clients scenarios, Scout outperforms three baseline algorithms and the state-of-the-art dual-feature dimension algorithm Oort in evaluation metrics.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122873386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-level Multi-Agent Actor-Critic Methods with Transformers","authors":"Tianjiao Wan, Haibo Mi, Zijian Gao, Yuanzhao Zhai, Bo Ding, Dawei Feng","doi":"10.1109/JCC59055.2023.00007","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00007","url":null,"abstract":"Recently, deep multi-agent reinforcement learning methods have witnessed great progress, including multi-agent actor-critic methods. However, it is worth noting that there is a performance gap between multi-agent actor-critic methods and state-of-the-art value-based methods. In this paper, we investigate the causes and attribute inferior performance to issues of contribution-mismatch and indiscriminate guidance. To overcome these problems, we introduce a novel bi-level multi-agent actor-critic reinforcement learning approach with transformers, called BMT. Specifically, we propose a simple but efficient bi-level optimization mechanism to learn both a global critic and an agent-specific critic, thus jointly guiding the policy update. In addition, we adopt the transformer-based model as the policy network to decouple complicated relationships and generate flexible policies. BMT is also general enough to be plugged into any actor-critic multi-agent reinforcement learning approach, such as MAPPO, and equips it with strong expression. 
On multiple benchmarks including multi-agent particle environments and a challenging set of StarCraft II micromanagement tasks, large-scale empirical experiments demonstrate that BMT-based multi-agent reinforcement learning methods achieve superior performance over both state-of-the-art actor-critic and value-based approaches.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115903717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Filtering Alerts on Cloud Monitoring Systems","authors":"Fotios Voutsas, John Violos, Aris Leivadeas","doi":"10.1109/JCC59055.2023.00010","DOIUrl":"https://doi.org/10.1109/JCC59055.2023.00010","url":null,"abstract":"Recent advances in cloud computing and data centers have increased the demands for monitoring the network infrastructure and the applications that it hosts. The monitoring processes let network administrators be aware of the status of the physical and logical units that compose their system. Since the goal of next generation networks is to minimise the administrators’ intervention, the alerting systems should minimize the frequency of notifications, emphasizing critical scenarios such as when a monitoring metric surpasses a threshold or an anomalous behaviour is detected. However, current monitoring tools flood network administrators with hundreds of notifications every day. In this paper, we propose a binary classification approach to decide whether the administrators should be notified through monitoring alerts. To do so, our framework is built upon real monitoring logs and alerts that show how the administrators reacted when receiving an alert. 
Extensive simulation results assess the performance of various classification approaches and reveal that random forests are great candidates for the binary classification alerting system that we propose, in terms of classification efficiency and computational overhead.","PeriodicalId":117254,"journal":{"name":"2023 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"326 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115460175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}