Gabriel Bathie, L. Marchal, Y. Robert, Samuel Thibault
{"title":"Dynamic DAG Scheduling Under Memory Constraints for Shared-Memory Platforms","authors":"Gabriel Bathie, L. Marchal, Y. Robert, Samuel Thibault","doi":"10.15803/IJNC.11.1_27","DOIUrl":"https://doi.org/10.15803/IJNC.11.1_27","url":null,"abstract":"This work focuses on dynamic DAG scheduling under memory constraints. We target a shared-memory platform equipped with $p$ parallel processors. The goal is to bound the maximum amount of memory that may be needed by any schedule using p processors to execute the DAG. We refine the classical model that computes maximum cuts by introducing two types of memory edges in the DAG, black edges for regular precedence constraints and red edges for actual memory consumption during execution. A valid edge cut cannot include more than $p$ red edges. This limitation had never been taken into account in previous works, and dramatically changes the complexity of the problem, which was polynomial and becomes NP-hard. We introduce an Integer Linear Program (ILP) to solve it, together with an efficient heuristic based on rounding the rational solution of the ILP. In addition, we propose an exact polynomial algorithm for series-parallel graphs. We further study the extension of the approach where the scheduler is dynamically constrained to select tasks (among ready tasks) so that the total memory used does not exceed some threshold. We provide an extensive set of experiments, both with randomly-generated graphs and with graphs arising from practical applications, which demonstrate the impact of resource constraints on peak memory usage.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133422179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Benoit, Valentin Le Fèvre, P. Raghavan, Y. Robert, Hongyang Sun
{"title":"Resilient Scheduling Heuristics for Rigid Parallel Jobs","authors":"A. Benoit, Valentin Le Fèvre, P. Raghavan, Y. Robert, Hongyang Sun","doi":"10.15803/IJNC.11.1_2","DOIUrl":"https://doi.org/10.15803/IJNC.11.1_2","url":null,"abstract":"This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or the makespan. We revisit the classical problem while assuming that jobs are subject to failures caused by transient or silent errors, and hence may need to be re-executed each time they fail to complete successfully. This work generalizes the classical framework where jobs are known offline and do not fail: in this framework, list scheduling that gives priority to the longest jobs is known to be a 3-approximation when imposing to use shelves, and a 2-approximation without this restriction. We show that when jobs can fail, using shelves can be arbitrarily bad, but unrestricted list scheduling remains a 2-approximation. The paper focuses on the design of several heuristics, some list-based and some shelf-based, along with different priority rules and backfilling strategies. We assess and compare their performance through an extensive set of simulations using both synthetic jobs and log traces from the Mira supercomputer.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130458502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preface: Special Issue on Workshop on Advances in Parallel and Distributed Computational Models 2020","authors":"Susumu Matsumae, M. Shibata","doi":"10.15803/IJNC.11.1_1","DOIUrl":"https://doi.org/10.15803/IJNC.11.1_1","url":null,"abstract":"The 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM), which was held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS) on May 18 - May 22, 2020, aims to provide a timely forum for the exchange and dissemination of new ideas, techniques and research in the field of the parallel and distributed computational models. The APDCM workshop has a history of attracting participation from reputed researchers worldwide. The program committee has encouraged the authors of accepted papers to submit full-versions of their manuscripts to the International Journal of Networking and Computing (IJNC) after the workshop. After a thorough reviewing process, with extensive discussions, four articles on various topics have been selected for publication on the IJNC special issue on APDCM. On behalf of the APDCM workshop, we would like to express our appreciation for the large efforts of reviewers who reviewed papers submitted to the special issue. Likewise, we thank all the authors for submitting their excellent manuscripts to this special issue. We also express our sincere thanks to the editorial board of the International Journal of Networking and Computing, in particular, to the Editor-in-chief Professor Koji Nakano. This special issue would not have been possible without his support.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131251072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of NVSHMEM for High Performance Computing","authors":"C. Hsu, N. Imam","doi":"10.15803/IJNC.11.1_78","DOIUrl":"https://doi.org/10.15803/IJNC.11.1_78","url":null,"abstract":"High Performance Computing has been a driving force behind important tasks such as scientific discovery and deep learning. It tends to achieve performance through greater concurrency and heterogeneity, where the underlying complexity of richer topologies is managed through software abstraction. In this paper, we present our assessment of NVSHMEM, an experimental programming library that supports the Partitioned Global Address Space programming model for NVIDIA GPU clusters. NVSHMEM offers several concrete advantages. One is that it reduces overheads and software complexity by allowing communication and computation to be interleaved vs. separating them into different phases. Another is that it implements the OpenSHMEM specification to provide efficient fine-grained one-sided communication, streamlining away overheads due to tag matching, wildcards, and unexpected messages which have compounding effect with increasing concurrency. It also offers ease of use by abstracting away low-level configuration operations that are required to enable low-overhead communication and direct loads and stores across processes. We evaluated NVSHMEM in terms of usability, functionality, and scalability by running two math kernels, matrix multiplication and Jacobi solver, and one full application, Horovod, on the 27,648-GPU Summit supercomputer. Our exercise of NVSHMEM at scale contributed to making NVSHMEM more robust and preparing it for production release.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134641165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuki Hirayama, T. Asai, M. Motomura, Shinya Takamaeda-Yamazaki
{"title":"A Hardware-efficient Weight Sampling Circuit for Bayesian Neural Networks","authors":"Yuki Hirayama, T. Asai, M. Motomura, Shinya Takamaeda-Yamazaki","doi":"10.15803/ijnc.10.2_84","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_84","url":null,"abstract":"The main problems of deep learning are requiring a large amount of data for learning, and prediction with excessive confidence. A Bayesian neural network (BNN), in which a Bayesian approach is incorporated into a neural network (NN), has drawn attention as a method for solving these problems. In a BNN, the probability distribution is assumed for the weight, in contrast to a conventional NN, in which the weight is point estimated. This makes it possible to obtain the prediction as a distribution and to evaluate how uncertain the prediction is. However, a BNN has more computational complexity and a greater number of parameters than an NN. To obtain an inference result as a distribution, a BNN uses weight sampling to generate the respective weight values, and thus, a BNN accelerator requires weight sampling hardware based on a random number generator in addition to the standard components of a deep learning neural network accelerator. Therefore, the throughput of weight sampling must be sufficiently high at a low hardware resource cost. We propose a resource-efficient weight sampling method using inversion transform sampling and a lookup-table (LUT)-based function approximation for hardware implementation of a BNN. Inversion transform sampling simplifies the mechanism of generating a Gaussian random number from a uniform random number provided by a common random number generator, such as a linear feedback shift register. Employing an LUT-based low-bit precision function approximation enables inversion transform sampling to be implemented at a low hardware cost. The evaluation results indicate that this approach effectively reduces the occupied hardware resources while maintaining accuracy and prediction variance equivalent to that with a non-approximated sampling method.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116839377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P systems with branch and bound for solving two hard graph problems","authors":"Kotaro Umetsu, A. Fujiwara","doi":"10.15803/ijnc.10.2_159","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_159","url":null,"abstract":"Membrane computing is a computational model based on activity of cells. Using the membrane computing, a number of computationally hard problems have been solved in a polynomial number of steps using an exponential number of membranes. However, the number of membranes denotes the number of cells from practical point of view, and the reduction of the number of membranes must be considered for using the membrane computing in real world. In this paper, we propose asynchronous P systems with branch and bound for reducing the number of membranes for two computationally hard graph problems. We first propose an asynchronous P system that solves Hamiltonian cycle problem for a graph with n vertices, and show that the proposed P system works in O(n^2) parallel steps. We next propose an asynchronous P system that solves the minimum graph coloring for a graph with n vertices, and also show that the P system works in O(n^2) parallel steps. In addition, we evaluate validity of the proposed P systems using computational simulations. The experimental results show the validity and efficiency of the proposed P systems with branch and bound.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131035436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taito Manabe, Koki Tomonaga, Koki Fujita, Yuichiro Shibata, Taiichiro Kosaka, T. Adachi
{"title":"CNN Architecture for Surgical Image Segmentation with Recursive Structure and Flip-Based Upsampling","authors":"Taito Manabe, Koki Tomonaga, Koki Fujita, Yuichiro Shibata, Taiichiro Kosaka, T. Adachi","doi":"10.15803/ijnc.10.2_259","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_259","url":null,"abstract":"Laparoscopic surgery, a less invasive camera-aided surgery, is now performed commonly. However, it requires a camera assistant who holds and maneuvers a laparoscope. By controlling the laparoscope automatically using a robot, a surgeon can perform the operation without a camera assistant, which would be beneficial in areas suffering from lack of surgeons. In this paper, a prototype image segmentation architecture based on a convolutional neural network (CNN) is proposed to realize an automated laparoscope control for cholecystectomy. Since a training dataset is annotated manually by a few surgeons, its scale is limited compared to common CNN-based systems. Therefore, we built a recursive network structure, with some sub-networks which are used multiple times, to mitigate overfitting. In addition, instead of the common transposed convolution, the flip-based subpixel reconstruction is introduced into upsampling layers. Furthermore, we applied stochastic depth regularization to the recursive structure for better accuracy. Evaluation results revealed that these improvements bring better classification accuracy without increasing the number of parameters. The system shows a throughput sufficient for real-time laparoscope robot control with a single NVIDIA GeForce GTX 1080 GPU.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126761342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keiji Yoshimoto, Yoshinori Uetake, Yuta Kodera, Takuya Kusaka, Y. Nogami
{"title":"Evaluating Side-Channel Resistance Using Low Order Rational Points Against Curve25519 and an Associated Quadratic Twist","authors":"Keiji Yoshimoto, Yoshinori Uetake, Yuta Kodera, Takuya Kusaka, Y. Nogami","doi":"10.15803/ijnc.10.2_144","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_144","url":null,"abstract":"IoT devices contribute to improving the mechanism of a system as edge devices for data sharing and automation of industrials. However, such devices are often being a target of an attacker due to their simple architecture and the lack of resources so as to protect data confidentiality using cryptosystems. In addition, although Curve25519 has been used in various security protocols and known to work even on IoT devices efficiently, the curve inherits the low order points hidden inside of the Edward curves. In this paper, the authors demonstrate side-channel attacks against Curve25519 by focusing on the points of order 4 and 8. We choose the order 4 point which does not exist on Curve25519, that exists on the twisted curve of Curve25519. More precisely, the rational point used in this paper is given by (x,y)=(-1,0) in affine coordinates. In addition, the order 8 point appears to be a high order rational point. The results reveal that the rational points might be a threat to key extraction and it demands us to find further countermeasures.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122083141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Energy Restoration of Wireless Sensor Networks by a Mobile Robot","authors":"P. Flocchini, Eman Omar, N. Santoro","doi":"10.15803/ijnc.10.2_62","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_62","url":null,"abstract":"As most existing sensors are powered by batteries, the coverage provided by a sensor network degrades over time and eventually disappears if energy is not restored. A popular approach to energy restoration is to use a robot acting as a mobile battery charger/changer. The robot decides where to move next according to a predefined on-line energy restoration strategy. The effectiveness of such a strategy depends on the number of nodes it is able to maintain operational at any given time, as well as on for how long a node whose battery is depleted remains non-operational. The ideal optimal on-line strategy (called OPTIMAL) occurs when the robot knows at any time the current status of all sensors, and it computes the best request to satisfy next, based on this information. Although optimal in terms of effectiveness, this centralized strategy would constantly require up-to-date global information; hence its high computational and communication costs make it not feasible. We consider a drastically different on-line strategy (called LIC), which is simple and fully decentralized, uses only local communication, requires no computations, and is highly scalable. In our strategy, the robot visits the sensors in a predefined circular order, moving in a \"clockwise\" direction and only when aware of a pending request. A sensor whose battery is about to become depleted originates a recharging request and waits for the robot; the request is forwarded according to the circular order in a \"counter-clockwise\" direction until it reaches either the robot or another sensor waiting for the robot. We show the perhaps unexpected result that, once the system becomes stable, in most networks the effectiveness of LIC is equivalent to that of OPTIMAL. In other words, in most cases, in spite of its simplicity and its extremely small (communication and computation) costs, the proposed decentralized strategy is as effective as the optimal centralized one. We augment our theoretical results with experimental analysis, confirming all the analytical results and showing among other things that the system stabilizes very quickly.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116614181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sequential Detection Method for Intrusion Detection System Based on Artificial Neural Networks","authors":"Zhao Hao, Yaokai Feng, Hiroshi Koide, K. Sakurai","doi":"10.15803/ijnc.10.2_213","DOIUrl":"https://doi.org/10.15803/ijnc.10.2_213","url":null,"abstract":"With rapidly increasing cyber attacks, network security has become an important issue. To protect ourselves against cyber attacks, the Intrusion Detection System (IDS) has been introduced. In such systems, different kinds of machine learning algorithms play a more and more important role, such as support vector machine(SVM), artificial neural network(ANN), etc. False positive rate and false negative rate, in addition to accuracy, are widely used for the evaluation of IDSs. These indices, however, are often related to each other, which makes it is difficult for us to improve all the indices at the same time. For example, when we try to make the false negative rate decrease to prevent from missing attacks, more normal communications tend to be classified into attacks and the false positive rate may increase, and vice versa. In this study, we propose an ANN based sequential classifier method to mitigate this problem. We design each subclassifier with a low false positive rate, which may lead to high false negative rate. To decrease the false negative rate, the reported negative instances from the former subclassifier are sent to the next one to further check (reclassification). In this way, it can be expected that the false negative rate can also reach an acceptable level. The results of our experiment shows that our proposed method can bring lower false negative rate and higher accuracy, in the mean time the false positive rate is kept at an acceptable level. We also investigated the effect of the number of subclassifiers on detection performance and found that the detection system performed best when using four subclassifiers.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114642305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}