{"title":"Configuration and operation of networked control systems over heterogeneous WSANs","authors":"P. Furtado, J. Cecílio","doi":"10.1145/2536747.2536756","DOIUrl":"https://doi.org/10.1145/2536747.2536756","url":null,"abstract":"There have been both research and commercial advances on applying Wireless Sensor and Actuator Networks (WSN) in industrial premises. These have cost advantages related to avoiding some cabled deployments. A possible architecture involves a Networked Control System (NCS) with many small WSN subnetworks, cabled nodes and computer servers (e.g., servers, control stations). In those systems individual sensor nodes can be programmed, as opposed to cabled analog systems. We investigate approaches for networked-wide configuration, where all nodes—cabled or WSN sensors—can be configured with simplicity from a single interface, instead of hand-coding or complex configurations of individual nodes. We propose an architecture and approach for configuration and operation. Previous related proposals on middleware involving WSNs suffer from two major limitations: they either program within an individual WSN or configure operation outside WSNs, wrapping data coming from WSN. They do not allow configuring WSN and non-WSN nodes for operation from a single interface. We discuss the architecture and propose the NCSWSN configuration and operation approach. We are applying this system in an industrial testbed, therefore we test the approach and also show user interfaces and results from the deployment.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130467049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zai Jian Jia, T. Bautista, A. Núñez, A. Pimentel, M. Thompson
{"title":"A system-level infrastructure for multidimensional MP-SoC design space co-exploration","authors":"Zai Jian Jia, T. Bautista, A. Núñez, A. Pimentel, M. Thompson","doi":"10.1145/2536747.2536749","DOIUrl":"https://doi.org/10.1145/2536747.2536749","url":null,"abstract":"In this article, we present a flexible and extensible system-level MP-SoC design space exploration (DSE) infrastructure, called NASA. This highly modular framework uses well-defined interfaces to easily integrate different system-level simulation tools as well as different combinations of search strategies in a simple plug-and-play fashion. Moreover, NASA deploys a so-called dimension-oriented DSE approach, allowing designers to configure the appropriate number of, well-tuned and possibly different, search algorithms to simultaneously co-explore the various design space dimensions. As a result, NASA provides a flexible and re-usable framework for the systematic exploration of the multidimensional MP-SoC design space, starting from a set of relatively simple user specifications. To demonstrate the capabilities of the NASA framework and to illustrate its distinct aspects, we also present several DSE experiments in which, for example, we compare NASA configurations using a single search algorithm for all design space dimensions to configurations using a separate search algorithm per dimension. These proof-of-concept experiments indicate that the latter multidimensional co-exploration can find better design points and evaluates a higher diversity of design alternatives as compared to the more traditional approach of using a single search algorithm for all dimensions.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126621934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concepción Sanz, J. I. Gómez, C. Tenllado, M. Prieto, F. Catthoor
{"title":"System-level memory management based on statistical variability compensation for frame-based applications","authors":"Concepción Sanz, J. I. Gómez, C. Tenllado, M. Prieto, F. Catthoor","doi":"10.1145/2536747.2536757","DOIUrl":"https://doi.org/10.1145/2536747.2536757","url":null,"abstract":"Process variability and dynamic domains increase the uncertainty of embedded systems and force designers to apply pessimistic designs, which become unnecessarily conservative and have a tremendous impact on both performance and energy consumption. In this context, developing uncertainty-aware design methodologies that take both variation at platform and at application level into account becomes a must. These methodologies should mitigate the effects derived from uncertainty, avoiding worst-case assumptions. In this article we propose a comprehensive methodology to tackle two forms of uncertainty: (1) process variation on the memory system, (2) application dynamism. A statistical model has been developed to deal with variability derived from fabrication process, whereas system scenarios are selected to cope with dynamic domains. Both sources of uncertainty are firstly tackled in combination at design time, to be refined later, at setup. As a result, at run time the platform can be successfully adapted to the current application behaviour as well as the current variations. Our simulations show that this methodology provides significant energy savings while still meeting strict timing constraints.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"1060 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132057805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Lo, Mao Lin Li, Li-Chun Chen, Yi-Shan Lu, R. Tsay, Hsu-Yao Huang, J. Yeh
{"title":"Automatic generation of high-speed accurate TLM models for out-of-order pipelined bus","authors":"C. Lo, Mao Lin Li, Li-Chun Chen, Yi-Shan Lu, R. Tsay, Hsu-Yao Huang, J. Yeh","doi":"10.1145/2536747.2536759","DOIUrl":"https://doi.org/10.1145/2536747.2536759","url":null,"abstract":"Although pipelined/out-of-order (PL/OO) execution features are commonly supported by the state-of-the-art bus designs, no existing manual Transaction-Level-Modeling (TLM) approaches can effectively construct fast and accurate simulation models for PL/OO buses. Mainly, the inherent high design complexity of concurrent PL/OO behaviors makes the manual approaches tedious and error-prone. To tackle the complicated modeling task, this article presents an automatic approach that performs systematic abstraction and generation of fast-and-accurate simulation models. The experimental results show that our approach reduces 21 times modeling efforts, while our generated models perform simulation an order of magnitude faster than Cycle-Accurate models with the same PL/OO transaction execution cycle counts preserved.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128534679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive scheduling of real-time systems cosupplied by renewable and nonrenewable energy sources","authors":"M. Mohaqeqi, M. Kargahi, Maryam Dehghan","doi":"10.1145/2536747.2536758","DOIUrl":"https://doi.org/10.1145/2536747.2536758","url":null,"abstract":"Energy management is an important issue in today's real-time systems due to the high costs of energy supplying. Using renewable, like wave, wind, and solar energy sources seem promising methods to address this issue. However, because of the existing contrast between the critical nature of hard real-time systems and the unpredictable nature of renewable energies, some supplementary energy source like electricity grid or battery is needed. In this paper, we consider hard real-time systems with two renewable and nonrenewable energy sources. In order to reduce the costs, we present two dynamic voltage scaling controllers to minimize the energy attained from the latter source. In order to handle variations of the environmental energy and workload, the model predictive control approach is employed. One nonlinear approach beside one fast linear piecewise affine explicit controller are proposed. The efficacies of the proposed approaches have been investigated through extensive simulations. Comparisons to an ideal clairvoyant controller as a baseline show that, in the studied scenarios, the proposed controllers guarantee at least 78% of the baseline performance.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133214569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated generation of polyhedral process networks from affine nested-loop programs with dynamic loop bounds","authors":"D. Nadezhkin, Hristo Nikolov, T. Stefanov","doi":"10.1145/2536747.2536750","DOIUrl":"https://doi.org/10.1145/2536747.2536750","url":null,"abstract":"The Process Networks (PNs) is a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form facilitating the efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, we have developed the pn compiler, which derives specific Polyhedral Process Networks (PPN) parallel specifications from sequential static affine nested loop programs (SANLPs). However, there are many applications, for example, multimedia applications (MPEG coders/decoders, smart cameras, etc.) that have adaptive and dynamic behavior which cannot be expressed as SANLPs. Therefore, in order to handle dynamic multimedia applications, in this article we address the important question whether we can relax some of the restrictions of the SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way.\u0000 The main contribution of this article is a first approach for automated translation of affine nested loop programs with dynamic loop bounds into input-output equivalent Polyhedral Process Networks. In addition, we present a method for analyzing the execution overhead introduced in the PPNs derived from programs with dynamic loop bounds. The presented automated translation approach has been evaluated by deriving a PPN parallel specification from a real-life application called Low Speed Obstacle Detection (LSOD) used in the smart cameras domain. By executing the derived PPN, we have obtained results which indicate that the approach we present in this article facilitates efficient parallel implementations of sequential nested loop programs with dynamic loop bounds. That is, our approach reveals the possible parallelism available in such applications, which allows for the utilization of multiple cores in an efficient way.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127119828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel architectures for the kNN classifier -- design of soft IP cores and FPGA implementations","authors":"I. Stamoulias, E. Manolakos","doi":"10.1145/2514641.2514649","DOIUrl":"https://doi.org/10.1145/2514641.2514649","url":null,"abstract":"We designed a variety of k-nearest-neighbor parallel architectures for FPGAs in the form of parameterizable soft IP cores. We show that they can be used to solve large classification problems with thousands of training vectors, or thousands of vector dimensions using a single FPGA, and achieve very high throughput. They can be used to flexibly synthesize architectures that also cover: 1NN classification (vector quantization), multishot queries (with different k), LOOCV cross-validation, and compare favorably to GPU implementations. To the best of our knowledge this is the first attempt to design flexible IP cores for the popular kNN classifier.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114920776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Custom architecture for multicore audio beamforming systems","authors":"D. Theodoropoulos, G. Kuzmanov, G. Gaydadjiev","doi":"10.1145/2514641.2514646","DOIUrl":"https://doi.org/10.1145/2514641.2514646","url":null,"abstract":"The audio Beamforming (BF) technique utilizes microphone arrays to extract acoustic sources recorded in a noisy environment. In this article, we propose a new approach for rapid development of multicore BF systems. Research on literature reveals that the majority of such experimental and commercial audio systems are based on desktop PCs, due to their high-level programming support and potential of rapid system development. However, these approaches introduce performance bottlenecks, excessive power consumption, and increased overall cost. Systems based on DSPs require very low power, but their performance is still limited. Custom hardware solutions alleviate the aforementioned drawbacks, however, designers primarily focus on performance optimization without providing a high-level interface for system control and test. In order to address the aforementioned problems, we propose a custom platform-independent architecture for reconfigurable audio BF systems. To evaluate our proposal, we implement our architecture as a heterogeneous multicore reconfigurable processor and map it onto FPGAs. Our approach combines the software flexibility of General-Purpose Processors (GPPs) with the computational power of multicore platforms. In order to evaluate our system we compare it against a BF software application implemented to a low-power Atom 330, a middle-ranged Core2 Duo, and a high-end Core i3. Experimental results suggest that our proposed solution can extract up to 16 audio sources in real time under a 16-microphone setup. In contrast, under the same setup, the Atom 330 cannot extract any audio sources in real time, while the Core2 Duo and the Core i3 can process in real time only up to 4 and 6 sources respectively. Furthermore, a Virtex4-based BF system consumes more than an order less energy compared to the aforementioned GPP-based approaches.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"741 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116088515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giovanni Mariani, G. Palermo, V. Zaccaria, C. Silvano
{"title":"Design-space exploration and runtime resource management for multicores","authors":"Giovanni Mariani, G. Palermo, V. Zaccaria, C. Silvano","doi":"10.1145/2514641.2514647","DOIUrl":"https://doi.org/10.1145/2514641.2514647","url":null,"abstract":"Application-specific multicore architectures are usually designed by using a configurable platform in which a set of parameters can be tuned to find the best trade-off in terms of the selected figures of merit (such as energy, delay, and area). This multi-objective optimization phase is called Design-Space Exploration (DSE). Among the design-time (hardware) configurable parameters we can find the memory subsystem configuration (such as cache size and associativity) and other architectural parameters such as the instruction-level parallelism of the system processors. Among the runtime (software) configurable parameters we can find the degree of task-level parallelism associated with each application running on the platform.\u0000 The contribution of this article is twofold; first, we introduce an evolutionary (NSGA-II-based) methodology for identifying a hardware configuration which is robust with respect to applications and corresponding datasets. Second, we introduce a novel runtime heuristic that exploits design-time identified operating points to provide guaranteed throughput to each application. Experimental results show that the design-time/runtime combined approach improves the runtime performance of the system with respect to existing reference techniques, while meeting the overall power budget.","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124645594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory performance estimation of CUDA programs","authors":"Yooseong Kim, Aviral Shrivastava","doi":"10.1145/2514641.2514648","DOIUrl":"https://doi.org/10.1145/2514641.2514648","url":null,"abstract":"CUDA has successfully popularized GPU computing, and GPGPU applications are now used in various embedded systems. The CUDA programming model provides a simple interface to program on GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider numerous architectural details, and small changes in source code, especially on the memory access pattern, can affect performance significantly. This makes it very difficult to optimize CUDA programs. This article presents CuMAPz, which is a tool to analyze and compare the memory performance of CUDA programs. CuMAPz can help programmers explore different ways of using shared and global memories, and optimize their program for efficient memory behavior. CuMAPz models several memory-performance-related factors: data reuse, global memory access coalescing, global memory latency hiding, shared memory bank conflict, channel skew, and branch divergence. Experimental results show that CuMAPz can accurately estimate performance with correlation coefficient of 0.96. By using CuMAPz to explore the memory access design space, we could improve the performance of our benchmarks by 30% more than the previous approach [Hong and Kim 2010].","PeriodicalId":183677,"journal":{"name":"ACM Trans. Embed. Comput. Syst.","volume":"1632 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127446879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}