{"title":"Modelling run-time arbitration by latency-rate servers in dataflow graphs","authors":"M. Wiggers, M. Bekooij, G. Smit","doi":"10.1145/1269843.1269846","DOIUrl":"https://doi.org/10.1145/1269843.1269846","url":null,"abstract":"In order to obtain a cost-efficient solution, tasks share resources in a Multi-Processor System-on-Chip. In our architecture, shared resources are run-time scheduled. We show how the effects of Latency-Rate servers, which are a class of run-time schedulers, can be included in a dataflow model. The resulting dataflow model, which can have an arbitrary topology, enables us to provide guarantees on the temporal behaviour of the implementation. Traditionally, the end-to-end behaviour of multiple Latency-Rate servers has been analysed with Latency-Rate analysis, which is a Network Calculus. This paper bridges a gap between Network Calculi and dataflow analysis techniques, since we show that a class of run-time schedulers can now be included in dataflow models, or, from a Network Calculus perspective, that restrictions on the topology of graphs that include run-time scheduling can be removed.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124395586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient buffer capacity and scheduler setting computation for soft real-time stream processing applications","authors":"M. Bekooij, M. Wiggers, J. V. Meerbergen","doi":"10.1145/1269843.1269845","DOIUrl":"https://doi.org/10.1145/1269843.1269845","url":null,"abstract":"Soft real-time applications that process data streams can often be intuitively described as dataflow process networks. In this paper we present a novel analysis technique to compute conservative estimates of the required buffer capacities in such process networks. With the same analysis technique scheduler settings can be verified. Unlike many other soft real-time analysis techniques, it is guaranteed that the desired throughput is obtained for the input stream that is used to characterize the application. Experiments with artificial test-cases indicate that the computed FIFO capacities become more conservative if the desired throughput gets closer to the maximum throughput. The run-time of our algorithm for an H263 video decoder test-case was 14 seconds.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121303981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language support for interoperable messaging in sensor networks","authors":"Kevin K. Chang, David E. Gay","doi":"10.1145/1140389.1140390","DOIUrl":"https://doi.org/10.1145/1140389.1140390","url":null,"abstract":"Development of network communication in a homogeneous sensor network environment is straightforward as the nodes can share message layouts simply by letting the compiler lay out messages in an arbitrary fashion and using the same executable code on all nodes. However, this simple approach does not usually work in a heterogeneous sensor network setting because different compilers may generate different message layouts, and different processors often have different basic type representations and alignments. The traditional solutions to this problem are either to require programmers to insert network-byte-order and host-byte-order conversions, or to use a compiler that automatically generates marshalling and unmarshalling routines. Unfortunately, these approaches are inadequate for sensor networks because they are error-prone and/or add significant overhead to already resource-constrained sensor motes. Instead, we propose a language extension --- network types --- which supports heterogeneous networking in a simple and efficient way. We have implemented network types in nesC, the language of the TinyOS sensor network operating system and its applications. We have used network types to support heterogeneous networking between micaz and telos motes (which have different alignment restrictions). We also show that our implementation introduces a negligible amount of overhead in runtime and code size. Network types have the additional benefit of requiring few changes to existing TinyOS code.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123290391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A software-only compression system for trading-offs between performance and code size","authors":"K. Heydemann, F. Bodin, H. Charles","doi":"10.1145/1140389.1140393","DOIUrl":"https://doi.org/10.1145/1140389.1140393","url":null,"abstract":"The design of an embedded system is often heavily constrained by its performance objective and budget envelope. Software code compression may reduce the instruction memory space and then the overall cost of the system. However, it may also induce performance degradation. Previous studies proposed selective code compression using profile information in order to reduce the performance penalty. In this paper, we go one step further. We propose a software-only compression system, called SCS, that automatically finds trade-offs between code size and performance. Through an iterative approach, SCS automatically determines which functions to be compressed given a performance constraint and/or a code size constraint in order to guarantee a minimal performance and a maximal code size for an application. Experimentations illustrate that even with a non-optimal software decompression approach, SCS achieves a high compression rate with a minimal performance degradation.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122613559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power optimizations for the MLCA using dynamic voltage scaling","authors":"I. Matosevic, T. Abdelrahman, F. Karim, A. Mellan","doi":"10.1145/1140389.1140401","DOIUrl":"https://doi.org/10.1145/1140389.1140401","url":null,"abstract":"Dynamic voltage scaling (DVS) is an effective method for reducing processor power consumption. We present a compiler-based technique for DVS-based power optimizations of multimedia applications in the context of the Multi-Level Computing Architecture (MLCA), a novel architecture for parallel systems-on-a-chip. Our technique combines dependence analysis of long-running loops with profiling information in order to identify the slack available in the execution of parallel tasks. DVS is then applied to slow down processors executing noncritical-path tasks, reducing power with little or no impact on execution time. We evaluate our technique using realistic multimedia applications and a simulator of the MLCA. The results demonstrate that up to 10% savings in processor power consumption can be achieved with no more than 1.5% increase in execution time. Although our technique is developed in the context of MLCA, we believe that it is applicable in the broader context of task-level parallelism in multimedia applications.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134394213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining compiler and operating system support for energy efficient I/O on embedded platforms","authors":"Ripal Nathuji, B. Seshasayee, K. Schwan","doi":"10.1145/1140389.1140398","DOIUrl":"https://doi.org/10.1145/1140389.1140398","url":null,"abstract":"Mobile and embedded platforms have experienced dramatic advances in capabilities, largely due to the development of associated peripheral devices for storage and communication. The incorporation of these I/O devices has increased the overall power envelope of these platforms. In fact, system-level power consumption of mobile platforms is often dominated by peripheral devices. Since battery technologies alone have been unable to provide the lifetimes required by many platforms, in order to conserve energy, most devices provide the ability to transition into low power states during idle periods. The resulting energy savings are heavily dependent upon the lengths and number of idle periods experienced by a device. This paper presents an infrastructure designed to take advantage of device low power states by increasing the burstiness of device accesses and idle periods to provide a reduced power profile, and thereby an improvement in battery life. Our approach combines compiler-based source modifications with operating system support to implement a dynamic solution for enhanced energy consumption. We evaluate our infrastructure on an XScale-based embedded platform with a Linux implementation.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132137029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global memory optimisation for embedded systems allowed by code duplication","authors":"M. Palkovic, H. Corporaal, F. Catthoor","doi":"10.1145/1140389.1140397","DOIUrl":"https://doi.org/10.1145/1140389.1140397","url":null,"abstract":"Data transfers and storage are dominant contributors to the area and power consumption of all modern multimedia embedded systems. Modern high-level memory optimisations can ensure cost-efficient realisation of these systems. An important step in these optimisations is the set of loop transformations performed on a geometrical model. However, these loop transformations traditionally cannot optimise code across data dependent conditions. In this paper we selectively duplicate the code in order to enable global loop transformations across data dependent conditions. We propose a technique which finds in a systematic way the Pareto curve in 2D exploration space: the better memory optimisations vs. the code increase. Our technique has been tested on an MP3 audio decoder. Results show a 45.8% decrease in the number of main memory accesses, which requires a 16.2% increase in code size.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126727544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance guarantees by simulation of process","authors":"M. Bekooij, J. V. Meerbergen, Sonali Parma","doi":"10.1145/1140389.1140391","DOIUrl":"https://doi.org/10.1145/1140389.1140391","url":null,"abstract":"In this paper we derive the end-to-end temporal behavior of real-time applications that are described as process networks. We demonstrate that a tight upper bound on the arrival time of data can be derived by simulation of this process network. We also show that the effects of arbitration can be taken into account if resources are reserved. For an H263 video decoder example we derive by means of simulation the settings of the schedulers and the buffer capacities. We arrive at the conclusion that for this application a close to maximum throughput is obtained with small buffers if only one process is executed on each processor. Larger buffers are needed if processors are shared and processes are executed during long time-slices.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128922813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSP address optimization using evolutionary algorithms","authors":"S. Leventhal, Lin Yuan, N. Bambha, S. Bhattacharyya, G. Qu","doi":"10.1145/1140389.1140399","DOIUrl":"https://doi.org/10.1145/1140389.1140399","url":null,"abstract":"Offset assignment has been studied as a highly effective approach to code optimization in modern digital signal processors (DSPs). In this paper, we propose two evolutionary algorithms to solve the general offset assignment problem with k address registers and an arbitrary auto-modify range. These algorithms differ from previous algorithms by having the capability of visiting the entire search space. We implement and analyze a variety of existing general offset assignment algorithms and test them on a set of standard benchmarks. The algorithms we propose can achieve a performance improvement of up to 31% over the best existing algorithm. We also achieve an average of 14% improvement over the union of recently proposed algorithms.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116723039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A real-time garbage collection framework for embedded systems","authors":"Wei Fu, C. Hauser","doi":"10.1145/1140389.1140392","DOIUrl":"https://doi.org/10.1145/1140389.1140392","url":null,"abstract":"Garbage collection is increasingly prevalent as part of the programming landscape, but its use in real-time embedded systems remains problematic. One approach is to separate the specification of the timing requirements of real-time tasks, the memory use behavior of the code that implements them, and the configuration of the memory management system to ensure that the tasks' real-time requirements are met. The Real Time Garbage Collector (RTGC) framework provides a common language for describing a broad class of real-time collectors. The notion of an RTGC configuration, based on the framework, provides a target for automating the task of configuring automatic memory management for a real-time system and its workload.","PeriodicalId":375451,"journal":{"name":"Software and Compilers for Embedded Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126454470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}