{"title":"Architectural support for safe software execution on embedded processors","authors":"Divya Arora, A. Raghunathan, S. Ravi, N. Jha","doi":"10.1145/1176254.1176281","DOIUrl":"https://doi.org/10.1145/1176254.1176281","url":null,"abstract":"The lack of memory safety in many popular programming languages, including C and C++, has been a cause for great concern in the realm of software reliability, verification, and more recently, system security. Despite their limitations, the flexibility, performance, and ease of use of these languages have made them the choice of most embedded software developers. Researchers have proposed various techniques to enhance programs for memory safety; however, they are all subject to severe performance penalties, making their use impractical in most scenarios. In this paper, we present architectural enhancements to enable efficient, memory-safe execution of software on embedded processors. The key insight behind our approach is to extend embedded processors with hardware that significantly accelerates the execution of the additional computations involved in memory-safe execution. Specifically, we design custom instructions to perform various kinds of memory-safety checks and augment the instruction set of a state-of-the-art extensible processor (Xtensa from Tensilica, Inc.) to implement them. We demonstrate the application of the proposed architectural enhancements using CCured, an existing tool for type-safe retrofitting of C programs. The tool uses a type-inferencing engine that is built around strong type-safety theory and is provably safe. Simulations of memory-safe versions of popular embedded benchmarks on a cycle-accurate simulator modeling a typical embedded system configuration indicate an average performance improvement of 2.3 times, and a maximum of 4.6 times, when using the proposed architecture. These enhancements entail minimal (less than 10%) hardware overhead to the base processor. Our approach is completely automated, and applicable to any C program, making it a promising and practical approach for addressing the growing security and reliability concerns in embedded software.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130854138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The pipeline decomposition tree: an analysis tool for multiprocessor implementation of image processing applications","authors":"D. Ko, S. Bhattacharyya","doi":"10.1145/1176254.1176269","DOIUrl":"https://doi.org/10.1145/1176254.1176269","url":null,"abstract":"Modern embedded systems for image processing involve increasingly complex levels of functionality under real-time and resource-related constraints. As this complexity increases, the application of single-chip multiprocessor technology is attractive. To address the challenges of mapping image processing applications onto embedded multiprocessor platforms, this paper presents a novel data structure called the pipeline decomposition tree (PDT), and an associated scheduling framework, which we refer to as PDT scheduling. PDT scheduling exploits both heterogeneous data parallelism and task-level parallelism, which are important considerations for scheduling image processing applications. This paper develops the PDT representation for system synthesis, and presents methods using the PDT to derive customized pipelined architectures that are streamlined for the given implementation constraints.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125106077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-specific workload shaping in multimedia-enabled personal mobile devices","authors":"Balaji Raman, S. Chakraborty","doi":"10.1145/1176254.1176259","DOIUrl":"https://doi.org/10.1145/1176254.1176259","url":null,"abstract":"Today, most personal mobile devices (e.g. cell phones and PDAs) are multimedia-enabled and support a variety of concurrently running applications such as audio/video players, word processors and Web browsers. Media-processing applications are often computationally expensive and most of these devices typically have 100 - 400 MHz processors. As a result, the user-perceived application response times are often poor when multiple applications are concurrently fired. In this paper we show that by using application-specific dynamic buffering techniques, the workload of these applications can be suitably \"shaped\" to fit the available processor bandwidth. Our techniques are analogous to traffic shaping which is widely used in communication networks to optimally utilize network bandwidth. Such shaping techniques have recently attracted a lot of attention in the context of embedded systems design (e.g. for dynamic voltage scaling). However, they have not been exploited for enhanced schedulability of multiple applications, as we do in this paper.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115496505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware assisted pre-emptive control flow checking for embedded processors to improve reliability","authors":"R. Ragel, S. Parameswaran","doi":"10.1145/1176254.1176280","DOIUrl":"https://doi.org/10.1145/1176254.1176280","url":null,"abstract":"Reliability in embedded processors can be improved by control flow checking and such checking can be conducted using software or hardware. Proposed software-only approaches suffer from significant code size penalties, resulting in poor performance. Proposed hardware-assisted approaches are not scalable and therefore cannot be implemented in real embedded systems. This paper presents a scalable, cost-effective and novel fault detection technique, to ensure proper control flow of a program. This technique includes architectural changes to the processor and software modifications. While architectural refinement incorporates additional instructions, the software transformation inserts these instructions into the program flow. Applications from an embedded systems benchmark suite are used for testing and evaluation. The overheads are compared with the state-of-the-art approach that achieves the same error coverage using software-only techniques. Our method has greatly reduced overheads compared to the state of the art. Our approach increased code size by 3.85-11.2% and reduced performance by just 0.24-1.47% for eight different industry standard applications. The additional hardware (gates) overhead in this approach was just 3.59%. In contrast, the state-of-the-art software-only approach required 50-150% additional code, and reduced performance by 53.5-99.5% when error detection was inserted.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121754500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streamroller: automatic synthesis of prescribed throughput accelerator pipelines","authors":"M. Kudlur, Kevin Fan, S. Mahlke","doi":"10.1145/1176254.1176321","DOIUrl":"https://doi.org/10.1145/1176254.1176321","url":null,"abstract":"In this paper, we present a methodology for designing a pipeline of accelerators for an application. The application is modeled using sequential C language with simple stylizations. The synthesis of the accelerator pipeline involves designing loop accelerators for individual kernels, instantiating buffers for arrays used in the application, and hooking up these building blocks to form a pipeline. A compiler-based system automatically synthesizes loop accelerators for individual kernels at varying performance levels. An integer linear program formulation which simultaneously optimizes the cost of loop accelerators and the cost of memory buffers is proposed to compose the loop accelerators to form an accelerator pipeline for the whole application. Case studies for some applications, including FMRadio and Beamformer, are presented to illustrate our design methodology. Experiments show significant cost savings are achieved through hardware sharing, while achieving the prescribed throughput requirements.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126784622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing hardware efficiency with multifunction loop accelerators","authors":"Kevin Fan, M. Kudlur, Hyunchul Park, S. Mahlke","doi":"10.1145/1176254.1176322","DOIUrl":"https://doi.org/10.1145/1176254.1176322","url":null,"abstract":"To meet the conflicting goals of high-performance low-cost embedded systems, critical application loop nests are commonly executed on specialized hardware accelerators. These loop accelerators are traditionally designed in a single-function manner, wherein each loop nest is implemented as a dedicated hardware block. This paper focuses on hardware sharing across loop nests by creating multifunction loop accelerators, or accelerators capable of executing multiple algorithms. A compiler-based system for automatically synthesizing multifunction loop accelerator architectures from C code is presented. We compare the effectiveness of three architecture synthesis approaches with varying levels of complexity: sum of individual accelerators, union of individual accelerators, and joint accelerator synthesis. Experiments show that multifunction accelerators achieve substantial hardware savings over combinations of single-function designs. In addition, the union approach to multifunction synthesis is shown to be effective at creating low-cost hardware by exploiting hardware sharing, while remaining computationally tractable.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130330900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UML and model-driven development for SoC design","authors":"W. Mueller, Y. Vanderperren","doi":"10.1145/1176254.1176255","DOIUrl":"https://doi.org/10.1145/1176254.1176255","url":null,"abstract":"Summary form only given. UML (Unified Modeling Language™) as an OMG standard has received wide acceptance in software engineering in recent years. As electronic systems design has moved towards software engineering, there is emerging interest in UML within the hardware community, and different UML diagrams and their variations have found application in requirements specification, testbenches, architectural descriptions, and behavioral modeling. In most cases, UML is applied just as a graphical capture, though UML 2.0 now comes as a computationally complete language based on a generic metamodeling mechanism. Though this mechanism introduces considerable complexity, it is one of the key strengths of UML 2.0, providing a flexible foundation for its customization towards different application domains through so-called UML profiles, which currently receive increasing tool support and give UML great potential to complement current C++-oriented languages for ESL design. In this context, SysML and the UML for SoC extension are already available as OMG profiles for Systems Engineering and SoC application, and several proprietary profiles are under development. The concepts of the Model Driven Architecture (MDA) are also of emerging interest. However, since MDA was mainly introduced for CASE tool support, its full application for hardware design still needs some investigation and certainly comes with some pitfalls. For industrial applications, the availability of appropriate tool support is crucial for deployment of UML in SoC design. UML tools currently come in different variations based on different UML versions and subsets with the support of specific flows, so that the selection of the appropriate tools becomes a key decision for the successful introduction of UML. Recently, several groups have reported positive outcomes regarding the customization of UML and tool support towards SoC design. These efforts result from collaborations between industrial users, researchers, and tool vendors, and constitute steps in the right direction. Regarding model exchange between tools, the UML-related XMI (XML Metadata Interchange) format and its relationship to SPIRIT, the emerging IEEE standard, are of particular interest. Partial overlaps can be identified and are currently under investigation by some projects, like SPRINT.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"3 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114011924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH","authors":"Hayden Kwok-Hay So, A. Tkachenko, R. Brodersen","doi":"10.1145/1176254.1176316","DOIUrl":"https://doi.org/10.1145/1176254.1176316","url":null,"abstract":"This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122140596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automotive electronics system, software, and local area network","authors":"Y. Furukawa, S. Kawamura","doi":"10.1145/1176254.1176256","DOIUrl":"https://doi.org/10.1145/1176254.1176256","url":null,"abstract":"In this tutorial, an overview of automotive electronic systems and details of the development methodologies are presented. Automobiles were born to enhance human mobility. In the early development stage, automotive engineers focused on strengthening automobile engine power. Afterwards, automobiles became able to travel faster than any animal, but they caused social problems such as traffic accidents, environmental problems and traffic congestion. Automotive electronic technologies have been developed in order to solve these social problems. Roles of electronic technologies in automobile functional developments: To solve safety, environmental and traffic problems, various functions are necessary that cannot be achieved by mechanical systems alone. In this section, the roles of automobile electronic systems as countermeasures to the social problems are discussed. Vehicle motion control systems, power-train control systems, navigation systems, and advanced drive assist systems are introduced and automotive functions are defined. Design requirements for automotive electronic systems architecture: Electronic systems are composed basically of sensors, ECUs (Electronic Control Units), actuators and human interfaces. In the early days, each electronic system was designed independently. Today's automobile has various functions that are achieved by multiple electronic systems. Therefore, it is important to design the fundamental architecture of the integrated electronic systems in an automobile so as to optimize the total function, cost and productivity. Design and development procedure of electronic systems and software: In vehicle systems and software, the required functions and the complexity of products are increasing. In this situation, ECU suppliers are working with efficient development methodologies to achieve the highest quality. Today the most common development processes are still the classical V-shaped process, module design and C language programming. However, several new technologies such as the UML design method are being tried and some of them have been adopted as the standard process. Automatic testing and simulation environments are also important for the development procedure. Automotive Local Area Network: Recently more and more automotive equipment is controlled electronically and the number of ECUs is increasing. The number of wire harnesses is also increasing, and many problems such as increased weight, lack of installation space and difficulty of handling are experienced. As a solution to these problems, multiplexing with an automotive Local Area Network is important to secure high-speed communication as well as to decrease the weight and volume of wire harnesses. We will review technologies of automotive Local Area Networks from CAN and LIN, which are currently de facto standards, to FlexRay, which is about to start being adopted.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115778103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application specific forwarding network and instruction encoding for multi-pipe ASIPs","authors":"S. Radhakrishnan, Hui Guo, S. Parameswaran, A. Ignjatović","doi":"10.1145/1176254.1176313","DOIUrl":"https://doi.org/10.1145/1176254.1176313","url":null,"abstract":"Small area and code size are two critical design issues in most embedded system designs. In this paper, we tackle these issues by customizing forwarding networks and instruction encoding schemes for multi-pipe Application Specific Instruction-Set Processors (ASIPs). Forwarding is a popular technique to reduce data hazards in the pipeline to improve performance and is applied in almost all modern processor designs, but it is very expensive in area. Instruction encoding schemes have a direct impact on code size; an efficient encoding method can lead to a small instruction width, and hence reduce the code size. We propose application specific techniques to reduce forwarding networks and instruction widths for ASIPs with multiple pipelines. With these design techniques, it is possible to reduce area, code size, and even power consumption (due to reduced area), without costing any performance. Our experiments on a set of benchmarks using the proposed customization approaches show that, on average, there are savings of 27% in area, 30% in leakage power, and 16.7% in code size, while performance even improves by 4% because of the reduced clock period.","PeriodicalId":370841,"journal":{"name":"Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115611583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}