{"title":"Traffic-balanced IP mapping algorithm for 2D-mesh On-Chip-Networks","authors":"Ting-Jung Lin, Shu-Yen Lin, A. Wu","doi":"10.1109/SIPS.2008.4671762","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671762","url":null,"abstract":"Intellectual Properties (IPs) mapping algorithms for On-Chip-Networks (OCNs) allocate a set of IPs onto given network topologies. Existing mapping algorithms restrict each IP to a single router connection. Hence, IPs with large communication volumes place heavy traffic loads on certain routers. Those routers may become hot spots due to high power density, which affects chip reliability. In this paper, new Network Interfaces (NIs) are proposed to solve this problem. A Traffic-Balanced Mapping Algorithm (TBMAP) based on the new NIs is also proposed. The traffic loads then become more decentralized, and the traffic of all the routers on the chip can be balanced without sacrificing networking performance. TBMAP achieves balanced network traffic loads with a short runtime, which leads to enhanced OCN performance. The experimental results show that at least 24% of communication time is saved for real applications.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116495070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative OFDM for energy-efficient wireless sensor networks","authors":"Weiguo Tang, Lei Wang","doi":"10.1109/SIPS.2008.4671741","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671741","url":null,"abstract":"This paper presents a new energy-efficient cooperative sensor network scheme. Cooperative nodes shift the carrier frequency of the source node, and the destination node receives the signal from the source node and the cooperative nodes as a standard orthogonal frequency division multiplexing (OFDM) receiver. The maximal ratio combining (MRC) approach is applied to achieve full diversity with low complexity. The proposed scheme is scalable with respect to the number of nodes and is robust to time synchronization. A clustered network topology and cooperation protocol are proposed to realize multi-hop long haul signal transmission. The effect of the cluster scale is studied thoroughly to obtain the optimal size of clusters subject to the cooperation overhead. Simulation results show that the proposed cooperation scheme can consistently reduce the transmission power with the increase of cooperation scale. However, there is an optimal cluster scale where a good tradeoff between the diversity gain and cooperation cost can be established to minimize the total power consumption of the whole network.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125595408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimal complexity low-latency architectures for Viterbi decoders","authors":"Renfei Liu, K. Parhi","doi":"10.1109/SIPS.2008.4671752","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671752","url":null,"abstract":"For Viterbi decoders, high throughput rate is achieved by applying look-ahead techniques in the add-compare-select unit, which is the system speed bottleneck. Look-ahead techniques combine multiple binary trellis steps into one equivalent complex trellis step in time sequence, which is referred to as the branch metrics precomputation (BMP) unit. The complexity and latency of BMP increase exponentially and linearly with respect to the look-ahead levels, respectively. For a Viterbi decoder with constraint length K and M-step look-ahead, 2^(M+K-1) branch metrics need to be computed and compared. In this paper, the computational redundancy in existing branch metric computation approaches is first recognized, and a general mathematical model for describing the approach space is built, based on which a new approach with minimal complexity and latency is proposed. The proof of its optimality is also given. This highly efficient approach leads to a novel overall optimal architecture for M that is any multiple of K. The results show that the proposed approaches can reduce the complexity by up to 45.65% and the latency by up to 72.50%. In addition, the proposed architecture can also be applied when M is any value while achieving the minimal complexity.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126944104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient interpolation architecture for soft-decision Reed-Solomon decoding by applying slow-down","authors":"Xinmiao Zhang, Jiangli Zhu","doi":"10.1109/SIPS.2008.4671731","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671731","url":null,"abstract":"Among various decoding algorithms of Reed-Solomon (RS) codes, algebraic soft-decision decoding (ASD) can achieve significant coding gain with polynomial complexity. One major step of ASD is the interpolation. The interpolation problem can be solved by Nielson's algorithm, which involves discrepancy coefficient computation. This computation requires a feedback loop with one multiplier, one adder and one register. The maximum clock frequency of the interpolation architecture is limited by the multiplier-adder path in this loop. In this paper, we propose to employ the slow-down technique to increase the number of registers in the feedback loop, so that the multiplier-adder path can be divided into shorter segments through retiming to achieve a higher clock frequency. In addition, input sequences to the feedback loops are interleaved. Applying the proposed interpolation architecture to a (255, 239) RS code with maximum multiplicity three, 43% higher efficiency in terms of speed/area ratio can be achieved over prior efforts.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128231437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An implementation friendly low complexity multiplierless LLR generator for soft MIMO sphere decoders","authors":"Min Li, D. Novo, B. Bougard, F. Naessens, L. Perre, F. Catthoor","doi":"10.1109/SIPS.2008.4671748","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671748","url":null,"abstract":"When combined with advanced FEC techniques such as turbo and LDPC codes, soft-output MIMO sphere decoders significantly outperform hard-output sphere decoders. Hence, algorithms and implementations of soft-output sphere decoders have attracted intensive interest in recent years. Practical soft-output sphere decoder implementations often consist of a list generator and an LLR generator. Most existing implementations focus on the list generator, and the LLR generator is implemented in a relatively straightforward way. However, the LLR generator accounts for a large part of the complexity. Our contribution is an implementation-friendly, low-complexity, multiplierless LLR generator. We apply selective and incremental updating, algebraic simplifications and strength reductions to reduce the algorithmic complexity and to eliminate all multiplications. When integrated with the SSFE list generator, our scheme not only removes 100% of the multiplications but also removes 26% to 83% of the additions, 76% to 94% of the bit-shifts and 63% to 91% of the memory operations. Besides the algorithmic aspects, we extract the key data-flow block with well-defined control signals. This block can be easily mapped onto micro-architectures and implemented as the data-path in ASICs or as a function unit in ASIPs.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"25 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129856011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kalman filtering based motion estimation for video coding with adaptive block partitioning","authors":"Yi-Shiou Luo, M. Celenk","doi":"10.1109/SIPS.2008.4671750","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671750","url":null,"abstract":"In this paper, a new block-based motion estimation (ME) method is proposed which uses Kalman filtering (KF) with adaptive block partitioning (ABP) to improve the motion estimates produced by conventional block-matching algorithms (BMAs). In our method, a first-order autoregressive model is applied to the motion vectors (MVs) obtained by BMAs. The motion correlations between neighboring blocks are used to predict motion information. According to the statistics of the frame MVs, 16×16 macroblocks (MBs) are adaptively split into 8×8 blocks or 4×4 sub-blocks for Kalman filtering. To further improve performance, zigzag scanning is adopted and the state parameters of the Kalman filter are adjusted adaptively during each KF iteration. The experimental results indicate that the proposed method can effectively improve ME performance in terms of the peak signal-to-noise ratio (PSNR) of the motion-compensated images, with smoother motion vector fields.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134057223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast multiple reference frame selection methods for H.264/AVC","authors":"Shin Wang Ho, S. D. Kim, M. Sunwoo","doi":"10.1109/SIPS.2008.4671751","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671751","url":null,"abstract":"This paper proposes three efficient frame selection schemes to reduce the computational complexity of multi-reference and variable block size motion estimation (ME). The proposed reference selection pass scheme can minimize the overhead of frame selection. The modified frame selection scheme can reduce the number of search points by about 18% compared with existing schemes. In addition, a two-pass reference frame selection scheme is proposed to minimize the frame selection operations for variable block size ME in H.264/AVC. The simulation results show that the proposed schemes can save up to 50% of the ME computation without degrading image quality. Because the proposed schemes are separable from the block matching process, they can be used with any existing single-reference fast search algorithm.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131297693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified instruction set programmable architecture for multi-standard advanced forward error correction","authors":"F. Naessens, B. Bougard, Siebert Bressinck, L. Hollevoet, P. Raghavan, L. Perre, F. Catthoor","doi":"10.1109/SIPS.2008.4671733","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671733","url":null,"abstract":"The continuously increasing number of communication standards to be supported in nomadic devices, combined with the fast-rising design cost in deep-submicron technologies, calls for highly reusable and flexible programmable solutions. Software defined radio (SDR) aims at providing such solutions in radio baseband architectures. Great advances have recently been made in handset-targeted SDR, covering most of the baseband processing with satisfactory performance and energy efficiency. However, because it typically entails an order of magnitude higher computational load, forward error correction (FEC) has been excluded from the scope of high-throughput SDR solutions and left to dedicated hardware accelerators. The growing number of advanced FEC options, however, calls for flexibility there too. This paper presents the first application-specific instruction programmable architecture addressing in a unified way the emerging turbo and LDPC coding requirements of 3GPP-LTE, IEEE802.11n, IEEE802.16(e) and DVB-S2/T2. The proposal shows a throughput from 0.07 to 1.25 Mbps/MHz with efficiencies of around 0.32 nJ/bit/iter in turbo mode and around 0.085 nJ/bit/iter in LDPC mode. The area is lower than the combined area of dedicated turbo and LDPC solutions.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"284 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133484703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-driven adaptive fixed-point refinement for SDRs","authors":"D. Novo, Min Li, B. Bougard, F. Naessens, L. Perre, F. Catthoor","doi":"10.1109/SIPS.2008.4671770","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671770","url":null,"abstract":"Wireless interfaces implement an increasing number of different standards. For cost effectiveness, flexible radio implementations are preferred over the multiplication of dedicated solutions. Software Defined Radios (SDR) have been introduced as the ultimate way to achieve such flexibility. However, the reduced energy budget of battery-powered solutions makes the typical worst-case static dimensioning unaffordable under highly dynamic operating conditions. Instead, energy-scalable algorithms and implementations are needed to provide flexibility while maintaining the required energy efficiency. In particular, energy-scalable implementations can exploit data-format properties to offer different tradeoffs between accuracy and energy. In this paper, an application-driven adaptive fixed-point refinement methodology is proposed. It derives the minimum word-lengths that keep the degradation of application performance within a user-defined limit. This technique is applied to the fixed-point refinement of a Near-ML MIMO (Multiple Inputs, Multiple Outputs) detector. Variations in the minimum required precision depending on external conditions are made explicit. Finally, on a processor platform these variations can be translated into reduced cycles and energy by leveraging sub-word parallel implementations.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"471 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131990045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified decoder architecture for LDPC/turbo codes","authors":"Yang Sun, Joseph R. Cavallaro","doi":"10.1109/SIPS.2008.4671730","DOIUrl":"https://doi.org/10.1109/SIPS.2008.4671730","url":null,"abstract":"Low-density parity-check (LDPC) codes and convolutional turbo codes (CTC) are two of the most powerful error correction codes, known to perform very close to the Shannon limit. However, their different code structures usually lead to different hardware implementations. In this paper, we propose a unified decoder architecture that is capable of decoding both LDPC and turbo codes with limited hardware overhead. We employ the maximum a posteriori (MAP) algorithm as a bridge between LDPC and turbo codes. We represent LDPC codes as parallel concatenated single parity check (PCSPC) codes and propose a group sub-trellis (GST) decoding algorithm for the efficient decoding of PCSPC codes. This algorithm achieves about a 2X improvement in convergence speed and is more numerically robust than the classical \"tanh\" algorithm. Moreover, we generalize a unified trellis decoding algorithm for LDPC and turbo codes based on their trellis structures. We propose a reconfigurable computation kernel for log-MAP decoding of LDPC and turbo codes at a cost of ~15% hardware overhead. Small lookup tables (LUTs) with 9 entries of 2-bit data are designed to implement the log-MAP algorithm. Fixed-point (6:2) simulation results show negligible performance loss from this LUT approximation compared to the ideal case. The proposed architecture results in scalable and flexible datapath units enabling parallel decoding of LDPC/turbo codes.","PeriodicalId":173371,"journal":{"name":"2008 IEEE Workshop on Signal Processing Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130518841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}