2017 IEEE International Workshop on Signal Processing Systems (SiPS): Latest Papers

Task-based execution of synchronous dataflow graphs for scalable multicore computing
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110023
Georgios Georgakarakos, Sudeep Kanur, J. Lilius, K. Desnos
{"title":"Task-based execution of synchronous dataflow graphs for scalable multicore computing","authors":"Georgios Georgakarakos, Sudeep Kanur, J. Lilius, K. Desnos","doi":"10.1109/SiPS.2017.8110023","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110023","url":null,"abstract":"Dataflow models of computation have early on been acknowledged as an attractive methodology to describe parallel algorithms, hence they have become highly relevant for programming in the current multicore processor era. While several frameworks provide tools to create dataflow descriptions of algorithms, generating parallel code for programmable processors is still sub-optimal due to the scheduling overheads and the semantics gap when expressing parallelism with conventional programming languages featuring threads. In this paper we propose an optimization of the parallel code generation process by combining dataflow and task programming models. We develop a task-based code generator for PREESM, a dataflow-based prototyping framework, in order to deploy algorithms described as synchronous dataflow graphs on multicore platforms. Experimental performance comparison of our task generated code against typical thread-based code shows that our approach removes significant scheduling and synchronization overheads while maintaining similar (and occasionally improving) application throughput.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125538206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Customizing fixed-point and floating-point arithmetic — A case study in K-means clustering
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109980
Benjamin Barrois, O. Sentieys
{"title":"Customizing fixed-point and floating-point arithmetic — A case study in K-means clustering","authors":"Benjamin Barrois, O. Sentieys","doi":"10.1109/SiPS.2017.8109980","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8109980","url":null,"abstract":"This paper presents a comparison between custom fixed-point (FxP) and floating-point (FlP) arithmetic, applied to bidimensional K-means clustering algorithm. After a discussion on the K-means clustering algorithm and arithmetic characteristics, hardware implementations of FxP and FlP arithmetic operators are compared in terms of area, delay and energy, for different bitwidth, using the ApxPerf2.0 framework. Finally, both are compared in the context of K-means clustering. The direct comparison shows the large difference between 8-to-16-bit FxP and FlP operators, FlP adders consuming 5–12× more energy than FxP adders, and multipliers 2–10× more. However, when applied to K-means clustering algorithm, the gap between FxP and FlP tightens. Indeed, the accuracy improvements brought by FlP make the computation more accurate and lead to an accuracy equivalent to FxP with less iterations of the algorithm, proportionally reducing the global energy spent. The 8-bit version of the algorithm becomes more profitable using FlP, which is 80% more accurate with only 1.6× more energy. This paper finally discusses the stake of custom FlP for low-energy general-purpose computation, thanks to its ease of use, supported by an energy overhead lower than what could have been expected.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121527467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Obtaining an optimal set of head-related transfer functions with a small amount of measurements
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110008
Mikko Parviainen, Pasi Pertilä
{"title":"Obtaining an optimal set of head-related transfer functions with a small amount of measurements","authors":"Mikko Parviainen, Pasi Pertilä","doi":"10.1109/SiPS.2017.8110008","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110008","url":null,"abstract":"This article presents a method to obtain personalized Head-Related Transfer Functions (HRTFs) for creating virtual soundscapes based on small amount of measurements. The best matching set of HRTFs are selected among the entries from publicly available databases. The proposed method is evaluated using a listening test where subjects assess the audio samples created using the best matching set of HRTFs against a randomly chosen set of HRTFs from the same location. The listening test indicates that subjects prefer the proposed method over random set of HRTFs.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132613771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
CRN-based design methodology for synchronous sequential logic
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109979
Zhiwei Zhong, Lulu Ge, Ziyuan Shen, X. You, Chuan Zhang
{"title":"CRN-based design methodology for synchronous sequential logic","authors":"Zhiwei Zhong, Lulu Ge, Ziyuan Shen, X. You, Chuan Zhang","doi":"10.1109/SiPS.2017.8109979","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8109979","url":null,"abstract":"With the aid of a storage-release mechanism named key-keysmith, an implementation approach based on chemical reaction networks (CRNs) for synchronous sequential logic is proposed. This design approach, which stores logic information in keysmith and releases it through key, primarily focuses on the underlying state transitions behind the required logic rather than the electronic circuit representation. Therefore, it can be uniformly and easily employed to implement any synchronous sequential logic with molecular reactions. Theoretical analysis and numerical simulations have demonstrated the robustness and universality of the proposed approach.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134310204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Processing LSTM in memory using hybrid network expansion model
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110011
Yu Gong, Tingting Xu, Bo Liu, Wei-qi Ge, Jinjiang Yang, Jun Yang, Longxing Shi
{"title":"Processing LSTM in memory using hybrid network expansion model","authors":"Yu Gong, Tingting Xu, Bo Liu, Wei-qi Ge, Jinjiang Yang, Jun Yang, Longxing Shi","doi":"10.1109/SiPS.2017.8110011","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110011","url":null,"abstract":"With the rapidly increasing applications of deep learning, LSTM-RNNs are widely used. Meanwhile, the complex data dependence and intensive computation limit the performance of the accelerators. In this paper, we first proposed a hybrid network expansion model to exploit the fine-grained data parallelism. Based on the model, we implemented a Reconfigurable Processing Unit (RPU) using Processing In Memory (PIM) units. Our work shows that the gates and cells in LSTM can be partitioned to fundamental operations and then recombined and mapped into heterogeneous computing components. The experimental results show that, implemented on 45 nm CMOS process, the proposed RPU with size of 1.51 mm² and power of 413 mW achieves 309 GOPS/W in power efficiency, and is 1.7× better than state-of-the-art reconfigurable architecture.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133414533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Successive cancellation decoder for very long polar codes
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110022
B. Gal, Camille Leroux, C. Jégo
{"title":"Successive cancellation decoder for very long polar codes","authors":"B. Gal, Camille Leroux, C. Jégo","doi":"10.1109/SiPS.2017.8110022","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110022","url":null,"abstract":"Polar codes are a family of error correcting codes that achieves the symmetric capacity of memoryless channels when the code length N tends to infinity. However, moderate code lengths are required in most of wireless digital applications to limit the decoding latency. In some other applications, such as optical communications or quantum key distribution, the latency introduced by very long codes is not an issue. The main challenge is to design codes with the best error correction capability, a tractable complexity and a high throughput. In such a context, SC decoding is an interesting solution because its performance improves with N while the computational complexity scales almost linearly. In this paper, we propose to improve the scalability of SC decoders thanks to four architectural optimizations. The resulting SC decoder is implemented on an FPGA device and favorably compares with state-of-the-art scalable SC decoders. Moreover, a 2^22 polar code SC decoder is implemented on a Stratix-5 FPGA. This code length is twice larger than the ones achieved in previous works. To the best of our knowledge, this is the first architecture for which a N = 4 million bits polar code can be actually decoded on a reconfigurable circuit.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128100582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Statistical analysis of Post-HEVC encoded videos
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110020
A. Jallouli, Fatma Belghith, M. A. B. Ayed, W. Hamidouche, J. Nezan, N. Masmoudi
{"title":"Statistical analysis of Post-HEVC encoded videos","authors":"A. Jallouli, Fatma Belghith, M. A. B. Ayed, W. Hamidouche, J. Nezan, N. Masmoudi","doi":"10.1109/SiPS.2017.8110020","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110020","url":null,"abstract":"The Post-HEVC is the emerging video coding standard beyond the High Efficiency Video Coding (HEVC) standard. It is more complex in transformation and prediction steps but it offers the opportunity of 3D and 360° videos coding and compression. This paper presents different statistical analyses of Post-HEVC encoded videos especially analysis on 1D and 2D transformation types and analysis on intra and inter prediction types of some test videos for different classes and resolutions. Analyses are carried out at the decoder level where the coding decision has already been taken by the encoder. Results show that the choice of transformation (type and size) and the prediction type (intra or inter) depends on the nature of video: motion and texture. This work can be considered as a milestone for proposing intelligent algorithms based on video characteristics to perform fast decision in the Post-HEVC encoding process.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114180474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Low complexity hardware accelerator for nD FastICA based on coordinate rotation
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110000
Swati Bhardwaj, Shashank Raghuraman, A. Acharyya
{"title":"Low complexity hardware accelerator for nD FastICA based on coordinate rotation","authors":"Swati Bhardwaj, Shashank Raghuraman, A. Acharyya","doi":"10.1109/SiPS.2017.8110000","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110000","url":null,"abstract":"This paper proposes a low complex hardware accelerator algorithmic modification for n-dimensional (nD) FastICA methodology based on Coordinate Rotation Digital Computer (CORDIC) to attain high computation speed. The most complex and time consuming update stage and convergence check required for computation of the nth weight vector are eliminated in the proposed methodology. Using the Gram-Schmidt Orthogonalization stage and normalization stage to calculate nth weight vector in an entirely sequential procedure of CORDIC-based FastICA results in a significant gain in terms of the computation time. The proposed methodology has been functionally verified and validated by applying it for separating 6D speech signals. It has been implemented on hardware using Verilog HDL and synthesized using UMC 180nm technology. The average improvement in computation time obtained by using the proposed methodology for 4D to 6D FastICA with 1024 samples, considering the minimum case of two iterations for nth stage, was found to be 98.79 %.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124309218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
FPGA implementation of object recognition processor for HDTV resolution video using sparse FIND feature
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8109993
Yuri Nishizumi, Go Matsukawa, K. Kajihara, T. Kodama, S. Izumi, H. Kawaguchi, C. Nakanishi, Toshio Goto, Takeo Kato, M. Yoshimoto
{"title":"FPGA implementation of object recognition processor for HDTV resolution video using sparse FIND feature","authors":"Yuri Nishizumi, Go Matsukawa, K. Kajihara, T. Kodama, S. Izumi, H. Kawaguchi, C. Nakanishi, Toshio Goto, Takeo Kato, M. Yoshimoto","doi":"10.1109/SiPS.2017.8109993","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8109993","url":null,"abstract":"This paper describes FPGA implementation of object recognition processor for HDTV resolution 30 fps video using the Sparse FIND feature. Two-stage feature extraction processing by HOG and Sparse FIND, a highly parallel classification in the support vector machine (SVM), and a block-parallel processing for RAM access cycle reduction are proposed to perform a real time object recognition with enormous computational complexity. From implementation of the proposed architecture in the FPGA, it was confirmed that detection using the Sparse FIND feature was performed for HDTV images at 47.63 fps, on average, at 90 MHz. The recognition accuracy degradation from the original Sparse FIND-base object detection algorithm implemented on software was 0.5%, which shows that the FPGA system provides sufficient accuracy for practical use.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116800175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Robust compressed analysis using subspace-based dictionary for ECG telemonitoring systems
2017 IEEE International Workshop on Signal Processing Systems (SiPS) Pub Date : 2017-10-01 DOI: 10.1109/SiPS.2017.8110016
Meng-Ya Tsai, Ching-Yao Chou, A. Wu
{"title":"Robust compressed analysis using subspace-based dictionary for ECG telemonitoring systems","authors":"Meng-Ya Tsai, Ching-Yao Chou, A. Wu","doi":"10.1109/SiPS.2017.8110016","DOIUrl":"https://doi.org/10.1109/SiPS.2017.8110016","url":null,"abstract":"To realize Electrocardiography (ECG) signals monitoring systems, compressive sensing (CS) is a new technique to reduce power of biosensors and data transmission. Instead of spending high complexity on reconstructing back to data domain to do signal analysis, compressed analysis (CA) exploits the data structure preserved by CS to directly analyze in the compressed domain. However, compressively-sensed signals contaminated by interference cause learning performance degradation. Meanwhile, traditional interference removal methods are developed for signals in data domain, which involve reconstruction. In this paper, we propose a new CA framework using pre-trained subspace-based dictionary to project interfered and compressed data onto the subspace with high learnability and low complexity. Through simulations, we show that our technique enables 5.64% improvements on accuracy of detection compared with conventional CA, and reduces 99% complexity compared with reconstructed analysis.","PeriodicalId":251688,"journal":{"name":"2017 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128676923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7