High-Throughput Fixed-Point Object Detection on FPGAs
Xiaoyin Ma, W. Najjar, A. Roy-Chowdhury
2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 11, 2014
DOI: 10.1109/FCCM.2014.40
Citations: 9
Abstract
Computer vision applications make extensive use of floating-point number representation, in both single and double precision. The major advantage of floating-point representation is the very large range of values that can be represented with a limited number of bits. Most CPU designs, and all GPU designs, have been extensively optimized for short-latency, high-throughput processing of floating-point operations. On an FPGA, the bit-width of operands is a major determinant of resource utilization and achievable clock frequency, and hence of throughput. By using a fixed-point representation with fewer bits, an application developer can fit more processing units on the device and reach a higher clock frequency, yielding dramatically larger throughput. However, smaller bit-widths may lead to inaccurate or incorrect results. Object and human detection are fundamental problems in computer vision and a very active research area. In these applications, high throughput and economy of resources are highly desirable, allowing the applications to be embedded in mobile or field-deployable equipment. The Histogram of Oriented Gradients (HOG) algorithm [1], developed for human detection and later extended to object detection, is one of the most successful and popular algorithms in its class. In this algorithm, object descriptors are extracted from a detection window covered by a grid of overlapping blocks. Each block is divided into cells, in which histograms of intensity gradients are collected as HOG features. The histogram vectors are normalized and passed to a Support Vector Machine (SVM) classifier to recognize a person or an object.
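The fixed-point trade-off described above can be illustrated with a minimal quantization sketch. This is a generic signed Qm.n conversion with rounding and saturation, not the bit-widths chosen in the paper; the 16-bit/8-fractional-bit parameters below are illustrative assumptions.

```python
# Hedged sketch: signed fixed-point Qm.n quantization with rounding and
# saturation. The specific widths here are illustrative, not the paper's.

def to_fixed(x: float, total_bits: int, frac_bits: int) -> int:
    """Round x to the nearest representable value, saturating at range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # most negative representable integer
    hi = (1 << (total_bits - 1)) - 1       # most positive representable integer
    q = round(x * scale)
    return max(lo, min(hi, q))

def from_fixed(q: int, frac_bits: int) -> float:
    """Convert the stored integer back to its real value."""
    return q / (1 << frac_bits)

# A 16-bit Q8.8 value represents 3.14159 with error below 2**-9:
q = to_fixed(3.14159, total_bits=16, frac_bits=8)
print(from_fixed(q, frac_bits=8))  # 3.140625
```

The rounding error is bounded by half the quantization step (2**-(frac_bits+1)); out-of-range values saturate instead of wrapping, which is the usual choice in DSP-style FPGA datapaths.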
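The HOG pipeline summarized in the abstract (gradients, per-cell orientation histograms, overlapping-block normalization) can be sketched in a few lines of NumPy. This is a simplified reference-style version under common assumptions (8x8-pixel cells, 9 unsigned orientation bins, 2x2-cell blocks with one-cell overlap, L2 normalization); it is not the paper's FPGA implementation and omits details such as bilinear vote interpolation.

```python
# Hedged sketch of HOG feature extraction: cell histograms of oriented
# gradients followed by overlapping-block normalization. Cell size, bin
# count, and block layout are common defaults, assumed here for illustration.
import numpy as np

def cell_histograms(img, cell=8, bins=9):
    """Accumulate gradient-magnitude-weighted orientation histograms per cell."""
    gy, gx = np.gradient(img.astype(float))        # intensity gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation, [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j] = np.bincount(b, weights=m, minlength=bins)
    return hist

def block_descriptor(hist, eps=1e-6):
    """Concatenate L2-normalized 2x2-cell blocks (one-cell overlap)."""
    ch, cw, _ = hist.shape
    blocks = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            blocks.append(v / np.sqrt(np.sum(v**2) + eps**2))
    return np.concatenate(blocks)
```

The final descriptor would then be scored by a linear SVM (a dot product with learned weights plus a bias), which is the step the abstract describes as recognizing a person or an object.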