Object detection applications often require their algorithms to execute on embedded processing platforms, such as multiprocessor SoCs. One way these algorithms can search input images for objects of interest is by consulting a detection library that contains a list of features describing the objects. Processing large volumes of image data while consulting such a library can degrade the performance of these platforms, because contention for cacheable resources leads to variable data locality and reuse. Software-based techniques for mitigating this problem have been investigated in the literature with varied success. This paper addresses the issue head-on with a novel hardware accelerator designed to overcome shared-resource contention while optimizing on-chip memory consumption. Detection libraries are compressed and stored on-chip within the accelerator, which decompresses the data and writes it to dedicated dual-port memories, ensuring optimal library data locality and reuse for all processors. Allowing the accelerator to manipulate library data can further improve application performance by reducing the computation carried out by the processors. Our evaluation revealed that eliminating contention within caches drastically improved application performance without over-consuming on-chip resources or power.
- object detection
- detection library