Automated Retargeting of Sequential Imaging Software to Parallel Execution

The large body of image and video processing software for embedded devices is written almost exclusively in C. This lends itself to traditional compilation for general-purpose processors and DSPs. However, newer parallel architectures, which are more efficient and deliver higher performance, offer the thread-level and data-parallel execution needed to fully exploit the inherent parallelism in these applications. Harnessing the computational power of these platforms requires not only new compilation techniques for new software development, but also retargeting of the significant sequential code assets originally written for older architectures. The challenge is to expose the parallelism inherent in existing image and video processing applications, which is obscured by artifacts imposed by sequential programming languages.


This research explores reverse engineering and retargeting compilation techniques to realize the potential of new architectural advances, particularly for embedded imaging applications.

  1. Parallelism Estimation: We have developed a dynamic analysis technique that estimates potential parallelism before the retargeting process is undertaken. It characterizes the types of parallelism inherent in a given program in order to estimate the potential benefit of retargeting to a processor with hybrid parallel mechanisms. The technique suggests which parallelization strategy to use in retargeting and helps identify the requirements for the processor that best exploits the parallelism.

     [Figure: DLP, ILP, TLP, and serial fractions for MediaBench and SPEC applications.]

  2. Automated Retargeting: This research focuses on techniques for automatically extracting parallelism from sequential image processing software and retargeting the code to data-parallel execution, ranging from multimedia extensions (such as SSE2) for general-purpose and DSP architectures to new massively parallel SIMD architectures. It uses a recognition-based approach to automatically extract a data-parallel program model from sequential image processing code and retarget it to data-parallel execution mechanisms. For example, this technique has been successfully applied to production image processing programs that are part of the Texas Instruments IMGLIB suite of applications for the TI TMS320C62xx line of DSPs. The retargeted applications yield a potential execution throughput limited only by the number of processing elements, exceeding thousands of instructions per cycle in massively parallel implementations.

  3. Retargeting Compilation to Varying Processor Grain Sizes: In embedded image processing systems, where significant architectural variation is possible, the same software application can change drastically from one target architecture to another. We have developed a method to compile a single high-level source across a fundamental variation in data-parallel target architectures: a wide range of processor granularities, from a single processor to a massively parallel processor array. In image processing applications, the grain size of the processing elements determines the number of pixels mapped to each processing element, which is called the pixels per processing element (PPE) ratio. Our approach uses single-PPE virtualization: the programmer writes pixel-level data-parallel expressions that operate on a virtual network with one pixel per processing element, and the compiler applies pixel-locating transformations to retarget the code to a given target PPE ratio.
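The idea of writing code once for a virtual one-pixel-per-processing-element network and then retargeting it to a coarser grain size can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the project itself: the block-wise pixel-to-PE mapping, the clamped border handling, and the names locate, distribute, read_pixel, and horizontal_sum3 are all hypothetical. The sketch assumes the image dimensions are exact multiples of the block dimensions, so the PPE ratio is block_w * block_h.

```python
# Illustrative sketch only: the block mapping and all names are assumptions,
# not the project's actual pixel-locating transformation.

def locate(x, y, block_w, block_h):
    """Map virtual pixel (x, y) to (pe_col, pe_row, local_x, local_y)."""
    return x // block_w, y // block_h, x % block_w, y % block_h


def distribute(image, block_w, block_h):
    """Split a row-major image (list of rows) into per-PE local tiles."""
    height, width = len(image), len(image[0])
    pes = {}
    for y in range(height):
        for x in range(width):
            pe_col, pe_row, lx, ly = locate(x, y, block_w, block_h)
            tile = pes.setdefault(
                (pe_col, pe_row),
                [[0] * block_w for _ in range(block_h)],
            )
            tile[ly][lx] = image[y][x]
    return pes


def read_pixel(pes, x, y, block_w, block_h, width, height):
    """Fetch virtual pixel (x, y), clamping coordinates at the image border."""
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    pe_col, pe_row, lx, ly = locate(x, y, block_w, block_h)
    return pes[(pe_col, pe_row)][ly][lx]


def horizontal_sum3(image, block_w, block_h):
    """Evaluate out(x, y) = in(x-1, y) + in(x, y) + in(x+1, y) tile by tile."""
    height, width = len(image), len(image[0])
    pes = distribute(image, block_w, block_h)
    out = [[0] * width for _ in range(height)]
    # Each iteration of this outer loop stands in for one processing element;
    # on real hardware the PEs would run concurrently.
    for (pe_col, pe_row) in pes:
        for ly in range(block_h):
            for lx in range(block_w):
                x = pe_col * block_w + lx
                y = pe_row * block_h + ly
                out[y][x] = sum(
                    read_pixel(pes, x + dx, y, block_w, block_h, width, height)
                    for dx in (-1, 0, 1)
                )
    return out
```

In this sketch, calling horizontal_sum3 with a 1x1 block corresponds to the virtual network (one pixel per processing element), while a block as large as the image corresponds to a single processor. The pixel-level expression is unchanged across grain sizes; only the locate step differs, which is the property the retargeting approach relies on.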

Publications: