Automated Retargeting of Sequential Imaging Software to Parallel Execution
The large body of image and video processing software for embedded
devices is written almost exclusively in C, which lends itself to
traditional compilation for general-purpose processors and DSPs.
However, newer parallel architectures, which are more efficient and
higher performing, offer the thread-level and data-parallel execution
needed to fully exploit the inherent parallelism in these
applications. Harnessing the computational power of these platforms
requires not only new compilation techniques for new software
development, but also retargeting the significant sequential code
assets that were originally written for older architectures. The
challenge is exposing the parallelism inherent in existing image and
video processing applications that is obscured by artifacts imposed by
sequential programming languages.

This research explores reverse engineering and retargeting compilation
techniques to realize the potential of new architectural advances,
particularly for embedded imaging applications.
- Parallelism Estimation: Dynamic analysis techniques have been
developed to estimate potential parallelism before undertaking the
process of retargeting. They characterize the types of parallelism
that are inherent in a given program to estimate the potential
benefit of retargeting to a processor with hybrid parallel
mechanisms. The analysis suggests the type of parallelization
strategy to use in retargeting and helps identify requirements for
the processor that best exploits the parallelism.

Figure: DLP, ILP, TLP, and serial fractions for MediaBench and SPEC applications.
- Automated Retargeting: This research focuses on
techniques for automatically extracting parallelism from sequential
image processing software and retargeting the code to data-parallel
execution -- ranging from multimedia extensions (such as SSE2) for
general-purpose and DSP architectures to new massively parallel SIMD
architectures. It uses a recognition-based approach to
automatically extract a data-parallel program model from
sequential image processing code and retarget it to data-parallel
execution mechanisms. For example, this technique has been
successfully applied to production image processing programs that
are part of the Texas Instruments IMGLIB suite of applications for
the TI TMS320C62xx line of DSPs. The retargeted applications yield
a potential execution throughput limited only by the number of
processing elements, exceeding thousands of instructions per cycle
in massively parallel implementations.
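
As a rough illustration of what retargeting to a multimedia extension means, the sketch below shows a sequential pixel loop and a hand-written SSE2 counterpart. It is not output of the retargeting tool and is not taken from IMGLIB; the function names and the choice of a saturating 8-bit image add are assumptions made for illustration. The SSE2 path is guarded so that the code falls back to the scalar loop on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Sequential form: one pixel per iteration, saturating at 255.
   (Hypothetical example kernel, not an IMGLIB routine.) */
void add_sat_scalar(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Data-parallel form: 16 pixels per SSE2 instruction where available,
   with the scalar loop handling any remaining pixels. */
void add_sat_simd(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads, so the buffers need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Unsigned saturating add on sixteen 8-bit lanes at once. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
#endif
    add_sat_scalar(a + i, b + i, out + i, n - i);
}
```

The point of the comparison is that the per-pixel computation is identical in both forms; only the iteration structure changes, which is the kind of structure a data-parallel program model makes explicit.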

- Retargeting Compilation to Varying Processor Grain Sizes:
In embedded image processing systems, where significant architectural
variation is possible, the code for the same application can change
drastically from one target architecture to another. We have developed a
method to compile a single high-level source across a fundamental
variation in data-parallel target architectures -- a wide range of
processor granularities, from a single processor to a massively
parallel processor array. In image processing applications, the
grain size of the processing elements determines the number of
pixels that are mapped to each processing element, which is called
the pixels per processing element (PPE) ratio. Our approach uses
single-PPE virtualization: it supports pixel-level data-parallel
expressions that operate on a virtual network with one pixel per
processing element (a PPE of one) and applies pixel-locating
transformations to retarget the code to a given target PPE.
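
The idea of locating a virtual pixel on a coarser-grained target can be sketched as follows. This is only a minimal illustration under stated assumptions: the page does not specify the actual transformations, so the square block-tiled layout, the `PixelLocation` type, and all function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical type for illustration; not taken from the cited work. */
typedef struct {
    int pe_x, pe_y;       /* which processing element holds the pixel */
    int local_x, local_y; /* pixel position inside that PE's tile */
} PixelLocation;

/* Map a virtual pixel (x, y) on the one-pixel-per-PE network to a PE and
   a local offset, assuming each PE owns a square tile of
   tile_side x tile_side pixels (an assumed layout). */
PixelLocation locate_pixel(int x, int y, int tile_side)
{
    assert(x >= 0 && y >= 0 && tile_side > 0);
    PixelLocation loc;
    loc.pe_x = x / tile_side;
    loc.pe_y = y / tile_side;
    loc.local_x = x % tile_side;
    loc.local_y = y % tile_side;
    return loc;
}

/* PPE ratio for a square tile. */
int ppe_ratio(int tile_side)
{
    return tile_side * tile_side;
}

/* A neighbor reference (x+dx, y+dy) in the virtual program stays a local
   memory access when both pixels land on the same PE; otherwise it needs
   an inter-PE transfer. Returns 1 for local, 0 for remote. */
int neighbor_is_local(int x, int y, int dx, int dy, int tile_side)
{
    PixelLocation here = locate_pixel(x, y, tile_side);
    PixelLocation there = locate_pixel(x + dx, y + dy, tile_side);
    return here.pe_x == there.pe_x && here.pe_y == there.pe_y;
}
```

With `tile_side` equal to 1, every neighbor reference crosses a PE boundary, matching the virtual network; as the tile grows, more references become local, which is why the same pixel-level expression can compile to very different code at different grain sizes.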

Publications:
- L. Baumstark and L. M. Wills, Multidimensional Dataflow-based Parallelization for Multimedia Instruction Set Extensions, Proceedings of the 3rd International Workshop on Embedded Computing, Columbus, Ohio, held in conjunction with the 2006 International Conference on Parallel Processing (ICPP-06), August 2006.
- T. C. Huang, L. M. Wills, R. Melton and C. Alford, Predicting Communication Protocol Performance on Superscalar Architectures using Instruction Dependency, Performance Evaluation, Vol. 63, No. 9-10, pp. 939-955, October 2006.
- S. Sander and L. M. Wills, Retargeting Image-Processing Algorithms to Varying Processor Grain Sizes, Proceedings of the 3rd International Workshop on Embedded Computing, Columbus, Ohio, held in conjunction with the 2006 International Conference on Parallel Processing (ICPP-06), August 2006.
- L. Baumstark and L. M. Wills, Dynamic Estimation of Data-Level Parallelism in Nested Loop Structures, Proceedings of the 1st Workshop on Program Comprehension through Dynamic Analysis (PCODA 2005), pp. 28-32, Pittsburgh, Pennsylvania, co-located with the 12th Working Conference on Reverse Engineering, November 2005.
- L. Baumstark and L. M. Wills, Retargeting Sequential Image-Processing Programs for Data Parallel Execution, IEEE Transactions on Software Engineering, Vol. 31, No. 2, pp. 116-136, February 2005.
- L. Baumstark, M. Guler and L. M. Wills, Extracting an Explicitly Data-Parallel Representation of Image-Processing Programs, Proceedings of the 10th Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, pp. 24-34, Victoria, British Columbia, Canada, November 2003.
- L. M. Wills, T. Taha, L. Baumstark and S. Wills, Estimating Potential Parallelism for Platform Retargeting, Proceedings of the 9th Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, pp. 55-64, Richmond, Virginia, October 2002.
- L. Baumstark and L. M. Wills, Exposing Data-Level Parallelism in Sequential Image Processing Algorithms, Proceedings of the 9th Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, pp. 245-254, Richmond, Virginia, October 2002.
- R. Janka, L. M. Wills and L. Baumstark, Virtual Benchmarking and Model Continuity in Prototyping Embedded Multiprocessor Signal Processing Systems, IEEE Transactions on Software Engineering, Vol. 28, No. 9, pp. 832-846, September 2002.
- R. Janka and L. M. Wills, Specification and Synthesis of Real-Time Embedded Distributed and Parallel Multiprocessor-based Signal Processing Systems, Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES-2000), pp. 72-80, San Jose, California, November 2000.
- R. Janka and L. M. Wills, Combining Virtual Benchmarking with Rapid System Prototyping for Real-Time Embedded Multiprocessor Signal Processing System Codesign, Proceedings of the 11th IEEE International Workshop on Rapid System Prototyping (RSP 2000), pp. 20-25, Paris, France, June 2000.
- R. Janka and L. M. Wills, A Novel Codesign Methodology for Real-Time Embedded COTS Multiprocessor-Based Signal Processing Systems, Proceedings of the 8th International Workshop on Hardware/Software Co-design (CODES 2000), pp. 157-161, Mission Bay, San Diego, California, May 2000.