text-only page produced automatically by LIFT Text Transcoder Skip all navigation and go to page contentSkip top navigation and go to directorate navigationSkip top navigation and go to page navigation
National Science Foundation
Search  
Awards
design element
Search Awards
Recent Awards
Presidential and Honorary Awards
About Awards
Grant Policy Manual
Grant General Conditions
Cooperative Agreement Conditions
Special Conditions
Federal Demonstration Partnership
Policy Office Website


Award Abstract #0234293
SOFTWARE: ACR: Advanced Code Generation for Digital Signal Processing Algorithms


NSF Org: CCF
Division of Computer and Communication Foundations
divider line
divider line
Initial Amendment Date: February 13, 2003
divider line
Latest Amendment Date: June 2, 2005
divider line
Award Number: 0234293
divider line
Award Instrument: Continuing grant
divider line
Program Manager: Almadena Y. Chtchelkanova
CCF Division of Computer and Communication Foundations
CSE Directorate for Computer & Information Science & Engineering
divider line
Start Date: April 1, 2003
divider line
Expires: March 31, 2007 (Estimated)
divider line
Awarded Amount to Date: $463800
divider line
Investigator(s): Markus Pueschel pueschel@ece.cmu.edu (Principal Investigator)
Jose Moura (Co-Principal Investigator)
divider line
Sponsor: Carnegie-Mellon University
5000 Forbes Avenue
PITTSBURGH, PA 15213 412/268-8746
divider line
NSF Program(s): COMPUTING PROCESSES & ARTIFACT,
COMPILERS,
ADVANCED COMP RESEARCH PROGRAM
divider line
Field Application(s): 0000099 Other Applications NEC,
0000912 Computer Science
divider line
Program Reference Code(s): HPCC, 9251, 9216
divider line
Program Element Code(s): 7352, 7329, 4080

ABSTRACT

The goal of this research is to automatically generate implementations of digital signal processing (DSP) transform algorithms that are adapted to a given off-the-shelf computing platform, i.e., the generated code takes full advantage of architectural features and available instructions sets, including the memory hierarchy, multiple processors, vector instructions, fused multiply-add instructions, and prefetching instructions. We refer to this type of code as "advanced code'' for two reasons: (1) the code uses instructions that are

significantly beyond standard C or Fortran programming; and (2) manually writing such code requires a very high level of programming expertise including assembly level programming and a deep understanding of the platform's architecture. Our goal is ambitious: automatically generate code that competes with or outperforms expertly hand-tuned code for DSP algorithms.

Our approach is based on the following: (1) We represent each fast DSP

transform algorithm as a formula in a concise mathematical language. (2) We derive for each transform a very large number of structurally different fast algorithms or formulas. (3) These different formulas (i.e., algorithms) are efficiently enumerated and manipulated. (4) We automatically translate each formula into advanced code. (5) We optimize the implementation by searching over the space of alternative formulas and code implementations.

Step (4) is the key claim---the automation step---of our approach. In other words, the applicability of advanced instructions, including vector instructions, parallel threads, fused multiply-add instructions, and prefetching instructions, can be related to mathematical constructs in our formula notation. Step (5) provides the

tuning and adaptability of the code to the platform.

We will research the mapping from formulas to advanced code and formalize it in a way suitable for automatic code generation and optimization. We will develop a compiler that translates a formula description of a fast DSP algorithm, possibly augmented with parameters that control coding degrees of freedom, into advanced code. Finally, we will build an "Advanced Code Generator'' that interfaces our compiler with a formula generator and a search module to

automatically explore the space of structurally distinct fast algorithms and the space of advanced coding degrees of freedom. For a given DSP transform, our Advanced Code Generator automatically finds the formula, i.e., the algorithm, whose structure allows the best implementation given the available advanced instructions and given the computing platform.

In contradistinction to other approaches to advanced code generation, including approaches using optimizing C or Fortran compilers or manual coding, our Advanced Code Generator automates and integrates the optimization at the mathematical level of algorithms and the optimization at the implementation level of machine-specific instructions. This way the Advanced Code Generator has access to all structural information necessary to produce high quality code and

mimics a highly skilled programmer who understands and exploits the structure of algorithms and the given architecture to write platform-tuned advanced code. We believe that our research makes, for the performance-critical class of DSP algorithms, a pioneering step towards a solution for one of the present fundamental problems in software development for high performance scientific computing: How to develop, with reasonable effort, optimal and portable performance on increasingly complex and diverse off-the-shelf computing platforms.


PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 15 of 15).

Aca Gacic, Markus Pueschel, and Jose Moura.  "Automatically Generated High-Performance Code for Discrete Wavelet Transforms,"  Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP),  v.5,  2004,  p. 69.

Adam C. Zelinski, Markus Pueschel, Smarahara Misra, and James C. Hoe.  "Automatic Cost Minimization for Multiplierless Implementations of Discrete Signal Transforms,"  Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP),  v.5,  2004,  p. 221.

Franz Franchetti and Markus Pueschel.  "SIMD Vectorization of Non-Two-Power Sized FFTs,"  Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP),  2007, 

Franz Franchetti, Stefan Kral, Juergen Lorenz, Markus Pueschel, Christoph W. Ueberhuber, and Peter Wurzinger.  "Automatically Optimized FFT Codes for the BlueGene/L Supercomputer,"  Proc. High Performance Computing for Computational Science (VecPar), revised and selected paper, Lecture Notes in Computer Science,  v.3402,  2004,  p. 23.

Franz Franchetti, Yevgen Voronenko, and Markus Pueschel.  "Formal Loop Merging for Signal Transforms,"  Proc. Programming Language Design and Implementation (PLDI),  v.40(6),  2005,  p. 315.

Franz Franchetti, Yevgen Voronenko, and Markus Pueschel.  "FFT Program Generation for Shared Memory: SMP and Multicore,"  Proc. Supercomputing (SC),  2006, 

Markus Pueschel, Adam C. Zelinski, and James C. Hoe.  "Custom-Optimized Multiplierless Implementations of DSP Algorithms,"  Proc. International Conference on Computer-Aided Design (ICCAD),  2004,  p. 175.

Markus Pueschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.  "SPIRAL: Code Generation for DSP Transforms,"  Proceedings of the IEEE special issue on "Program Generation, Optimization, and Adaptation",  v.93(2),  2005,  p. 232.

Paolo D'Alberto, Markus Pueschel, and Franz Franchetti.  "Performance/Energy Optimization of DSP Transforms on the XScale Processor,"  Proc. International Conference on High Performance Embedded Architectures & Compilers (HiPEAC),  2007, 

Paolo D?Alberto, Franz Franchetti, Peter A. Milder, Aliaksei Sandryhaila, James C. Hoe, José M. F. Moura, and Markus Pueschel.  "Generating FPGA Accelerated DFT Libraries,"  Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM),  2007, 

Roland Wunderlich, Markus Pueschel, and James C. Hoe.  "Accelerating Blocked Matrix-Matrix Multiplication using a Software-Managed Memory Hierarchy with DMA,"  Proc. High Performance Embedded Computing (HPEC),  2005, 

Sung-Chul Han, Franz Franchetti, and Markus Pueschel.  "Program Generation for the All-Pairs Shortest Path Problem,"  Proc. Parallel Architectures and Compilation Techniques,  2006, 

Thammanit Pipatsrisawat, Aca Gacic, Franz Franchetti, Markus Pueschel, and Jose Moura.  "Performance Analysis of the Filtered Backprojection Image Reconstruction Algorithms,"  Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP),  v.5,  2005,  p. 153.

Yevgen Voronenko and Markus Pueschel.  "Automatic Generation of Implementations for DSP Transforms on Fused Multiply-Add Architectures,"  Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP),  v.5,  2004,  p. 101.

Yevgen Voronenko and Markus Pueschel.  "Multiplierless Multiple Constant Multiplication,"  ACM Transactions on Algorithms,  v.3(2),  2007, 


(Showing: 1 - 15 of 15).

 

Please report errors in award information by writing to: awardsearch@nsf.gov.

 

 

Print this page
Back to Top of page
  Web Policies and Important Links | Privacy | FOIA | Help | Contact NSF | Contact Web Master | SiteMap  
National Science Foundation
The National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA
Tel: (703) 292-5111, FIRS: (800) 877-8339 | TDD: (800) 281-8749
Last Updated:
April 2, 2007
Text Only


Last Updated:April 2, 2007