Award Abstract #0702386
Program Generation for Parallel Platforms
NSF Org: CCF (Division of Computing and Communication Foundations)
Initial Amendment Date: May 29, 2007
Latest Amendment Date: March 27, 2008
Award Number: 0702386
Award Instrument: Continuing grant
Program Manager: Almadena Y. Chtchelkanova, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer & Information Science & Engineering
Start Date: June 1, 2007
Expires: May 31, 2010 (Estimated)
Awarded Amount to Date: $375,000
Investigator(s): Markus Pueschel pueschel@ece.cmu.edu (Principal Investigator); Franz Franchetti (Co-Principal Investigator)
Sponsor: Carnegie-Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (412/268-8746)
NSF Program(s): ADVANCED COMP RESEARCH PROGRAM
Field Application(s): 0000912 Computer Science
Program Reference Code(s): HPCC, 9218
Program Element Code(s): 4080
ABSTRACT
The clock speed of microprocessors has finally reached its practical limits. Future performance gains will come only through various forms of parallelism, such as integrating multiple CPU cores on one chip: the era of mainstream parallelism has begun. This poses an enormous burden on the developers of high-performance libraries. Optimal code has to be carefully tuned to each specific platform, including its memory hierarchy, special instruction sets, and the forms of parallelism it provides. This time-consuming process must be repeated for every new platform released. It is time to ask: can computers write these libraries for us?
The goal of this research is to develop a program generation system that completely automates the implementation and optimization of a large class of performance-critical library functionality. This class will include at least linear transforms, a set of dense linear algebra problems, correlation, a set of decoders, and numerical integration. The program generation system will produce code that is optimized for a computer's memory hierarchy and that is parallelized, if required, for vector architectures, shared- or distributed-memory parallelism, or even streaming parallelism on graphics processing units (GPUs), or a combination of these. The performance of the generated code should be competitive with the best available hand-written code. "Program generation" means that the system takes as input only the problem specification; in other words, the computer itself writes highly optimized and, if desired, already parallelized source code. To achieve this, knowledge about alternative algorithms and about algorithm optimization has to be formalized so that the computer can apply it. In summary, the goal is to enable computers to write very fast libraries for well-understood numerical functionality and for a wide range of parallel platforms.
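The idea of taking only a problem specification as input and emitting specialized source code can be illustrated with a toy generator. This is a hypothetical sketch for illustration, not the funded system: given the matrix defining a small linear transform, it emits fully unrolled source code in which all loops are eliminated and zero entries are skipped.

```python
def generate_transform(name, matrix):
    """Emit Python source for the linear transform y = matrix @ x.

    The generated function is fully specialized for this one matrix:
    every loop is unrolled and multiplications by zero are removed,
    mimicking (in miniature) how a program generator specializes code
    from a problem specification.
    """
    lines = [f"def {name}(x):", "    return ["]
    for row in matrix:
        terms = [f"({c}) * x[{j}]" for j, c in enumerate(row) if c != 0]
        lines.append("        " + (" + ".join(terms) or "0") + ",")
    lines.append("    ]")
    return "\n".join(lines)


# Specification: the 2-point Walsh-Hadamard transform (a butterfly).
src = generate_transform("wht2", [[1, 1], [1, -1]])
print(src)           # the generated, fully unrolled source code

namespace = {}
exec(src, namespace)  # compile the generated code
print(namespace["wht2"]([3, 5]))  # [8, -2]
```

In a real generator the emitted code would be C with vector intrinsics or threading, and the specification would be richer (transform size, target platform), but the division of labor is the same: the human supplies the specification, and the program writes the program.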
Please report errors in award information by writing to: awardsearch@nsf.gov.