DEPARTMENT OF ENERGY
To DOE National Laboratories LAB 07-23
Operating and Runtime Systems
SUMMARY: The Office of Advanced Scientific Computing Research (ASCR) of the Office of Science
(SC), U.S. Department of Energy (DOE), hereby announces its interest in receiving proposals
for research in Operating and Runtime Systems for Extreme Scale Scientific Computation
(FASTOS). This announcement is focused on research and development of operating and
runtime systems which enable the effective management and use of extreme-scale systems
(petascale and beyond) for scientific computation. The overall goal of this announcement is to
stimulate research and development related to operating and runtime systems for petascale
systems in the 2010 to 2015 timeframe. It is likely that these systems will include a
combination of commodity and custom components, with different systems reflecting
different degrees of customization. Operating and runtime systems research must be driven
from the needs of current and future applications, and the primary focus is on supporting the
needs of existing and anticipated SC and other DOE applications. An ultimate goal would be
the development of a unified operating and runtime system that could fully support and
exploit petascale and beyond systems and autonomously adapt to meet specific application
needs for performance, functionality, security, and fault tolerance. The activities supported by
this notice may be a combination of basic research, development, prototyping, and testing.
Partnerships among universities, National Laboratories, and industry are encouraged.
PREPROPOSAL DUE DATE: April 6, 2007, 4:30 pm, Eastern Time
Potential researchers are required to submit a two-page preproposal by email to
fjohnson@ascr.doe.gov. Preproposals must be received by April 6, 2007, 4:30 p.m.,
Eastern Time. The subject line of the email should be: "FASTOS Preproposal". The
preproposal should be a Word file attached to the email, having 1 inch margins when printed.
No FAX or mail submission of preproposals will be accepted.
Preproposals will be reviewed for conformance with the guidelines and technical areas
specified in this announcement. A response to preproposals encouraging or discouraging
formal proposals will be communicated to the proposers by April 13, 2007. Proposers who
have not received a response regarding the status of their preproposal by this date are
responsible for contacting the program to confirm their status.
Preproposals should consist of no more than two pages total. This narrative should give the
project title and describe the research objectives, the technical approach(es), and all proposed
team members and their expertise. It should also include a rough estimate of the planned
budget request. The intent in requesting a preproposal is to save the time and effort of
researchers in preparing and submitting a formal project proposal that may be inappropriate for
the program. Preproposals also assist ASCR in planning the peer review process and the
selection of potential reviewers for the proposal. Formal proposals will be accepted only
from proposers encouraged to submit a formal proposal.
PROPOSAL DUE DATE: June 11, 2007, 8:00 pm, Eastern Time
Full proposals submitted in response to this Announcement must be submitted to the DOE
Electronic Proposal Management Application (ePMA) system (https://epma.doe.gov) no later
than 8:00 p.m., Eastern Time, June 11, 2007, to be accepted for merit review and to permit
timely consideration for award in Fiscal Year 2008.
It is important that the entire peer reviewable proposal be submitted to the ePMA system
as a single PDF file attachment.
In order to expedite the review process, it is essential to also submit via email a single PDF
file of the entire LAB proposal and FWP addressed to Dr. Frederick Johnson at:
fjohnson@ascr.doe.gov. Please use "FASTOS Proposal" as the subject of the email.
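To identify that the FWP is responding to this program announcement, please fill in the
following fields in the "ePMA Create Proposal Admin Information" screen as shown:
Fiscal Year:
Proposal Reason:
Program Announcement Number: LAB 07-23*
Program Announcement Title: Operating and Runtime Systems for Extreme Scale Scientific Computation*
Proposal Purpose:
Estimated Proposal Begin Date:
HQ Program Manager Organization:
* Please use the wording shown when filling in these fields to identify that the FWP
is responding to this Program Announcement.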
DOE National Laboratories should submit using ePMA as instructed above. Researchers from
other Federal agencies and Non-DOE Federally Funded Research and Development Centers
(FFRDCs) should follow the format at
http://www.science.doe.gov/grants/fed_prop.html and submit via email as stated above.
FOR FURTHER INFORMATION CONTACT:
Operating and runtime systems provide mechanisms to manage system hardware and software
resources for the efficient execution of large scale scientific applications. They are essential to
the success of both large scale systems and complex applications. By the end of this decade
petascale computers with thousands of times more computational power than any in current
use will be vital tools for expanding the frontiers of science and for addressing vital National
priorities. These systems will have tens to hundreds of thousands of processors, an
unprecedented level of complexity, and will require significant new levels of scalability and
fault management. The overwhelming size and complexity of such systems poses deep
technical challenges that must be overcome to fully exploit their potential for scientific
discovery. Applications require multiple services from OS/R layers, including: resource
management and scheduling, fault-management (detection, prediction, recovery, and
reconfiguration), configuration management, and file systems access and management.
Current and future large scale parallel systems require that such services be implemented in a
fast and scalable manner so that the OS/R does not become a performance bottleneck. The
current trend in large scale scientific systems is to leverage operating systems developed for
other areas of computing - operating systems that were not specifically designed for large
scale, parallel computing platforms. Unix, Linux and other Unix derivatives are the most
popular OS's in use for high end scientific computing, and these all reflect a technological
heritage nearly 30 years old with few fundamental mechanisms to support parallel systems.
Example Research Topics
Operating and runtime systems provide the glue that binds running applications to hardware.
The research supported by this announcement needs to bridge the gap between new
languages and/or programming models and next-generation hardware, including interactions
with novel architectures. Consequently, a wide variety of research topics is
appropriate for this effort. A brief listing of candidate topics is provided below, but research
in other relevant areas and combinations of areas is encouraged:
Virtualization. Virtualization is expected to play an increasingly important role in the
deployment of large scale systems, enabling multiple operating systems on a single platform
and application specific operating systems. Virtualization includes the development and use
of hypervisors, virtual machine monitors, and application/runtime virtualization for HPC
systems. Specific topics of interest include: identification and quantification of problems with
current hypervisors in HPC systems, novel uses of hypervisors in HPC systems (development,
porting, etc.), support for fault handling, better support for custom hardware, and lightweight
mechanisms for virtual resources.
Fault Handling. As the number of components in a system increases from tens to hundreds of
thousands, these systems will have significantly reduced mean time between interrupts (MTI).
Mechanisms to support application resiliency in the face of hardware faults are needed to
support long running applications. Specific topics of interest include: tradeoffs associated with
handling failures at different layers (application, runtime, OS); understanding and identifying
sources of faults; approaches to proactive fault handling; fault tolerance for alternate (non-
MPI) programming models; languages/APIs for the bi-directional communication of fault
information between layers (e.g., between the application and runtime layers); quantification
of scalability issues; automatic, transparent, and efficient checkpoint/restart; and
checkpointing when disks are far away.
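As a rough illustration of why MTI shrinks as systems grow, the sketch below assumes that
components fail independently with identical, exponentially distributed lifetimes, so that the
system-level MTI is approximately the per-component mean time between failures divided by
the component count. The function name and the 1,000,000-hour component figure are
hypothetical values chosen for illustration and do not come from this announcement.

```python
def system_mti_hours(component_mtbf_hours: float, n_components: int) -> float:
    """Estimate system mean time between interrupts (MTI), in hours.

    Assumption (not from the announcement): components fail independently
    with identical exponential lifetimes, so failure rates simply add up.
    """
    if component_mtbf_hours <= 0:
        raise ValueError("component_mtbf_hours must be positive")
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mtbf_hours / n_components


if __name__ == "__main__":
    # Hypothetical per-component MTBF, for illustration only.
    component_mtbf = 1_000_000.0
    for n in (10_000, 100_000):
        mti = system_mti_hours(component_mtbf, n)
        print(f"{n} components: estimated system MTI = {mti} hours")
```

Under these assumptions a tenfold increase in component count yields a tenfold reduction in
MTI. Real systems with correlated or non-exponential failures will deviate from this estimate.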
OS Noise/Interference. Operating system interference, or noise, caused by the asynchronous
overhead needed to implement system services has been shown to have a significant impact
on application performance on very large scale systems. Measuring and understanding the
impact of OS interference on application performance at scale will be critical to the successful
deployment of very large scale systems. Specific topics of interest include: OS design
strategies for dealing with OS noise (e.g., implementations of critical services that minimize
related noise and alternatives for timeouts and/or periodic service requirements); hardware
features to control the impact of noise (e.g., hardware support for low overhead barriers); and
strategies to mitigate the impact of OS noise (e.g., exploiting asynchrony).
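One simple way to see why noise matters more at scale is sketched below. It assumes a bulk
synchronous application in which every node must finish a compute phase before a barrier
completes, and that each node is independently interrupted during that phase with some small
probability. The model, the function name, and the sample probability are illustrative
assumptions, not measurements or methods from this announcement.

```python
def prob_any_node_interrupted(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes is interrupted in a phase.

    Assumption (illustrative): each node is interrupted independently with
    probability p_node during one compute phase, and a single interrupted
    node delays the barrier for every node.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be between 0 and 1")
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Hypothetical per-node interruption probability per phase.
    p = 1e-4
    for n in (100, 10_000, 100_000):
        print(f"{n} nodes: P(phase delayed) = {prob_any_node_interrupted(p, n):.4f}")
```

In this simplified model, an interruption that is rare on any single node becomes close to
certain somewhere in the machine once the node count is large, which is why the barrier time
tends to be set by the slowest node.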
Exposing Resources. Bidirectional APIs to expose system information (e.g., performance
counters) and to select implementations are critical for application level adaptability:
applications need information about what is being used and may need to select alternate
implementations. Specific topics of interest include: hooks for controlling resources;
interfaces to allow code to query hardware characteristics; and exposing communication
related resources.
Resource Management. Managing the local and global resources provided by a computing
system is a fundamental responsibility of any operating system, and exploration of policies
and mechanisms for resource management is especially critical for petascale systems. Specific
topics of interest include: local resource management (memory management, processor
scheduling (multi-core), and communication support); interfaces between local and external
components (gang scheduling, virtual memory reservations and queries); support for alternate
(non-MPI) programming models (e.g., UPC); OS service coordination (load balancing at scale,
global memory management, topology aware mapping of work- and data-units);
heterogeneous resource management (HW and SW); and power management.
Adaptability. The ability of runtime and operating systems to change their behaviors based
on application needs, to improve performance or tolerate faults, is needed to support the use of
petascale systems. Specific topics of interest include: measurement and strategies to support
adaptation; understanding and exploiting application phases; adapting collective
communication components; and APIs to expose resource performance models and
information.
Performance Measurement. Petascale systems will require models and tools to measure
system performance. Topics of interest include: hooks for application level performance
monitoring; tools to measure runtime/OS performance; performance models (which define what
needs to be measured); and scalability.
System Management/Administration. Several issues related to overall system
administration need to be addressed, including: usage models (space/time sharing); flexible
space-sharing; changing processors allocated to running jobs; single system image issues to
ease system management (the number of system administrators should not scale with the size
of the system); node allocation; power management; software distribution; and RAS
(reliability, availability, and serviceability) and RAS interfaces.
Parallel I/O. Efficient communication with external storage servers and parallel file systems
is an essential component of a petascale system. Topics of interest include:
support for high performance access to external servers; efficient, scalable I/O call forwarding;
portable I/O models which support diverse storage instantiations; and parallel file systems.
Community building
An important goal of this notice is to foster the development of an active research community
in operating systems and runtime environments for high end systems. In order to meet this
goal the following are mandatory requirements for awardees:
Testbed access
Proposals should provide a plan for utilizing leadership class systems at Oak Ridge National
Laboratory and Argonne National Laboratory and systems at the National Energy Research
Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory for the
purpose of software testing at scale. Each proposal should contain a section which discusses
the characteristics of the test environments necessary for the research and identifies the time
frames in which specific testbed support will be required. Only a relatively limited amount of
testing time will be available on these systems, and the individual testing plans will be used to
develop an overall test plan for the FASTOS program.
Program Funding
It is anticipated that up to $4 million annually will be available for multiple awards for this
program. Awards are planned to be made in Fiscal Year 2008, and proposals may request
project support for up to three years. All awards are contingent on the availability of funds
and programmatic needs. Annual budgets for successful projects are expected to range from
$500,000 to $1,000,000 per project although smaller projects of exceptional merit may be
considered. Annual budgets may increase in the out-years but should remain within the
overall annual maximum guidance.
References
FASTOS forum: http://www.cs.unm.edu/~fastos
Federal Plan for High-End Computing:
http://www.nitrd.gov/pubs/2004_hecrtf/20040702_hecrtf.pdf
GUIDE FOR PREPARATION OF SCIENTIFIC/TECHNICAL PROPOSALS TO BE SUBMITTED BY NATIONAL LABORATORIES
Proposals from National Laboratories submitted to the Office of Science (SC) as a result of this program announcement will follow the Department of Energy Field Work Proposal process with additional information requested to allow for scientific/technical merit review. The following guidelines for content and format are intended to facilitate an understanding of the requirements necessary for SC to conduct a merit review of a proposal. Please follow the guidelines carefully, as deviations could be cause for declination of a proposal without merit review.
1. Evaluation Criteria
Proposals will be subjected to formal merit review (peer review) and will be evaluated against the following criteria which are listed in descending order of importance:
2. Appropriateness of the proposed method or approach;
3. Competency of proposer's personnel and adequacy of proposed resources; and
4. Reasonableness and appropriateness of the proposed budget.
The evaluation will include program policy factors such as the relevance of the proposed research to the terms of the announcement, the Department's programmatic needs, and quality of previous performance.
External peer reviewers are selected with regard to both their scientific expertise and the absence of conflict-of-interest issues. Non-federal reviewers may be used, and submission of a proposal constitutes agreement that this is acceptable to the investigator(s) and the submitting institution.
Proposals found to be scientifically meritorious and programmatically relevant will be selected in consultation with DOE selecting officials depending upon availability of funds in the DOE budget. The selected projects will be required to acknowledge support by DOE in all public communications of the research results.
2. Summary of Proposal Contents
2.1 Number of Copies to Submit
A complete formal FWP in a single Portable Document Format (PDF) file must be submitted through the DOE ePMA system (https://epma.doe.gov) as an attachment. To identify that the FWP is responding to this program announcement, please fill in the following fields in the "ePMA Create Proposal Admin Information" screen as shown:
Fiscal Year:
Proposal Reason:
Program Announcement Number: LAB 07-23*
Program Announcement Title: Operating and Runtime Systems for Extreme Scale Scientific Computation*
Proposal Purpose:
Estimated Proposal Begin Date:
HQ Program Manager Organization:
* Please use the wording shown when filling in these fields to identify that the FWP is responding to this Program Announcement.
In order to expedite the review process, please submit via email a single PDF file of the entire LAB proposal and FWP. The email should be addressed to Dr. Frederick Johnson at: fjohnson@ascr.doe.gov. Please use "FASTOS Proposal" as the subject of the email.
3. Detailed Contents of the Proposal
Adherence to type size and line spacing requirements is necessary for several reasons. No researcher should have the advantage, by using small type, of providing more text in their proposal. Small type may also make it difficult for reviewers to read the proposal. Proposals must have 1-inch margins at the top, bottom, and on each side. Type sizes must be at least 11 point. Line spacing is at the discretion of the researcher but there must be no more than 6 lines per vertical inch of text. Pages should be standard 8 1/2" x 11" (or metric A4, i.e., 210 mm x 297 mm).
3.1 Field Work Proposal Format (Reference DOE O 412.1A) (DOE ONLY)
The Field Work Proposal (FWP) is to be prepared and submitted consistent with policies of the investigator's laboratory and the local DOE Operations Office. Additional information is also requested to allow for scientific/technical merit review. Laboratories may submit proposals directly to the SC Program office listed above. A copy should also be provided to the appropriate DOE operations office.
3.2 Proposal Cover Page
The following proposal cover page information may be placed on plain paper. No form is required.
SC Program announcement title
Name of laboratory
Name of principal investigator (PI)
Position title of PI
Mailing address of PI
Telephone of PI
Fax number of PI
Electronic mail address of PI
Name of official signing for laboratory*
Title of official
Fax number of official
Telephone of official
Electronic mail address of official
Requested funding for each year; total request
Use of human subjects in proposed project:
Signature of official, date of signature*
*The signature certifies that personnel and facilities are available as stated in the proposal, if the project is funded.
Provide the initial page number for each of the sections of the proposal. Number pages consecutively at the bottom of each page throughout the proposal. Start each major section at the top of a new page. Do not use unnumbered pages and do not use suffixes, such as 5a, 5b.
3.4 Budget and Budget Explanation
A detailed budget is required for the entire project period and for each fiscal year. It is preferred that DOE's budget page, Form 4620.1, be used for providing budget information*. Modifications of categories are permissible to comply with institutional practices, for example with regard to overhead costs.
A written justification of each budget item is to follow the budget pages. For personnel this should take the form of a one-sentence statement of the role of the person in the project. Provide a detailed justification of the need for each item of permanent equipment. Explain each of the other direct costs in sufficient detail for reviewers to be able to judge the appropriateness of the amount requested.
Further instructions regarding the budget are given in section 4 of this guide.
* Form 4620.1 is available at web site: http://www.science.doe.gov/grants/budgetform.pdf
3.5 Abstract
Provide an abstract of less than 400 words. Give the project objectives (in broad scientific terms), the approach to be used, and what the research is intended to accomplish. State the hypotheses to be tested (if any). At the top of the abstract give the project title, names of all the investigators and their institutions, and contact information for the principal investigator, including e-mail address.
3.6 Narrative (main technical portion of the proposal, including background/introduction, proposed research and methods, timetable of activities, and responsibilities of key project personnel)
The narrative comprises the research plan for the project and is limited to 20 pages (maximum). It should contain enough background material in the Introduction, including review of the relevant literature, to demonstrate sufficient knowledge of the state of the science. The major part of the narrative should be devoted to a description and justification of the proposed project, including details of the methods to be used. It should also include a timeline for the major activities of the proposed project, and should indicate which project personnel will be responsible for which activities.
If any portion of the project is to be done in collaboration with another institution (or institutions), provide information on the institution(s) and what part of the project it will carry out. Further information on any such arrangements is to be given in the sections "Budget and Budget Explanation", "Biographical Sketches", and "Description of Facilities and Resources".
3.7 Literature Cited
Give full bibliographic entries for each publication cited in the narrative.
3.8 Biographical Sketches
This information is required for senior personnel at the institution submitting the proposal and at all subcontracting institutions (if any). The biographical sketch is limited to a maximum of two pages for each investigator.
To assist in the identification of potential conflicts of interest or bias in the selection of reviewers, the following information must be provided in each biographical sketch.
Graduate and Postdoctoral Advisors and Advisees: A list of the names of the individual's own graduate advisor(s) and principal postdoctoral sponsor(s), and their current organizational affiliations. A list of the names of the individual's graduate students and postdoctoral associates during the past five years, and their current organizational affiliations.
3.9 Description of Facilities and Resources
Facilities to be used for the conduct of the proposed research should be briefly described. Indicate the pertinent capabilities of the institution, including support facilities (such as machine shops), that will be used during the project. List the most important equipment items already available for the project and their pertinent capabilities. Include this information for each subcontracting institution (if any).
3.10 Other Support of Investigators
Other support is defined as all financial resources, whether Federal, non-Federal, commercial, or institutional, available in direct support of an individual's research endeavors. Information on active and pending other support is required for all senior personnel, including investigators at collaborating institutions to be funded by a subcontract. For each item of other support, give the organization or agency, inclusive dates of the project or proposed project, annual funding, and level of effort (months per year or percentage of the year) devoted to the project.
3.11 Appendix
Information not easily accessible to a reviewer may be included in an appendix, but do not use the appendix to circumvent the page limitations of the proposal. Reviewers are not required to consider information in an appendix, and reviewers may not have time to read extensive appendix materials with the same care they would use with the proposal proper.
The appendix may contain the following items: up to five publications, manuscripts accepted for publication, abstracts, patents, or other printed materials directly relevant to this project, but not generally available to the scientific community; and letters from investigators at other institutions stating their agreement to participate in the project (do not include letters of endorsement of the project).
4. Detailed Instructions for the Budget
4.1 Salaries and Wages
List the names of the principal investigator and other key personnel and the estimated number of person-months for which DOE funding is requested. Proposers should list the number of postdoctoral associates and other professional positions included in the proposal and indicate the number of full-time-equivalent (FTE) person-months and rate of pay (hourly, monthly or annually). For graduate and undergraduate students and all other personnel categories such as secretarial, clerical, technical, etc., show the total number of people needed in each job title and total salaries needed. Salaries requested must be consistent with the institution's regular practices. The budget explanation should define concisely the role of each position in the overall project.
4.2 Equipment
DOE defines equipment as "an item of tangible personal property that has a useful life of more than two years and an acquisition cost of $25,000 or more." Special purpose equipment means equipment which is used only for research, scientific or other technical activities. Items of needed equipment should be individually listed by description and estimated cost, including tax, and adequately justified. Allowable items ordinarily will be limited to scientific equipment that is not already available for the conduct of the work. General purpose office equipment normally will not be considered eligible for support.
4.3 Domestic Travel
The type and extent of travel and its relation to the research should be specified. Funds may be requested for attendance at meetings and conferences, other travel associated with the work and subsistence. In order to qualify for support, attendance at meetings or conferences must enhance the investigator's capability to perform the research, plan extensions of it, or disseminate its results. Consultant's travel costs also may be requested.
4.4 Foreign Travel
Foreign travel is any travel outside Canada and the United States and its territories and possessions. Foreign travel may be approved only if it is directly related to project objectives.
4.5 Other Direct Costs
The budget should itemize other anticipated direct costs not included under the headings above, including materials and supplies, publication costs, computer services, and consultant services (which are discussed below). Other examples are: aircraft rental, space rental at research establishments away from the institution, minor building alterations, service charges, and fabrication of equipment or systems not available off-the-shelf. Reference books and periodicals may be charged to the project only if they are specifically related to the research.
a. Materials and Supplies
The budget should indicate in general terms the type of required expendable materials and supplies with their estimated costs. The breakdown should be more detailed when the cost is substantial.
b. Publication Costs/Page Charges
The budget may request funds for the costs of preparing and publishing the results of research, including costs of reports, reprints, page charges, or other journal costs (except costs for prior or early publication), and necessary illustrations.
c. Consultant Services
Anticipated consultant services should be justified and information furnished on each individual's expertise, primary organizational affiliation, daily compensation rate and number of days of expected service. Consultant's travel costs should be listed separately under travel in the budget.
d. Computer Services
The cost of computer services, including computer-based retrieval of scientific and technical information, may be requested. A justification based on the established computer service rates should be included.
e. Subcontracts
Subcontracts should be listed so that they can be properly evaluated. There should be an anticipated cost and an explanation of that cost for each subcontract. The total amount of each subcontract should also appear as a budget item.
4.6 Indirect Costs
Explain the basis for each overhead and indirect cost. Include the current rates.