NERSC: National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

NERSC Announcements Message Archive


Subject: Potential problems with Seaborg batch jobs
Author: David Turner <dpturner_at_lbl.gov>
Date: 2005-04-05 09:44:01
Greetings Seaborg User,

NERSC has identified and fixed a configuration problem on Seaborg that could affect batch jobs submitted between 14:00 March 22 and 15:45 April 4. Jobs submitted during this interval could experience either of the following:

1) Job failure due to insufficient memory for MPI operations. Two possible error messages are:

   ERROR: 0032-171 Communication subsystem error: Memory is exhausted. in MPI_Isend, task 0
   ERROR: 0032-113 Out of memory in MPI_Allreduce, task 51

   Whether a particular program experiences this type of failure depends on the nature of its MPI operations; not all MPI codes will encounter it.

2) Reading large files via stdin (standard input) will produce unpredictable results. Input files over 1024 bytes in size will not be read correctly. Depending on the program's logic, this could result in code failure or, more seriously, incorrect results.

Situation 2) requires immediate user attention.

If any batch job that you submitted during the interval in question has run to completion, and it used stdin to read a file larger than 1024 bytes, you should look very closely at your results; they may not be correct.

If you have any pending batch jobs (status I, NQ, HS, or HU) that were submitted during this interval, and that expect to use stdin to read a file larger than 1024 bytes, you should cancel those jobs and resubmit them.

We apologize for the inconvenience this problem causes for our users. NERSC staff are actively working with IBM to prevent this problem in the future.

--
Best regards,
David Turner
User Services Group     email: dpturner@lbl.gov
NERSC Division          phone: (510) 486-4027
Lawrence Berkeley Lab   fax:   (510) 486-4316
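
The checks described in situation 2) can be sketched as a small triage helper. This is only an illustrative sketch, not a NERSC tool: the function names, the "completed" status label, and the returned strings are hypothetical, and the year 2005 is assumed from the date of this message. The submission time, job status, and stdin file size would have to be gathered by hand from your own job records.

```python
from datetime import datetime

# Affected submission window from the announcement.
# Assumption: the year is 2005, taken from the message date.
WINDOW_START = datetime(2005, 3, 22, 14, 0)
WINDOW_END = datetime(2005, 4, 4, 15, 45)

# Files larger than this many bytes were not read correctly via stdin.
STDIN_LIMIT_BYTES = 1024

# Pending job statuses named in the announcement.
PENDING_STATES = {"I", "NQ", "HS", "HU"}


def submitted_in_window(submitted: datetime) -> bool:
    """True if the job was submitted during the affected interval."""
    return WINDOW_START <= submitted <= WINDOW_END


def stdin_at_risk(stdin_size_bytes: int) -> bool:
    """True if the stdin input file is over the 1024-byte limit."""
    return stdin_size_bytes > STDIN_LIMIT_BYTES


def recommended_action(submitted: datetime, status: str, stdin_size_bytes: int) -> str:
    """Map one job to the action the announcement recommends.

    `status` is one of the pending codes above, or the hypothetical
    label "completed" for a job that has run to completion.
    """
    if not (submitted_in_window(submitted) and stdin_at_risk(stdin_size_bytes)):
        return "no action"
    if status in PENDING_STATES:
        return "cancel and resubmit"
    if status == "completed":
        return "check results closely"
    # Any other status is not addressed by the announcement.
    return "not covered by the announcement"
```

For example, `recommended_action(datetime(2005, 3, 25, 10, 0), "I", 4096)` falls inside the window with an oversized stdin file and a pending status, so it returns "cancel and resubmit". Note that this sketch covers only the stdin problem; the MPI memory failures in situation 1) show up as explicit job errors instead.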

Page last modified: Fri, 05 Dec 2008 19:17:25 GMT
Page URL: http://www.nersc.gov/nusers/announcements/message_text.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
