National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
  Operations Schedule for PDSF

In case of difficulty accessing PDSF or HPSS please check one of the following:

PDSF Scheduled Shutdowns

NERSC Systems Message of the Day - includes PDSF and HPSS.
The Message of the Day, also known as the MOTD, is maintained 24 hours a day, 7 days a week by the NERSC Computer Operations & Network Support staff and provides the most up-to-date information on system status. PDSF was recently added to this system.

NERSC Systems Availability Log - includes HPSS and PDSF.
History of downtimes for all NERSC systems, updated once a day. A good place to check whether PDSF or HPSS was up or down at some time in the past.

Scheduled Events

  • December 10th 10:00am - 2:00pm - Cluster maintenance.

  • November 7th 10:00am - 5:00pm - Batch system suspended for the cluster benchmarking.

  • September 12th 2002 9:00am - 6:00pm - Upgrade to RH 7.2. There will be no logins during that time and no queue draining beforehand. Instead, the queues will be stopped at 9:00am on 09/12/02 and all running jobs will be killed to give LSF a chance to clean up the spool, which we will be migrating to a new location. This should not affect pending jobs.

  • September 5th 2002 9:00am - 5:00 pm (with a possible overflow into September 6th) - Installation of new hardware and seismic bracing. During that time there will be no logins and the batch system will be suspended, but no jobs should die.

  • August 15th 2002 6:30am - 7:30am - Upgrade of pdsfsu00 and pdsfsu05. AFS services will be down across the cluster during that time, except on pdsflx007 and pdsflx008.

  • April 30th 2002 12 pm PDT - The PDSF mail system will stop using any .forward files in your PDSF account. E-mail sent to you@nersc.gov will be delivered to the address recorded in the NERSC database (nim.nersc.gov). You can update that database yourself. Addresses specified in bsub will work as before.

  • April 29th 2002 12 pm PDT - HSI upgrade, see MOTD for details.

  • April 23rd 2002 9 am - 1 pm PDT - LSF batch system suspended. Jobs will pause and then resume when the tests are done. There will be no queue draining, and users may still submit jobs during that time. Interactive nodes will not be affected.

  • March 13th 2002 5pm-6pm PDT - network interruptions.

    On Wednesday, Mar 13, from 5-6pm, there will be several network interruptions.

    ESnet will be changing the OC12 link from the OSF to Sunnyvale from ATM to POS. We expect 2-4 network interruptions of less than 10 minutes each. During the interruptions, there will be no traffic to/from NERSC. Internal traffic on the NERSC network will not be affected.

    After ESnet completes their work, the NERSC network team will be upgrading the switch on the public subnet. While the switch is being replaced, the NERSC web servers and DNS servers will not be available. We expect 2 periods of downtime of less than 5 minutes each.

    If ESnet is unable to perform their work on Wednesday, the network downtime will be moved to Thursday, Mar 14 from 5-6pm. If this happens an additional announcement will be sent.

  • 19 February 2002 9am-6pm (or until announced) - scheduled PDSF cluster maintenance; all the CPUs will be restarted. Following users' requests there will be no queue draining. Jobs running at the time of shutdown will be terminated. We are planning to bring PDSF GIDs and UIDs in sync with the NERSC database. This is a lot of work for the staff, and we will bring the system back as soon as possible.

  • 1 February 2002 - ftp server on pdsfsu00 will be turned off.

  • 17 January 2002 5:30pm - 6:00pm PST - the LBNL<=>ESnet connection will be subject to disruption to troubleshoot the source of errors on the link. LBLnet & ESnet personnel will be working to isolate & hopefully eliminate the source of these problems. They apologize for any impact this activity may have on your operations. Service will be fully restored at the conclusion of this work.
    This will affect only our LBNL users. In particular, check your AFS tokens, as they may be lost.

  • 3 January 2002 9am-6pm (or until announced) - scheduled reboot of the PDSF cluster; all the CPUs will be restarted. pdsflx000 will be replaced by two new nodes. Following users' requests there will be no queue draining. Jobs running at the time of shutdown will be terminated.

  • 19 December 2001 10am-2pm - interactive nodes pdsflx001, pdsflx002 and pdsflx003 will be replaced by new hardware. IP addresses will change. All the nodes will be configured like pdsflx008 is now (standalone RH 6.2).

  • 12 December 2001 10am-2pm - interactive nodes pdsflx004, pdsflx005, pdsflx006, pdsflx007 and pdsflx008 will be replaced by new hardware. IP addresses will change. All the nodes will be configured like pdsflx008 is now (standalone RH 6.2).

  • 4 December 2001 9am-10am - network interruption.

  • 26 September 2001 9am-6pm (or until announced) - scheduled reboot of the PDSF cluster; all the CPUs will be restarted. Following users' requests there will be no queue draining.

  • August 6-12 2001 - Scheduled System Upgrade of the NERSC AFS cell

    The storage group is upgrading the NERSC AFS cell this week. Starting on Monday 8/6/2001 they will be moving AFS volumes. Users should experience no outage during the week.

    On Sunday 8/12/2001 there will be a 4-hour AFS outage in the NERSC cell from 12:00-16:00 PDT to move the AFS databases. During this time the NERSC AFS cell will be unavailable.

  • 22 June 2001 10am-11am PDT - NERSC network outage.

    The outage is necessary to replace a failed interface in our main router and for ESnet to upgrade their router's software.

    All network connectivity to the nersc.gov domain will be interrupted including all access to NERSC computational and storage resources as well as the PDSF cluster. Also, the AFS and DCE servers will be isolated from the rest of NERSC while the router is down. Although we are allowing one hour for the outage, we expect the interruption will be much shorter than this.

  • 14 June 2001 9am-6pm (or until announced) - scheduled reboot of the PDSF cluster; all the CPUs will be restarted.

  • 25 March 2001 9am-12pm - scheduled maintenance of pdsflx00. At that time the cluster will be shut down and then restarted. Users requested that the queues not be drained, so all jobs still running at that time will be killed.

  • 6 February 2001 9am-12pm - scheduled reboot of the PDSF cluster; all the CPUs will be restarted.

  • 22 November 2000 8:00 AM PST - scheduled emergency reboot of pdsfsu05. The reboot should take less than 15 minutes. AFS service on Linux nodes will be affected.

  • Activities related to the Oakland Move.

    • 20 October, Friday - 8 am Long queue shut down.
    • 24 October, Tuesday - 8 am Medium queue shut down.
    • 26 October, Thursday - 8 am System goes down for packing.
        No access to HPSS at this time.
        Cluster Network Switch, pdsflx00, and pdsfsu05 moved to Oakland.
      NOTE:
        Please remember, this means that there will be NO logins Thursday morning starting at 8:00 AM. At that time any and all jobs still in the LSF system will be deleted. Anyone who is logged in, or who logs in after this time, will be logged off, and any work done after this time may be lost. We will be doing backups in preparation for the Oakland move.

        Also note that after Thursday 8:00 AM the data on dv01 - dv10 will be lost. When these disk vaults become available again, they will be in their new configuration. Any results that jobs have written to these disk vaults should therefore be reviewed and, if appropriate, saved to AFS, HPSS, or the correct area on dv14 or dv15. Affected areas:

                pdsfdv01
                pdsfdv02
                pdsfdv03
                pdsfdv04
                pdsfdv05
                pdsfdv06
                pdsfdv07
                pdsfdv08
                pdsfdv09
                pdsfdv10
        
        This will affect the following mount points:
                auto/amanda
                auto/atlas
                auto/babar
                auto/cdf
                auto/d0
                auto/deepsrch
                auto/e895
                auto/e896
                auto/na49
                auto/pdsfdv01
                auto/pdsfdv02
                auto/pdsfdv03
                auto/pdsfdv04
                auto/pdsfdv05
                auto/pdsfdv06
                auto/pdsfdv07
                auto/pdsfdv08
                auto/pdsfdv09
                auto/pdsfdv10
                auto/pdsfdv13
                auto/phenix
                auto/sno
                auto/star
                auto/ucbmep
        

    • 27 October, Friday Pieces are moved to Oakland:
        cluster network switch, pdsflx00, pdsfsu05, dv14 and dv15. New compute nodes brought online.
    • 30 October, Monday Network returns to production mode
    • 31 October, Tuesday Partial cluster available:
        lx00 for home directories, su05 for afs->nfs, dv14 and dv15 for disk vaults. Interactive access will be restored. Partial batch queues will come online, using new equipment (89 dual PIII 650's). HPSS will NOT be available at that time.
    • 2 November HPSS comes back on line, although this could be as late as 11/06 (a message will be sent with its status).
    • 10 November Target date for having the rest of PDSF installed.
        PDSF will be upgraded as tasks are completed. Messages will be sent when major milestones are complete. At that point we will have 156 Linux nodes, 16 data vaults, and 2 Suns.

    General Notes:

    The IP numbers of all machines in PDSF will be changing, and the machine names will be changing too. The IPs of some key machines will be:
        pdsflx000 - 128.55.24.100
        pdsfsu00   - 128.55.24.20
        pdsflx001 - pdsflx008:    128.55.24.101 - 128.55.24.108
    
    Notice the extra digit in the names of the lx machines. If these numbers do happen to change, a message will follow with corrected information. (These numbers will be in effect after 10/27.)

    Also note: disk space on pdsflx00 and pdsfsu00 will not be affected.

    For more details check the pdsf-announce mailing list archives. If there are any questions, please direct them to Cary.

  • 5 October 2000 6:30am - 6:45am- network to/from the lab down.
  • 4 October 2000 10am - 12am - network down in the cluster area.
  • 1 August 2000 9am-12pm - scheduled reboot of the PDSF cluster; all the CPUs were restarted.
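
The address plan announced in the General Notes for the Oakland move (pdsflx000 at 128.55.24.100, pdsfsu00 at 128.55.24.20, and pdsflx001 - pdsflx008 at 128.55.24.101 - 128.55.24.108) can be written out as a small lookup table, for example when updating local host lists or scripts that still use the old two-digit names. The following is only a minimal sketch: the function name and the idea of generating the table in Python are illustrative assumptions, not part of any PDSF tooling, and the numbers are those given in the announcement, which notes that they may still change.

```python
import ipaddress


def oakland_hosts():
    """Build a {hostname: IP} table from the announced address plan.

    Hypothetical helper for illustration; the addresses come from the
    announcement and were stated to be in effect after 10/27.
    """
    # pdsflx000 sits at .100; pdsflx001..pdsflx008 follow at .101...108.
    base = ipaddress.IPv4Address("128.55.24.100")
    hosts = {"pdsflx000": str(base), "pdsfsu00": "128.55.24.20"}
    for n in range(1, 9):
        # Three-digit suffix: the lx names gain an extra digit after the move.
        hosts[f"pdsflx{n:03d}"] = str(base + n)
    return hosts


if __name__ == "__main__":
    for name, addr in sorted(oakland_hosts().items()):
        print(f"{name:<10} {addr}")
```

Generating the interactive-node entries from the base address keeps the table consistent with the announced range; if corrected numbers are sent out later, only the base address and the pdsfsu00 entry would need to change.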

Hardware Projects

  • Wednesday, December 8, 1999
    Many changes. Reconfigured all batch nodes. Added machines from the e895 project to the cluster. The old starlx machines have now become the interactive nodes. We have also added 18 new PIII/450 machines, 4 512GB raidzone disk vaults, and 12 PIII/450 processors to existing dual-processor machines that had only 1 processor. All the batch nodes now have 2 GB of swap space. Check out the hardware section for more details.
  • Thursday, August 5, 1999
    2GB of memory was added to starsu00 and 256GB of memory was added to starlx0[1-8]
    We also lost the 2nd 23GB drive, so /scratch and /scratch/common have been moved to /data06.
  • Saturday, July 10, 1999
    972GB of new drive space added to starsu00 in directories /data0[4-8]
  • Monday, October 5, 1998
    All remaining pdsf HP systems (pdsfhp1 - pdsfhp32) will be decommissioned.
  • Wednesday, September 30, 1998
    All remaining pdsf Sun systems (pdsfsu1 - pdsfsu24) will be decommissioned.
  • Friday, September 18, 1998
    The old data vaults will be decommissioned.
  • Friday, September 11, 1998
    The new data vault bank will be available for general use.
  • Friday, September 11, 1998
    16 Linux machines (pdsflx01 - pdsflx16) will be available for general use.
  • Monday, June 22, 1998
    Decommission 8 PDSF Sun systems (pdsfsu25 - pdsfsu32).


Software Projects

  • Friday, July 9, 1999
    The cluster was moved to 2.2.10-ac10 kernel.
  • Wednesday, June 30, 1999
    New cluster monitoring service became operational.
  • Wednesday, April 28, 1999
    LSF 3.2 became available for general use.
  • Monday, August 17, 1998
    LSF 3.1 will be available for use on the starsu & starlx machines.
  • Thursday, June 18, 1998
    The following will be completed on the PDSF cluster:
    • Move the PDSF /home/common/gc5 and /home/common/star to the PDSF cluster
    • Increase the scratch space on the Linux machines.
    • Unmount the /home/user area from the PDSF cluster.
  • Implement a more flexible batch queueing system which will perform load balancing activities.


Page last modified: Tue, 25 May 2004 18:12:50 GMT
Page URL: http://www.nersc.gov/nusers/systems/PDSF/announce/schedule.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
