--------- 31JAN2013 {The potential outage for 29JAN was canceled and rescheduled for 31JAN. (*j*)} The CCS switch originally scheduled for today (which was postponed due to Critical Weather) has been rescheduled for tomorrow, Wednesday, Jan. 31. NCO will switch production from Stratus to Cirrus. At 1130Z, developer jobs will be drained. At 1230Z, any remaining developer jobs will be stopped and the switch will take place. Developers will be allowed onto Stratus as soon as the switch is completed.
--------- 28JAN2013 {There may be no data flow to the nomad development servers during the indicated period, 29JAN2013; however, the servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 high availability server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)} On Tuesday, Jan. 29, NCO will switch production from Stratus to Cirrus. At 1130Z, developer jobs will be drained. At 1230Z, any remaining developer jobs will be stopped and the switch will take place. Developers will be allowed onto Stratus as soon as the switch is completed.
--------- 04DEC2012 {There may be no data flow to the nomad development servers during the indicated period, Dec. 6; however, the servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 high availability server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)} On Thursday, Dec. 6, at around 1130Z, the SPA team will start a single cycle parallel production test on Stratus to confirm the functionality of the changes made with tomorrow's IBM quarterly upgrade. This test will last about 10 hours (extended time is needed to allow the SREF to run to completion and allow transfers to catch up). Developers will not have access to the system during this time. Once the parallel production test has been completed on Stratus, the system will be returned to users.
--------- 03DEC2012 {There may be no data flow to the nomad development servers during the indicated period; however, the servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 high availability server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)}
Subject: Stratus upgrade tomorrow, Dec. 4
Date: Mon, 3 Dec 2012 09:33:49 -0500
From: Christine Caruso.Magee - NOAA Federal
At 1130Z tomorrow {04DEC2012}, developer jobs will be drained from Stratus in preparation for the Stratus upgrade. At 1200Z, any remaining developer jobs will be killed and the upgrade will begin. Stratus will be unavailable for 24 hrs. Upon completion of the upgrade, the baselines will be run. Developers will be allowed back onto Stratus following successful completion of the baselines.
--------- 13NOV2012 {Once again there may be no data flow to the nomad development servers during the indicated period. The servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 high availability server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)} From NOAA SDM: On Wednesday, November 14, starting at 1130Z, we will switch production from Stratus to Cirrus to continue the quarterly upgrade process. At 1130Z, Cirrus will be drained; at 1230Z, developers will be forced off and the switch will begin. Stratus will be returned to developers after the switch is complete.
--------- 17OCT2012 {The following means there may be no data flow to the nomad development servers during the indicated period. The servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)} On Thursday, October 18, at around 11Z, the SPA team will start a multi-cycle parallel production test on Cirrus to confirm the functionality of the changes made with the IBM quarterly upgrade.
This test will last about 24 hours. Developers will not have access to the system during this time. Once the parallel production test has been completed on Cirrus, the system will be returned to users. User notification will be sent out via SP-Announce. At this time a production switch to Cirrus is planned for Tuesday, October 23, pending the outcome of the parallel test. More information will be announced prior to the switch.
--------- 10OCT2012 On Wednesday, October 10, IBM will resume the quarterly upgrade of Cirrus. At 07:00 am local (1100Z), jobs will be drained on Cirrus. At 08:00 am local (1200Z), any remaining jobs will be killed and users logged off Cirrus. Approximately 2 hours after taking Cirrus down, IBM will unmount Tier 3 on Stratus. The upgrade may take up to 4-5 hours to complete. The system will be returned to developers after all benchmark tests are complete. On Thursday, October 11, at around 12Z, the SPA team will start a single cycle parallel production test on Cirrus to confirm functionality. This will take about 10 hours. Developers will not have access to the system during this time. Once the parallel production tests have been completed on Cirrus, the system will be returned to users. User notification will be sent out via SP-Announce. At this time a production switch to Cirrus is planned for Thursday, October 18. More information will be announced prior to the switch. {This means there may be no data flow to the nomad development servers during this period, although the servers nomad[1,3,5] will remain up and serve past data. The WOC NOMADS 24/7 server, http://nomads.ncep.noaa.gov, will not be affected and real time data should not be delayed. (*j*)}
--------- 19SEP2012 This just in ...
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Cirrus Parallel Production test today - Postponed 9/19 then Rescheduled for 9/20
Date: Wed, 19 Sep 2012 10:52:23 -0400
From: SDM
To: ncep.list.sp-announce@noaa.gov
The CIRRUS Parallel Production test will take place tomorrow at 12Z. IBM will prepare CIRRUS for the test beginning at 11Z. When the single cycle test starts tomorrow, developers will not be permitted on the system. When the test is completed around 20Z, CIRRUS will be returned to development. {This means there may be no data flow to the nomad development servers during this time, although the servers nomad[1,3,5] will remain up and serve past data. The WOC 24/7 server, http://nomads.ncep.noaa.gov, and its real time data will not be affected. (*j*)}
-------- Original Message --------
Subject: CCS: Cirrus Parallel Production test today
Date: Wed, 19 Sep 2012 07:35:36 -0400
From: Mary Hart
I expect this test to last until ~20Z today. mlh
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Cirrus Parallel Production test today
Date: Wed, 19 Sep 2012 06:52:59 -0400
From: Catherine Schaefer (sysadmin)
Today starting at 07:00 AM local, Cirrus will be undergoing system testing and a parallel production test. Any development jobs that were remaining on the system have been terminated.
--------- 14SEP2012 On Monday, September 17, IBM will begin the quarterly upgrade of Cirrus. At 07:00 am local (1100Z), jobs will be drained ... {This means there will be no data flow to the nomad development servers, although they will remain up and serve past data. The WOC 24/7 server http://nomads.ncep.noaa.gov will not be affected. (*j*)}
--------- 11SEP2012 {This means there will be no data flow to the nomad development servers, although they will remain up and serve past data. The WOC 24/7 server http://nomads.ncep.noaa.gov will not be affected. (*j*)} [NCEP.List.SP-Announce] wrote ...
with Subject: CCS Production Switch
Today, starting at 1130Z, we will switch production from Cirrus to Stratus. At 1130Z, Stratus will be drained; at 1230Z, developers will be forced off and the switch will begin. Cirrus will be returned to developers after the switch is complete. This switch is in preparation for the Quarterly CCS Upgrade that will start the week of September 17.
-- Senior Duty Meteorologist NOAA/NWS/NCEP/NCO/PMB
--------- 02AUG2012 Beginning on August 9, 2012, NCEP is moving to its new building, and all development nomad servers will be dismantled and moved to the new quarters. They will be down for a week but could return as early as August 11, 2012.
--------- 05JUL2012 Production and development are switching on the NCEP IBM-SP, so there will be gaps in the nomad[1,3,5] data flow, but http://nomads.ncep.noaa.gov will have all the data on time.
--------- 28JUN2012 {Sorry for the late notification} Systems or the Senior Production Analyst (SPA) office will take Cirrus to run a parallel production test for approximately 30 hours, until about 18Z on June 29. During the test, all development will be suspended. {This means there will be no data flow to the nomad development servers, although they will remain up and serve past data. The WOC 24/7 server http://nomads.ncep.noaa.gov will not be affected. (*j*)}
--------- 14MAY2012 On Sunday, 13MAY2012, an unscheduled switch of the production and development supercomputers occurred. This means data sets may be missing or late on development nomad[1,3,5]. WOC nomads will not be affected.
--------- 27MAR2012 {The patches for nomad3 and other work may extend a few more days.} The nomad1, nomad3 and nomad5 systems have patches that need to be applied by the system admin robots between 1:00 PM and 3:00 PM today. The servers will not be available during this period. Also, on Thursday,
March 29, starting at 1130Z, the supercomputer system admin staff will switch production from Stratus to Cirrus, swapping production and development as the final step in the upgrade process. At 1130Z, Cirrus will be drained; at 1230Z, developers will be forced off, nomad[1,3,5] data flow will be interrupted, and the switch will begin. Stratus will be returned to developers after the switch is complete. Once the data mirrors are ready, data flow will resume. During this time the high availability server nomads.ncep.noaa.gov will be up and have data on time.
--------- 21MAR2012 {The delay is over; dev and prod are switching 20120322 as per the following announcement. Note also possible data flow delays for nomad[1,3,5] at the end of March. WOC high availability nomads will not be affected.} On Thursday, March 22, IBM will begin the quarterly upgrade of Cirrus. At 07:00 am local (1100Z), jobs will be drained on Cirrus. At 08:00 am local (1200Z), any remaining jobs will be killed and users logged off Cirrus. Approximately 2 hours after taking Cirrus down, IBM will unmount Tier 3 on Stratus. The upgrade may take up to 24 hours to complete. Once all work and benchmarks have been completed on Cirrus, the system will be returned to users. User notification will be sent out via sp-announce. At this time a single cycle test on Cirrus is scheduled for March 27, and a production switch to Cirrus is planned for Thursday, March 29.
--------- 12MAR2012 {delayed for a time} {Development and production are switching again. On 201203?? there could be some data flow delays for all the development nomad[1,3,5] servers, but the high availability nomads will be on time. Below is the message from sp-announce}
--------- 08MAR2012 {Once again development and production are switching. On 20120308 there could be some data flow delays for all the development nomad[1,3,5] servers, but the high availability nomads will be on time.
Below is the message from sp-announce} On Thursday, March 8, NCO will switch production from Cirrus to Stratus in preparation for an upgrade to Cirrus. At 06:30 am local (1130Z), jobs will be drained on Stratus. At 07:30 am local (1230Z), any remaining jobs will be killed and users logged off Stratus. When the switch is complete, developers will be allowed back on the CCS. User notification will be sent out via sp-announce.
--------- 28FEB2012 {The Production-Dev switch on 20120301 1130Z may cause data flow problems for the nomad development servers. Data at the WOC http://nomads.ncep.noaa.gov will be present and on time.} On Thursday, March 1, NCO will switch production from Stratus to Cirrus in preparation for an upgrade to Stratus. At 06:30 am local (1130Z), jobs will be drained on Cirrus. At 07:30 am local (1230Z), any remaining jobs will be killed and users logged off Cirrus. When the switch is complete (around 10:00 am), IBM will take Stratus to perform an upgrade to correct some of the recent problems on that system. Approximately 2 hours after taking Stratus down, IBM will unmount Tier 3 on Cirrus. Once all work and benchmarks have been completed (about 8 hours) on Stratus and Tier 3, the system will be returned to users. User notification will be sent out via sp-announce.
--------- 08FEB2012 On Thursday, February 9, NCO will switch production from Stratus to Cirrus due to performance issues on Stratus. At 06:30 am local (1130Z), jobs will be drained on Cirrus. At 07:30 am local (1230Z), any remaining jobs will be killed and users logged off Cirrus. When the switch is complete, Stratus will be returned for developer access. WOC high availability NOMADS, nomads.ncep.noaa.gov, will have the data on time. The nomad1 dev server will be down for OS upgrades on 08FEB2012 and should be back before COB. Try nomad5 as an alternate, or nomads.ncep.noaa.gov, the high availability server, which is up 24/7.
--------- 06FEB2012 {Per the word from operations below, a quick change has been made to the production and development supercomputers. This means there may be some loss in data flow for nomad[1,3,5], but nomads should have all the data on time.} {Last night at 2AM sp-announce wrote...} Due to hardware problems on Cirrus, Stratus will be configured for production immediately. Also... I thought the nomad3 dev machine change would be transparent and did not mention it in these status statements (http://nomad5.ncep.noaa.gov/DOC/announce.txt), since nomads, nomad1 and nomad5 remained available, and it almost worked OK. But I failed to remember that a new machine means a new mac address, and secure shell on the IBM-SP still had the old known-host entry, so instead of accepting the new known-host info it shut down all data transfers to nomad3. Sorry, but security rules. Now the old known-host entry is deleted and the new one is set, so data transfers restarted last Friday, and I back-filled the 1x1 GFS (which is the only data apart from the rotating sets) for the past few days, from about 1/30.
--------- 04JAN2012 {The switch postponed from December will take place 20120105 at 1230Z, so data flow to the dev servers may be turned off for a while. Data at the WOC nomads.ncep.noaa.gov will be present and on time.} On Thursday, Jan. 5, NCO will switch production from Stratus to Cirrus. At 06:30 am local (1130Z), jobs will be drained on Cirrus. At 07:30 am local (1230Z), any remaining jobs will be killed and users logged off Cirrus. When the switch is complete, Stratus will be returned for developer access.
--------- 21DEC2011 {Data flow outage has been canceled} Due to ongoing CIRRUS IB switch issues, the production switch scheduled for tomorrow has been canceled. The IBM Support Team will be rebooting CIRRUS compute nodes throughout the day in order to resolve the problem. The production switch will be announced after the IB switch issues on CIRRUS have been resolved.
--------- 20DEC2011 {The following means there will be no data flow for development NOMADS for a day. WOC high availability NOMADS will have the data on time.} On Wednesday, December 21, ... a production switch from Stratus to Cirrus to complete the work of adding 10 additional nodes to the CCS. At 06:30 AM local, jobs will be drained; at 07:30, remaining jobs will be killed and users logged out of Cirrus. When the switch procedures are complete, Stratus will be returned for development access.
--------- 08DEC2011 {This may be transparent to nomad development users.}
-------- Original Message --------
Subject: December 12 Parallel Production Test on Cirrus
Date: Thu, 08 Dec 2011 11:55:51 -0500
From: sdm
On Monday, December 12, a parallel production test will be run on Cirrus in order to fully test the HV4 implementation on that system. At 6:30 AM local, jobs will be drained from Cirrus. Any remaining developer jobs will be killed 30-60 minutes later. Once the test is underway, devonprod, devhigh, and class 1 users will be allowed back onto Cirrus but will be subject to preemption for the duration of the test. The test should be complete by around 3 PM. If the test goes well, the production switch from Stratus to Cirrus will be scheduled for Wednesday, December 14.
--------- 05DEC2011 {The following means there will be no data flow for development NOMADS for 3 days. WOC high availability NOMADS will have the data on time.} Starting at 07:30 AM local, Cirrus will undergo a maintenance outage to perform HV4 augmentation work. At 06:30 AM local Cirrus will be drained; at 07:30 any remaining jobs will be killed and users logged off. The outage will last approximately 72 hours.
--------- 02NOV2011 {NOMAD development dataflow outage} On November 7, NCO will be running a parallel production test on Stratus in order to fully test the wgrib/wgrib2/w3lib/cnvgrib update and all affected downstream jobs.
Just before the test, IBM will begin draining developer jobs from Stratus. Any remaining developer jobs will be killed 30-60 minutes later. NCO plans to run this test for the 12Z cycle, but the exact start time has not yet been determined. Once NCO has begun the parallel production test, devonprod, devhigh, and class 1 users will be allowed back onto Stratus but will be subject to preemption for the duration of the test. NCO estimates that production mirrors will take approximately 4 hours to catch back up once the production parallel test has been completed. **Stratus outage** {Today} IBM reported at 9am today that they had finished the h/w work and were starting the benchmark tests and model runs. They had an issue with some T3 nodes and had T3 off-line. NCO won't be running a prod cycle test today, so IBM will let the dev users back on Stratus once the tests are successfully completed. If T3 is still off-line at that time, users should expect a delay for resyncing between Cirrus and Stratus before things are back to normal.
--------- 25OCT2011 {31OCT-03NOV NOMAD development dataflow outage after the 06Z 10/31 dataflow. There may be no data flow to development NOMAD[1,3,5] ~10/31-11/2. WOC high availability NOMADS will have the data on time. (*j*)} sp-announce message: In order to accomplish the CCS HV4 system augmentation, Stratus will be unavailable for 48-72 hours beginning Monday, October 31, 2011. Starting 07:30 local on October 31st, all jobs on Stratus will be drained. At 08:30 all remaining jobs will be cancelled and users will be logged out. Once IBM has completed system upgrades, a series of system baseline and acceptance tests will be performed. Once all required testing has been successfully completed, user access will be restored. Users will be notified of any updates through sp-announce.
--------- 20OCT2011 There is currently a dead disk on Nomad1. We have a replacement and I will swap out the dead drive later this afternoon.
It should not affect the use of the system but might cause a slight performance hit while the new drive copies the data. {1900} The disk has been replaced and is rebuilding; the rebuild should take a few hours. There should not be a significant performance hit, though there may be a slight slowdown, and the rebuild should be complete later today.
--------- 12OCT2011 Note that dev users {and nomad[1,3,5]} will not have access to the dev machine for some hours both before (starting Monday morning) and after (ending Tuesday afternoon) the Tuesday morning prod switch. mlh
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] next week's NAM implementation and CCS production switch
Date: Wed, 12 Oct 2011 13:44:38 -0400
From: Christine Caruso Magee
In order to perform the NAM implementation next week, NCO/PMB will be taking Cirrus from development use on Oct. 17. The schedule is as follows:
1130Z Oct. 17 - Draining of developer jobs begins.
1200Z Oct. 17 - Any remaining dev jobs on Cirrus will be killed. Transfers from Stratus to Cirrus will be stopped. Cirrus will be configured for production and the new NAM will be implemented on Cirrus, but products will not yet be alerted from Cirrus. Stratus remains the production CCS and all output is still alerted from Stratus.
1230Z Oct. 18 - Cirrus becomes the production CCS. Products will be alerted from Cirrus. NCO/PMB will keep Stratus until 1800Z in case there is a problem with the new NAM (so that production can be switched back to Stratus without losing a cycle of the NAM).
1800Z Oct. 18 - Stratus is configured for development use and developers will have full access to Stratus. Transfers will be turned on from Cirrus to Stratus. PMB anticipates Stratus will be fully caught up in approximately 24 hrs.
--------- 04OCT2011 {This means there will be no data flow to development nomad[1,3,5] this weekend 10/7-10/10. (*j*)} Cirrus outage for facility work 10/07-10 (this weekend).
IBM will drain Cirrus jobs at 2:30pm EDT on 10/07 and then start to shut down Cirrus before the facility power shuts off at 5pm. Power will be restored at 7pm on 10/09 and IBM will start powering up gpfs and Cirrus so NCO can run benchmarks. Users will be able to log on to Cirrus by 12:30 am on 10/10, but it will take some hours for the mirror to catch up. IBM will put out a user announcement with details.
--------- 20MAY2011 On Monday, May 23rd, data flow will be unreliable on nomad[1,3,5]. nomads.ncep.noaa.gov will be up and on time. On Monday, May 23rd, we will perform a production switch to Cirrus. At 07:00 AM local, jobs will be drained; at 08:00, remaining jobs will be killed and users logged out of Cirrus. When the switch procedures are complete, Stratus will be returned to development access.
--------- 13MAY2011 {This means that data flow will be unreliable on Wednesday 18MAY.} IBM is restarting the CCS maintenance by installing the latest efix on Cirrus next Wed. Note that this timeline looks very similar to the Cirrus maintenance last week, with Wed maintenance followed by a prod cycle test on Thurs. Hopefully users will have access to Cirrus Wed night like last time. Note also the prod switch one week later (25 May). Both these dates will fit around the scheduled hurricane model implementations on 17 & 24 May.
--------- 03MAY2011 Data flow will be unreliable for the next few days on nomad[1,3,5]. nomads.ncep.noaa.gov will be up and on time. A Cirrus outage begins Wednesday, May 4th at 12Z/08:00 AM local and is scheduled to last 24 hours. At 11Z/07:00 AM Wednesday, dev jobs will be drained on Cirrus. At 08:00 AM local, all remaining jobs will be killed and users logged off. On Thursday, May 5th at 12Z/08:00 AM local, a 1-cycle production test will run on Cirrus. For the duration of the test Cirrus will be configured as a production machine. Any activity on Cirrus will be drained at 07:00 AM local on Thursday, and at 08:00 AM local Thursday the test will commence.
A production switch to Cirrus is scheduled for Wednesday, May 11th, and Stratus maintenance will occur on May 18th.
--------- 11MAR2011 The development servers need to undergo program and kernel updates. This should take a short time and most users should not notice. The updates will be done on each development server nomad[1,3,5] one at a time, each starting when the previous server is back online.
--------- 03MAR2011 No data flow to any development server, but data remains on time on the WOC nomads.ncep.noaa.gov. When the switch mentioned below was done, all development disk (T3) was lost, including the data mirror. They are working on it, as the quote below indicates: "At the 0900 meeting, IBM had no ETA for when T3 might be back."
--------- 01MAR2011 This means that there may be delays in data flow on the development nomad servers. Data remains on time on the WOC nomads.ncep.noaa.gov.
* 03/02 prod switch details: Production will switch from Cirrus to Stratus on
* Wednesday March 2nd at 12:30Z (0730 local). 0630 - 0730 local, development
* jobs will be drained & canceled from Stratus. Once the production switch is
* complete, IBM will stop gpfs to apply multi-cluster tuning parameters and
* re-purpose Madis disk arrays. IBM expects to complete the multi-cluster
* tuning between 0930 and 1000 local. User access to Cirrus will be restored
* for about 4 hours while disk arrays rebuild. At 1400 - 1500 local
* development jobs will be drained & canceled on Cirrus. System baseline tests
* will be run by IBM and PMB. Once baseline tests have completed successfully,
* user access will be restored.
--------- 27JAN2011 Due to fiber cuts in the area we will need to stop the flow of data between the CCS and NOMAD1. The system is using only a very small percentage of the line, but under the circumstances we need to remove all we can.
--------- 06JAN2011 Data flow (delivery) to the nomad[1,3,5] development servers was delayed because cron was turned off on the IBM-SP during Jan 6.
The data is on its way now and should be up to date by the time this is received. Data remains on time on the WOC nomads.ncep.noaa.gov.
--------- 23NOV2010 {This means there may be no data flow to development NOMAD[1,3,5] 12/1 12-18Z. WOC high availability NOMADS will have the data on time. (*j*)} On Wednesday, Dec. 1, effective for the 1200Z and 1800Z cycles, NCO/PMB will be running a parallel test of the entire production job suite on Cirrus in order to test the upcoming bufrlib upgrade. During these 2 cycles, the bufr tanks in /dcom on Cirrus will contain data formatted with the new bufrlib and will not be compatible with the current production version. All executables in /nwprod on Cirrus will be compiled with the new versions of bufrlib and w3lib, and the actual bufr and w3 libraries in /nwprod/lib will be the new versions. The production mirrors will be disabled during the test. Users will be able to log on to Cirrus but will not be able to run jobs during the test. After the test is completed (around 0030Z/7:30 pm EST), the executables, libraries, and tanks will be restored to the current production versions on Stratus and production mirrors restarted.
--------- 18NOV2010 This will happen serially beginning at 18Z, and the servers will not be available for a short time. The servers nomad1.ncep.noaa.gov, nomad3.ncep.noaa.gov and nomad5.ncep.noaa.gov have kernel updates available. These updates will require the systems to be rebooted once they have been applied. Is there a good time this week to apply the updates and reboot the systems?
--------- 08NOV2010 {This message means the production supercomputer is taking over development, and for a day there will be no data flow. NOMADS high availability will not be affected and will have all the data on time. (*J*)} Looks like we shouldn't expect Cirrus to be available again until sometime Tuesday morning.
mlh
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Cirrus GPFS Maintenance - 08 Nov 2010
Date: Mon, 08 Nov 2010 06:33:11 -0500
From: Madhuveer Konidena (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov
Cirrus will be unavailable for system maintenance beginning at 12:30Z/07:30 AM local on Monday, November 8th, 2010. All jobs will be drained on Cirrus momentarily. At 12:30Z/07:30 AM local, all remaining jobs will be canceled and users logged off. The maintenance is estimated to take approximately 24 hours. Once system maintenance has completed, PMB will perform system baseline testing. Once all work and testing has been completed, access to Cirrus will be resumed. Notification will go out via sp-announce.
-- Mary L Hart EMC Info Officer
--------- 04NOV2010 1739 GMT The mirror process has been restarted and files are being mirrored again. However, it will take many hours to a day before everything is caught up. ...
-------- Original Message --------
Subject: CCS: 04 Nov 10: USER IMPACT: Mirror not running on stratus
Date: Thu, 04 Nov 2010 14:32:47 -0400
From: M. Hart
To: ALL OF EMC, "Aikman, Frank", "Benjamin, Stan", "Gilbert, Kathryn", "schneider, russell", "Wei, Eugene"
For those who don't have up-to-date files on Cirrus, this is why. mlh
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 04 Nov 10: USER IMPACT: Mirror not running on stratus
Date: Thu, 04 Nov 2010 14:01:47 -0400
From: root@cirrus.ccs.ncep.noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
The mirror between stratus and cirrus stopped on Sunday (10/31). We are having difficulties restarting it. George Vandenberghe has been contacted via email. If we can resolve the issue we will restart the mirror. Otherwise, it will have to wait until he returns this weekend. Another announcement will be sent when the mirror status changes.
_______________________________________________
NCEP.List.SP-Announce mailing list
NCEP.List.SP-Announce@noaa.gov
https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce
... From 03NOV2010... 3) After the Space Shuttle critical weather day is over or canceled (production management has not said when, perhaps 11/04), there will be a switch of the production and development IBM-SP supercomputers, which means that there will be a delay in the data flow to the development nomad servers (nomad[1,3,5]). The high availability server will have all the data on time. Here is the announcement:
-------- Original Message --------
Subject: Fwd: [NCEP.List.SP-Announce] Production switch will occur
Date: Thu, 04 Nov 2010 06:36:17 -0400
From: M. Hart
To: ALL OF EMC
Even though we are in CWD until 20Z tomorrow, they're going ahead with the prod switch this morning. mlh
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Production switch will occur
Date: Thu, 04 Nov 2010 06:26:31 -0400
From: sdm
To: _NCEP.List SP-Announce
All, The production switch scheduled for around 12Z Thu Nov 4 will go forward - nodes will be drained at 1130Z. SDM Rob
--------- 03NOV2010 1) The nomad5 server has been upgraded to new equipment.
2) The nomad1 server has been upgraded from GDS-1.3 to GDS-2.0 so DODS/OPeNDAP can serve GRIB2 files. Last time I tried this, an old cron script was resetting the environment back to the old GDS settings, which would hang the server after about 8 hours. The offending script has been removed and we will keep an eye on the server.
3) After the Space Shuttle critical weather day is over or canceled (production management has not said when, perhaps 11/04), there will be a switch of the production and development IBM-SP supercomputers, which means that there will be a delay in the data flow to the development nomad servers (nomad[1,3,5]). The high availability server will have all the data on time.
--------- 27OCT2010 Development server nomad5 is getting new hardware. A switch from the old to the new system will take place. It should be "mostly" transparent to users, as both systems are running in parallel. There will be a short time when past data is unavailable while the two systems sync up. However, the production supercomputer will also switch with the development supercomputer, so no real time data will be available for a day. The date for this has been changing with critical weather days; so far the switch is scheduled for 20101027.
--------- 14OCT2010 nomad5 (and nomad3) are older architecture and need to be replaced. On 20101018 we will switch from an old to a new server for nomad5. Users may not notice this change, as we are running the new and old systems in parallel.
--------- 17SEP2010 {1400Z}
-------- Original Message --------
Subject: CCS: Stratus announcement for Dev users
Date: Fri, 17 Sep 2010 09:39:56 -0400
From: M. Hart
To: ALL OF EMC, "Aikman, Frank", "Benjamin, Stan", "Gilbert, Kathryn", "schneider, russell", "Wei, Eugene"
SDM just announced over the intercom that IBM has to do some unscheduled maintenance on Stratus & they were booting users off the system *right now*. If I hear any more about this, I'll pass it on.
-- Mary L Hart EMC Info Officer
--------- 09SEP2010 {Once again the development supercomputer is unavailable for a while and there will be no data flow 9/13 10Z for some period. NOMADS high availability will not be affected and the data should be on time. (*J*)} Monday 13 Sep 2010, GPFS Maintenance on Cirrus starting at 10:00Z (06:00 local). All jobs on Cirrus will be drained.
At 11:00z all remaining jobs will be cancelled, LoadLeveler will be stopped and gpfs will be unmounted. Scheduled maintenance is estimated to take 2 hours. Upon completion of scheduled work, baseline tests will be run before the system is returned to development. Notification will go out via sp-announce. --------- 07SEP2010 {Once again the development supercomputer is being taken down, and there will be no data flow for a while starting 9/08 12Z. NOMADS high availability will not be affected and will have all the data on time. (*J*) } Wednesday 08 Sep 2010, GPFS Maintenance on Cirrus starting at 11:00z (07:00 local): all jobs on Cirrus will be drained. At 12:00z all remaining jobs will be cancelled, LoadLeveler will be stopped and gpfs will be unmounted. Scheduled maintenance is estimated to take 2 hours. Upon completion of scheduled work, baseline tests will be run before the system is returned to development. Notification will go out via sp-announce. --------- 25AUG2010 {Here is the latest from SP-Announce on supercomputer availability. Whenever it says "... cancel....", a period begins for nomad[1,3,5] with no server data flow, and data will be late or missing. The high availability server http://nomads.ncep.noaa.gov will not be affected and should have all the data present and on time. (*j*) } Listed below is the GPFS maintenance schedule on Cirrus & Stratus to combine the new multi-cluster GPFS cluster (Nimbus) with the Cirrus & Stratus clusters.
08/26 - Cirrus GPFS Maintenance
    6:00 AM EDT - Drain all the LL jobs
    7:00 AM EDT - All jobs canceled and GPFS taken down for 2 hour maintenance
    10:00 AM EDT - Baseline test
08/30 - Production Switch to Cirrus
    7:30 AM EDT - Drain all development jobs
    8:30 AM EDT - Cancel all development user jobs, switch production to Cirrus
08/31 - Stratus GPFS Maintenance
    6:00 AM EDT - Drain all the LL jobs
    7:00 AM EDT - All jobs canceled and GPFS taken down for 2 hour maintenance
    10:00 AM EDT - Baseline test
09/01 - Production Switch back to Stratus
    7:30 AM EDT - Drain all development jobs
    8:30 AM EDT - Cancel all development user jobs, switch production back to Stratus
Thanks
--------- 24AUG2010 {The following indicates that the development supercomputer will not be available for nomad[1,3,5] development server data flow, beginning ~26AUG2010 1100Z. The servers will be up but the data may be late. The high availability server http://nomads.ncep.noaa.gov will not be affected and should have all the data present and on time. (*j*) } Thursday 26 Aug 2010, GPFS Maintenance on Cirrus starting at 10:00z (06:00 local): all jobs on Cirrus will be drained. At 11:00z all remaining jobs will be cancelled, LoadLeveler will be stopped and gpfs will be unmounted. Scheduled maintenance is estimated to take 2 hours. Upon completion of scheduled work, baseline tests will be run before the system is returned to development. Notification will go out via sp-announce. --------- 28JUL2010 {There have been many switches this month, and each time the data flow has been unavailable. Below is today's latest switch announcement. The development data mirror should be up shortly, around 15-16Z, and then data will start flowing.
I was away and did not notice that half of the data sets had been missing from the data flow since July 19, when a dev/prod switch left some data flows running and others not. The problem was complicated by a failure of the rotating archive delete program, which became confused and stopped deleting data sets that were due for deletion, thus filling the disk. This should now be fixed, and what is missing from today's and yesterday's data should be replenished. (*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus 08:45 AM local Date: Wed, 28 Jul 2010 08:40:00 -0400 From: Catherine Schaefer To: NCEP.List.SP-Announce@noaa.gov Today at approximately 08:45 AM local, production will be switched to Cirrus. The production fence will remain in effect on Stratus. --------- 06JUL2010 {No data flow for development nomad[1,3,5]. The high availability server will have all the data on time. (*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 06 Jul 10: STRATUS UPGRADE ON TUE JUL 6TH 2010 Date: Tue, 06 Jul 2010 05:50:46 -0400 From: Madhuveer Konidena (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 AM on Tue Jul 06, all LoadLeveler jobs will be drained. At 08:30 AM, all remaining jobs will be canceled, users logged out, and LoadLeveler will be stopped. Maintenance is scheduled to take approximately 24 hours. Upon completion of scheduled maintenance and following successful verification, the SPA team will perform a parallel production test during the 12Z cycle on Wed Jul 07. Once the parallel test is complete and all data transfers have caught up, user access will be restored. The system should be available to developers by late afternoon or early evening on Wed Jul 7th.
--------- 01JUL2010 I have (not) been working on moving GDS and GrADS to version 2.0 for GRIB2 records as well as GRIB1. It runs for half a day and then stops without warning or error, as if it had been told to stopserver. I have not had enough time to fix it yet (any day now), so I have changed back to Grads1.9b... and GDS1.3; there should be no problems over the long weekend. Jordan --------- 21JUN2010 {Data may not arrive at the development servers from June 23 ~12Z for 24 hours. This will affect nomad[1,3,5]. The nomads.ncep.noaa.gov WOC high availability server will not be affected. (*J*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] June 23-24 Stratus outage Date: Mon, 21 Jun 2010 12:20:23 -0400 From: sdm To: _NCEP.List SP-Announce Stratus upgrade on June 23: Beginning 06:30 local on June 23, all LoadLeveler jobs will be drained. At 07:30 all remaining jobs will be canceled, users logged out, and LoadLeveler will be stopped. Maintenance is scheduled to take approximately 24 hours. Upon completion of scheduled maintenance and following successful verification, the SPA team will perform a parallel production test during the 12z cycle on June 24. Once the parallel test is complete and all data transfers have caught up, user access will be restored. The system should be available to developers by COB on Thursday June 24. --------- 15JUN2010 {This means that there may be some delay or missing data on the development NOMAD servers, as there will be no data flow during this time. The servers will remain in operation and static data sets such as the reanalysis will continue to be available.
The real-time data on high availability NOMADS (nomads.ncep.noaa.gov) will be unaffected. (*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 16 Jun 10: Production switch from Stratus to Cirrus 12:30z 6/16/2010 Date: Tue, 15 Jun 2010 08:55:57 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Production will be switched from Stratus to Cirrus Wednesday 16 June, 2010 at 12:30z. Beginning 11:30z all jobs will be drained on Cirrus. At 12:30z all remaining jobs will be cancelled and users that do not have production access will be logged out. Once all production jobs have been vacated from Stratus, development access will be allowed. --------- 14JUN2010 A problem developed with the nomad1 GDS/OPeNDAP/DODS server over the weekend. The system is OK now (20100614 0930). (*j*) --------- 09JUN2010 The GDS 2.0 server is now running on nomad1.ncep.noaa.gov, and gds-1.3 is retired. For some time we had been running it on port 9091, but gds-2.0 now operates on port 9090. The address remains the same: http://nomad1.ncep.noaa.gov:9090/dods GDS 2.0 has been upgraded for compatibility with GrADS version 2.0. It handles data in GRIB2 format and supports 5-dimensional ensemble data sets. The server-side analysis capability is more flexible, so the result can vary in all 5 dimensions. This is already running on nomad3 and on the high availability server nomads.ncep.noaa.gov. (*j*) --------- 07JUN2010 {With little notice, a swap between the production and development supercomputers occurred over the weekend. Some data might be missing or slow on the nomad[1,3,5] development servers. The high availability servers, nomads.ncep.noaa.gov, are not affected.
(*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production switch to Stratus Date: Sun, 06 Jun 2010 07:59:38 -0400 From: SDM To: _NCEP.List SP-Announce All, Due to production running slow on Cirrus, production will be switching to Stratus over the next hour. SDM - Joe Carr -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Cirrus - Baseline test run Date: Sun, 06 Jun 2010 15:22:45 -0400 From: sdm To: _NCEP.List SP-Announce All, IBM replaced hardware on Cirrus. NCO must run a baseline test to ensure the system is running as expected. This baseline test will require all users be removed from Cirrus during the test. This is a notification that all users will be removed from the system in the near future for a baseline test. Once the test is complete, all users will be allowed back onto the system. SDM Joey -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Cirrus Baseline Test Date: Sun, 06 Jun 2010 15:32:37 -0400 From: Don Avart Reply-To: davart@ebi-llc.com Organization: eBusiness Integrators, LLC To: NCEP.List.SP-Announce@noaa.gov Beginning at 15:30 local loadleveler will be drained on Cirrus. At 16:00 local all remaining loadleveler jobs will be canceled and users will be logged off. IBM will perform a GPFS performance test on Cirrus and then PMB will perform a system baseline test. Upon successful completion of both tests, access will be restored and loadleveler will be resumed. --------- 04JUN2010 The data flow crons have not been functioning since the 02JUN2010 switch between production and development described below.
I have a request in with SP support. The high availability server nomads.ncep.noaa.gov is not affected by these problems, so all the data will be there on time. I am submitting some manual jobs to get some of the data onto the server. --------- 02JUN2010 {This means that there may be some delay or missing data on the development NOMADS, as there will be no data flow during this time. The servers will remain in operation and static data sets such as the reanalysis will continue to be available. The real-time data on high availability NOMADS (nomads.ncep.noaa.gov) will be unaffected. (*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus Today 8:30 AM Local Date: Wed, 02 Jun 2010 07:16:27 -0400 From: Catherine Schaefer To: NCEP.List.SP-Announce@noaa.gov Production will be switched from Stratus to Cirrus this morning, Wednesday 2 June, 2010 at 8:30 AM local. Beginning 7:30 AM all jobs will be drained on Cirrus. At 8:30 AM all remaining jobs will be cancelled and users that do not have production access will be logged out. Once all production jobs have been vacated from Stratus, development access will be allowed. --------- 13MAY2010 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 13 May 10: Cirrus Parallel Production Test Date: Wed, 12 May 2010 20:13:08 -0400 From: root@cirrus.ccs.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov The IBM support team will be installing a patch on Cirrus at 07:00 local Thursday May 13. Following the patch installation and testing, the SPA team will perform a parallel production test of the 12z cycle. At 06:00 LoadLeveler queues will be drained. At 07:00 all remaining jobs on Cirrus will be drained and all users will be logged off.
Once maintenance and system checkout have completed, the parallel production test will begin. User access will be allowed during the parallel test. Upon completion of the parallel test Cirrus will be reconfigured for development. --------- 12MAY2010 {This means that some data might be late or missing on nomad[1,3,5]. All data will be on time at the high availability server http://nomads.ncep.noaa.gov. (*j*) } -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 2 hour Emergency Cirrus Maintenance Date: Wed, 12 May 2010 15:08:16 -0400 From: Don Avart To: NCEP.List.SP-Announce@noaa.gov Cirrus will be taken down immediately for system maintenance. This work is expected to take 2 hours. Once all work and system checkout has been completed user access will be restored. Users will be notified via sp-announce. --------- 22APR2010 nomad3 OPeNDAP/(DODS)/GDS and other applications are running OK now. Data delivery to all nomad development servers is delayed (see below), as the mirrors have not been filled as of 1300 20100422. The high availability server has all data on time. (*j*) -- -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 21 Apr 10 Cirrus Maintenance Outage: Wednesday 21 April, 2010 Date: Tue, 20 Apr 2010 09:36:14 -0400 From: Don Avart To: NCEP.List.SP-Announce@noaa.gov Beginning 06:30 local, all LoadLeveler jobs will be drained. At 07:30 all remaining jobs will be cancelled, users logged out, and LoadLeveler will be stopped. Upon completion of scheduled maintenance and following successful verification, the SPA team will perform a parallel production test during the 12z cycle.
Once the parallel test is complete and all data transfers have caught up, user access will be restored. --------- 09MAR2010 {This means that there may be some delay in getting data to the development servers nomad[1,5]. nomad3 is still in the shop. nomads.ncep.noaa.gov is 24/7. (*j*)} -------- Original Message -------- Subject: [NCEP.List.SP-Announce] scheduled production switch to Stratus on 3/10 Date: Tue, 09 Mar 2010 08:56:17 -0500 From: SDM To: _NCEP.List SP-Announce Production will switch from Cirrus to Stratus on Wednesday March 10 at 07:30 local time (1230Z). Beginning at 06:30 (1130Z), all development jobs will be drained on Stratus. At 07:30 all remaining development jobs will be cancelled and users that do not have production access will be logged out. --------- 08MAR2010 {This just in.... It means that nomad[1,5] will not be updated during the downtime, but nomads.ncep.noaa.gov will have all the data on time. (*j*)} -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 09 Mar 10: STRATUS - s2n6 System Maintenance Date: Mon, 08 Mar 2010 16:16:47 -0500 From: Curtis Fields (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Starting at 0700 local, s2n6 will be drained and interactive users will be logged off. All crontabs will be stopped in order to facilitate the update of the EtherChannel device. Once the system and networking maintenance has been completed, s2n6 will be reactivated for Development use.
--------- 01MAR2010 -------- Original Message -------- Subject: [EMC #HCR-19519-765] Nomad3 Date: Mon, 01 Mar 2010 10:17:50 -0500 From: EMC Help Desk Reply-To: EMC Help Desk To: Jordan.Alpert@noaa.gov, Jun.Wang@noaa.gov Jordan and Jun, As I'm sure you have already noticed, Nomad3 is currently down. It looks to be a problem with the power supply. I will do what I can but it does not look promising. Kyle Nevins EMC Help Desk Phone: 301.763.8000 x7299 Email: emc.helpdesk@noaa.gov Web: http://www2.emc.ncep.noaa.gov --------- 24FEB2010 {The switch did not take place but will be rescheduled. Since the development supercomputer was offline to development, some data could be late on nomad[1,3,5]. (*j*)} -------- Original Message -------- Subject: [NCEP.List.SP-Announce] CCS Switch cancelled Date: Wed, 24 Feb 2010 07:28:49 -0500 From: SDM To: _NCEP.List SP-Announce The CCS switch has been canceled due to CWD. IBM will give Stratus back to developers asap. SDM --------- 23FEB2010 SDM wrote: Production will switch from Cirrus to Stratus on Wednesday February 24 at 07:30 local time (1230Z). Beginning at 06:30 (1130Z), all development jobs will be drained on Stratus. At 07:30 all remaining development jobs will be canceled and users that do not have production access will be logged out. {This means that the development nomad[1,3,5] servers may have interrupted data flow during the switch. The high availability, 24/7 server nomads.ncep.noaa.gov is not affected and will be up to date and on time. (*j*)} --------- 16FEB2010 {This means that there will be a short time when nomad[1,3] will be unavailable on Tuesday 20100216 at 3:00pm (2000Z). As usual, the high availability, 24/7, nomads.ncep.noaa.gov NOMADS server will be unaffected.
(*j*) } -------- Original Message -------- Subject: Re: [EMC #CHV-51386-582] Nomad Updates Date: Mon, 15 Feb 2010 10:08:48 -0500 EMC Help Desk wrote: > Jordan and Jun, > >> so would tomorrow around 3:00 p.m. be good? I would prefer to do it when I >> am in the office so I can act if the system goes down and can update you if >> there is an issue or if the system forces a standard disk check. > There are kernel updates available for Nomad1 and Nomad3 which require > reboots. Nomad5 does not have any updates at this time. When would be the > best time to update and reboot Nomad1 and Nomad3? > > Kyle Nevins > > EMC Help Desk > Phone: 301.763.8000 x7299 > Email: emc.helpdesk@noaa.gov > Web: http://www2.emc.ncep.noaa.gov --------- 28JAN2010 {...yet again, but I do not think this will have much impact. The nomad[1,3,5] data flow is from Stratus (IBM-SP supercomputer), and production is now where development usually has been. nomad[1,3,5] should recover automatically. There was not a lot of lead time for this, but the nomads.ncep.noaa.gov NOMADS server will be unaffected. (*j*)} --- -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 28 Jan 10: Stratus Maintenance Outage Date: Thu, 28 Jan 2010 06:12:05 -0500 From: Madhuveer Konidena (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 local today Thursday Jan 28, Stratus will incur a 2 hour maintenance outage. At 06:30 local, all LoadLeveler jobs will be drained. At 7:30 all remaining jobs will be cancelled and users logged off. After the maintenance has been completed, production staff will run the baseline verification test. Once the verification process has completed, user access will be restored and notification will be sent out via sp-announce. --------- 27JAN2010 Dataflow switched for nomad[1,3,5].
No delay on nomads.ncep.noaa.gov. (*j*) --- -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch Complete Date: Wed, 27 Jan 2010 08:06:02 -0500 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Production has been switched to Cirrus and Stratus has been returned to development. --- Production will switch from Stratus to Cirrus today at 07:30 local time. Momentarily, all development jobs will be drained on Cirrus. At 07:30 all remaining development jobs will be canceled and users that do not have production access will be logged out. *When PMB determines* that the switch is complete, Stratus will be returned to development. The switch back to Stratus has been tentatively scheduled for 07:30 local on Wednesday February 3, 2010. --------- 26JAN2010 Beginning 07:30 local on Thursday Jan 28, Stratus will incur a 2 hour maintenance outage. At 06:30 local, all LoadLeveler jobs will be drained. At 7:30 all remaining jobs will be cancelled and users logged off. After the maintenance has been completed, production staff will run the baseline verification test. Once the verification process has completed, user access will be restored and notification will be sent out via sp-announce. Production will switch from Stratus to Cirrus on Wednesday January 27 at 07:30 local time. Beginning at 06:30 all development jobs will be drained on Cirrus. At 07:30 all remaining development jobs will be cancelled and users that do not have production access will be logged out. The switch back to Stratus has been tentatively scheduled for 07:30 local on Wednesday February 3, 2010.
--------- 21JAN2010 {This came in today and indicates that there will be production/dev switches on 1/27 and 2/03. On these two days the data will not be available for some period of time on nomad[1,3,5], but nomads.ncep.noaa.gov will have the data on time. (*j*)} --- -------- Original Message -------- Subject: Heads up on maintenance prod switch next week Date: Thu, 21 Jan 2010 12:51:23 -0500 From: Tammy Braun Organization: NOAA To: _NCEP All EMC There will be a special CCS prod switch to Cirrus next week (01/27) so that a special microcode fix can be installed on Stratus on 01/28 - this is the same fix that was installed on Cirrus today (see below). NCO will announce this before it happens but here's an early warning so you can make sure your file mirror is up to date. According to NCO's schedule, they'll switch prod back to Stratus on 02/03. --- Cirrus had multiple Fastt1 disk failures (on a single RAID array) on 12/31 that caused all gpfs to go offline. The root cause of the failure was a 'bad zone recovery' error in the ESM module; an ESM firmware update was installed on 01/21 to fix this problem & prevent another gpfs crash (NCO didn't want to risk losing the whole gpfs filesystem if this happened again before the next maintenance cycle starts at the end of Feb). Beginning 12:30z on Thursday January 21, Cirrus gpfs will be unavailable. At 11:30z all LoadLeveler queues will be drained. At 12:30z any remaining jobs will be canceled. IBM will be performing a microcode update to the GPFS disk that is expected to take approximately 2.5 hours. Immediately following the update, the SPA team will run the production baseline test. This test is expected to take approximately 45 minutes. Assuming no problems are detected, user access will be restored and the development queues will be resumed.
--- {I did not think there would be much impact from the work indicated in the message from 19JAN, but it did have an impact, as indicated below. As of 15Z the IBM supercomputer is still down, and when it returns the production mirror will need to be filled, which takes a few hours... (*j*)} -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 21 Jan 10: Cirrus Maintenance Outage Date: Thu, 21 Jan 2010 06:16:54 -0500 From: Madhuveer Konidena (sysadmin) To: NCEP.List.SP-Announce@noaa.gov At 7:30 AM today, Jan 21st, Cirrus will incur a 2 hour maintenance outage. At 6:30 AM local, all LoadLeveler jobs will be drained. At 7:30 all remaining jobs will be cancelled and users logged off. After the maintenance is completed, notification will be sent via sp-announce and the system will be returned to users. --------- 19JAN2010 This should not impact nomad[1,3,5]... (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 21 Jan 10: Cirrus GPFS Outage beginning 12:30z Thursday Jan. 21 Date: Tue, 19 Jan 2010 15:44:38 -0500 From: root@cirrus.ccs.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov Beginning 12:30z on Thursday January 21, Cirrus gpfs will be unavailable. At 11:30z all LoadLeveler queues will be drained. At 12:30z any remaining jobs will be cancelled. IBM will be performing a microcode update to the GPFS disk that is expected to take approximately 2.5 hours. Immediately following the update, the SPA team will run the production baseline test. This test is expected to take approximately 45 minutes. Assuming no problems are detected, user access will be restored and the development queues will be resumed.
--------- 12JAN2010 This means that some data may be missing on nomad[1,3,5], but the high availability server http://nomads.ncep.noaa.gov will have all the data present on time. (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Stratus 07:30 Local Date: Tue, 12 Jan 2010 09:59:58 -0500 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Tomorrow at 07:30 local, production will be switched to Stratus. At 06:30 local, development jobs will be drained on Stratus. At 07:30 all remaining development jobs will be killed on Stratus and users logged off. *When PMB determines* that the production switch is complete, an announcement will be made and Cirrus will be released to developers. --------- 06JAN2010 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] UPDATE: Stratus "Black Start" Test Date: Wed, 06 Jan 2010 09:50:36 -0500 From: Don Avart To: NCEP.List.SP-Announce@noaa.gov IBM is experiencing problems with the monitoring hardware for the Black Start test in Gaithersburg. Therefore, the test start has been delayed. Users should not expect access to Stratus to be restored before 17:30z. IBM will continue to provide updates via sp-announce as more information becomes available.
--------- 05JAN2010 [The following should not result in any missed data unless it takes longer. (*j*)] -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Stratus "Black Start" Test Date: Tue, 05 Jan 2010 09:20:01 -0500 From: Don Avart To: NCEP.List.SP-Announce@noaa.gov Due to the scheduled "Black Start" test in Gaithersburg, Stratus will be unavailable to developers from 13:30z until approximately 15:30z on Wednesday January 6, 2010. Beginning 12:30z all development jobs will be drained on Stratus. At 13:30z all remaining jobs will be cancelled. Once the test has been concluded, access to Stratus will be restored and users will be notified via sp-announce. [The following has been taken care of (*j*)] -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus Complete Date: Tue, 05 Jan 2010 09:27:35 -0500 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov The production switch to Cirrus has been completed. Production has now released Stratus and development access has been resumed. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus 07:30 local Date: Tue, 05 Jan 2010 06:39:53 -0500 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov At 07:30 local, production will be switched to Cirrus. Momentarily, development jobs will be drained from Cirrus. At 7:30 any remaining development jobs will be killed and developers will be logged off. Production will then resume on Cirrus.
--------- 22DEC2009 This means that data may not flow to nomad[1,3,5], but the WOC high availability server will have the data. (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 5 Jan 10: Production switch from Stratus to Cirrus on 1/5/2010 07:30 Date: Fri, 18 Dec 2009 12:36:00 -0500 To: NCEP.List.SP-Announce@noaa.gov In support of the Gaithersburg facility "Black Start" test, operations will be switched from Stratus to Cirrus at 07:30 local on Tuesday January 5, 2010. Beginning 06:30 development jobs will be drained on Cirrus. At 07:30 all remaining development jobs will be cancelled and users that do not have production access will be logged out. The switch back to Stratus has been tentatively scheduled for 07:30 local on Thursday January 7, 2010. --------- 05NOV2009 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 5 Nov 09: Update: Stratus Maintenance Outage and Data Migration Date: Thu, 05 Nov 2009 07:46:38 -0500 From: root@ccws.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov The Stratus Quarterly update completed successfully and the data migration between /gpfs/s and /gpfs/s4 is progressing. Once all of the targeted data has been migrated, IBM will begin the verification process. At this time we estimate that user access to Stratus will be enabled by 14:00 local. We will send out an update if more time is required. Once access has been restored IBM will send out an announcement via sp-announce.
--------- 04NOV2009 There could be no data flow to nomad[1,3,5] today. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 04 Nov 09: Stratus Maintenance Outage on Wednesday 4 Nov. 2009 Date: Wed, 04 Nov 2009 06:33:40 -0500 From: Madhuveer Konidena (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 06:30 local, all LoadLeveler jobs will be drained. At 07:30 all remaining jobs will be cancelled, users logged out, and LoadLeveler will be stopped. Upon completion of scheduled maintenance and following successful verification, the IBM team will begin data migration from /gpfs/s to /gpfs/s4. This work is expected to take a minimum of 24 hours to complete. User access will not be restored until all data has been migrated and verified. --------- 28OCT2009 [Things seem a little confusing today. I have placed the announcements I received below in reverse order. (*j*)] -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Correction to previous message about Stratus Maintenance Date: Wed, 28 Oct 2009 09:07:00 -0400 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Stratus maintenance is NOT scheduled for today, 10/28. Stratus will be returned to developers today when Production Management Branch is satisfied that the production switch has been successful.
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Stratus Maintenance Today
Date: Wed, 28 Oct 2009 07:46:51 -0400
From: catherine.schaefer@noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
Correction to prior message: Stratus will be undergoing maintenance, not returned to developers, at 08:30 AM local. Sorry for any inconvenience.
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus at 08:30 AM local
Date: Wed, 28 Oct 2009 07:41:43 -0400
From: catherine.schaefer@noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
Today, October 28th, at 08:30 local, production will be switched to Cirrus. Shortly, the development queues will be drained on Cirrus in preparation for the switch. At 08:30, any remaining development jobs will be killed and developers will be logged off Cirrus. Production will resume on Cirrus, and Stratus will be returned to developers at approximately 08:30 local.
--------- 27OCT2009 The following means that the development servers nomad[1,3,5] will not receive any data flow from operations beginning 12Z (8 AM EDT) on 10/29. Use http://nomads.ncep.noaa.gov, the high availability 24/7 server. The data flow is expected to be interrupted for about 3 hours.
(*j*)
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 29 Oct 09: 3 Hour Stratus Outage Thursday 29 October 09:00 - 12:00 Local
Date: Tue, 27 Oct 2009 15:38:54 -0400
From: root@ccws.ncep.noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
IBM will be conducting a "Black Start" test of the NCEP Phase IV electrical and cooling systems. During this test, Stratus will be unavailable to users. Beginning at 08:00 (local) on Thursday 10/29, all LoadLeveler queues will be drained. At 09:00, all remaining jobs will be cancelled and users will be logged out. Upon completion of the test, access will be restored and users will be notified via sp-announce.
--------- 16OCT2009
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 15 Oct 09: Update to Cirrus Outage
Date: Thu, 15 Oct 2009 18:05:52 -0400
From: root@ccws.ncep.noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
The Cirrus outage will continue until approximately 19:30 local time. Once work is complete and access has been restored, notification will be sent out via sp-announce.
--------- 15OCT2009
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] 15 Oct 09: Update to Cirrus Outage on 10/15/2009
Date: Thu, 15 Oct 2009 12:58:35 -0400
From: root@ccws.ncep.noaa.gov
To: NCEP.List.SP-Announce@noaa.gov
The current Cirrus outage will be extended until approximately 17:30 local. Once the work is complete, user access will be restored.
--------- 14OCT2009 Beginning 09:30 local on Thursday 10/15, Cirrus will be unavailable to non-production users.
--------- 13OCT2009 (Any time there is a production switch and the development machine is taken over by production, there will be no data flow to nomad[1,3,5]; use the http://nomads.ncep.noaa.gov high availability server instead. Changes to "Vapor" do not affect the nomad servers, but if Cirrus is unavailable, there is no data flow to them. (*j*))
- Please note the following scheduled maintenance activities and plan accordingly:
Vapor update 10/14/09: Beginning 14:00z, Vapor will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification, user access will be restored.
Cirrus update 10/21/09: Beginning 11:30z, Cirrus will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification, user access will be restored.
Production switch to Cirrus 10/28/09: A production switch from Stratus to Cirrus is scheduled to begin at 11:30z on 10/28/09. Beginning at 10:30z, all development queues on Cirrus will be drained. At 11:30z, all remaining development jobs will be cancelled. Users who do not have production access will be logged out and their crontabs will be moved to their home directories. Non-production users will be granted access to Stratus once the failover has successfully completed.
Stratus update 11/4/09: Beginning 11:30z, Stratus will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification, user access will be restored.
Production switch back to Stratus 11/12/09: A production switch from Cirrus to Stratus is scheduled to begin at 11:30z on 11/12/09. Beginning at 10:30z, all development queues on Stratus will be drained. At 11:30z, all remaining development jobs will be cancelled. Users who do not have production access will be logged out and their crontabs will be moved to their home directories. Non-production users will be granted access to Cirrus once the failover has successfully completed.
--------- 02OCT2009 High Availability Server: http://nomads.ncep.noaa.gov
Users of NOMADS are reminded that they should use the URL http://nomads.ncep.noaa.gov/ to access the system; they will always be placed on the currently active server. Starting on Tuesday October 7, 2009, at approximately 1400 UTC, users who have been using direct IP addresses to access NOMADS systems may no longer be able to access the system.
--------- 30SEP2009 The announcement below means that there will be no data flow to the development NOMADS servers nomad[1,3,5].ncep.noaa.gov until 10/5/2009. The high availability server http://nomads.ncep.noaa.gov will continue to have the data on time.
All, Production has been switched to Cirrus. Stratus is down for the scheduled installation of new disk drives by IBM. Production is scheduled to be switched back to Stratus on Monday 5 October 2009. Dew will be shut down at 1600z today and will remain down until further notice.
--------- 26SEP2009 Another nomad[1,3,5] data flow interruption is scheduled, as indicated below in the sp-announce list server message (it means no data will flow to the development NOMADS servers, but the high availability server should remain up to date):
Beginning 07:30 local on Wednesday 30 September, production will be switched from Stratus to Cirrus. Upon successful completion of the production switch, Stratus will be shut down in order to facilitate the new disk installation. The disk installation process is expected to take approximately 36 hours. Development access to Stratus will be enabled as soon as the new disk has been installed and validated. Production will remain on Cirrus through the weekend. We will switch production back to Stratus on Monday 5 October at 07:30.
--------- 25SEP2009 NOMADS development servers have returned to service, including DODS/OPeNDAP. The development side of the supercomputer returned around noon on Thursday, but file corruption kept DODS/OPeNDAP from running. In addition, the servers came up with a secure shell problem, so no data was written to the development servers. These problems are now fixed (1500Z) and data is flowing. (*j*)
--------- 23SEP2009 The message below, received today, indicates that there will be no data flow to the development systems, particularly nomad[1,3,5], until further notice. The high availability server, http://nomads.ncep.noaa.gov, will have all the data. These warnings are also available from the list server: NCEP.List.SP-Announce@noaa.gov (*j*)
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Cirrus/Dew offline
Date: Wed, 23 Sep 2009 13:11:18 -0400
From: SDM
To: _NCEP.List SP-Announce
All, Dew and Cirrus are being taken down due to cooling problems in Fairmont. SDM - Joe Carr
--------- 18SEP2009 The message below means the development supercomputer will be unavailable from around noon on Monday and will not return until Thursday, after the data mirror is replenished; therefore, no real-time data will reach the development servers nomad[1,3,5]. (*j*) The high availability server will have the data.
Beginning 13:00 EDT Monday 9/21/2009, Cirrus will be unavailable. The IBM team will be installing additional disk on Cirrus during this outage. All development queues will be drained at 12:00 EDT. At 13:00, all remaining jobs will be cancelled and any remaining users will be logged out. The cron daemons will be stopped on all Cirrus interactive nodes; therefore, when access to Cirrus is restored, existing crontabs will be resumed. Work is scheduled to complete by 01:30 9/24/2009.
--------- 09SEP2009 This means that there will be no data flow to the development servers nomad[1,3,5], but the high availability 24/7 server http://nomads.ncep.noaa.gov will have all the data. (*j*)
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Scheduled Fairmont Power Outage...Shutdown of Cirrus and Dew
Date: Wed, 09 Sep 2009 09:45:20 -0400
From: SDM
To: _NCEP.List SP-Announce
On Friday, September 11, facilities work at Fairmont will result in the shutdown of both Dew and Cirrus. They will be unavailable for up to 18 hours. The schedule that we are working to is as follows (all times are EDT):
September 11th
12:00PM-1:00PM Configure Cirrus nodes 1-8 to run production (DEV jobs will be suspended).
12:00PM-1:00PM Take down Cirrus frames 9-14.
2:00PM-2:45PM Relocate Cirrus networking to Cisco 6509.
2:45PM-4:15PM Shut down Cirrus and Dew.
4:15PM-4:30PM Shut down lnxfmt1, lnxfmt2, smsfmt1, smsfmt2, and svn-fmt.
4:15PM-4:30PM Shut down sdmfmta and sdmfmtb.
4:30PM-4:45PM Shut down Dew Force 10, Cirrus Force 10, and Cisco 6509.
5:00PM Fairmont power shutdown.
September 12th
1:00AM Power is restored.
1:15AM-1:45AM Power up 6509, Cirrus Force10, and Dew Force10.
1:45AM-2:15AM Power on lnxfmt1, lnxfmt2, smsfmt1, smsfmt2, svn-fmt, sdmfmta, sdmfmtb, and CWS.
2:15AM-2:30AM Verify connectivity, routing, power supplies, and redundancy.
2:15AM-2:30AM Bring up disk for Cirrus and Dew.
2:30AM-7:00AM Power up and test Cirrus and Dew.
7:00AM Release Cirrus and Dew.
--------- 29JUL2009 There have been a number of firewall problems as we move onto new supercomputing systems.
When the development servers are down, one can use the 24/7 high availability server http://nomads.ncep.noaa.gov. In addition, the operational and development systems will continue to be switched. A switch often means the development system has to be shut down so that operations can use it, and when this happens there is no data flow to the development servers, although data continues to flow to the high availability server. The schedule is complex, but I include the following summary and information from John Ward:
Hopefully you have all received the e-mail outlining the final round of tests that NCO will be performing over the next two weeks. I won't repeat the schedule here, but basically, as of Friday afternoon, Cirrus & Stratus will be configured as Development & Production, followed by a week or more of nearly daily switches of Dev & Prod. During the weeks of August 3 & 10, production will be switched about 8 times and the systems will be rebooted at least twice. Dev users will only have access to the Development machine throughout these switches. Since it isn't practical to mirror all your data between Cirrus & Stratus, Dev users should attempt to copy the bare minimum they will need to continue some level of work during these two weeks. All Classes & Groups should be configured the same as on Mist & Dew. I would recommend that all DevOnProd & Class1onProd users verify their access to both Cirrus & Stratus on Friday afternoon, after the systems have been configured as Prod & Dev. If you have any problems with access or running jobs, you should immediately notify IBM support, since the system will remain in that configuration for the weekend.
--------- 10APR2009 The problems from the bandwidth reductions noticed in early March should be mitigated by the decision below, and nomad[1,3,5] will be back to normal. All should note that by the end of this year NCEP plans to switch completely to GRIB2.
When operations stop producing GRIB1 files, perhaps by this fall, only GRIB2 files will be available. This should be transparent to most NOMADS users.
-------- Original Message --------
Subject: Bandwidth Increase to NOMADS RFC Approved
Date: Thu, 09 Apr 2009 13:59:06 -0400
From: Bill Lapenta
Organization: NCEP/EMC
To: Jordan Alpert
CC: Daniel Starosta, Shrinivas Moorthi
RFC approved; the expected implementation date is 14 April. Thanks for the coordination.
--------- 24MAR2009
-------- Original Message --------
Subject: [EMC #10789]: Slow network speed on NOMADS data
Date: Tue, 24 Mar 2009 17:08:07 -0400
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
Jordan, I have submitted an RFC today to have the port speed increased from 10 Mbps to 100 Mbps on both nomad3 and nomad5. -Kyle
-----
I am hoping this will fix the slow data transfer problems that we have been having on the development servers, nomad[1,3,5]:
--- System admin wrote... Your request #128762 was updated by reginald.pace: Kyle, I got the green light to proceed with the port speed increase from 10 to 100 Mbps. Can you submit the RFC today and schedule it for first thing next week? -Reggie
--------- 20MAR2009 We have noticed a degradation in the transfer speeds from our development servers, nomad[1,3,5], over the last week, and the system admins are working on the problem.
--------- 13MAR2009 NCEP production/development IBM-SP supercomputers are changing. Development NOMADS will move today from Dew to Cirrus, as Dew will not be available to development accounts (NOMADS) as of COB today. Users should find this switch transparent. The NOMADS high availability 24/7 server at the Web Operations Center, http://nomads.ncep.noaa.gov, will be unaffected.
--------- 09FEB2009 nomad1 is in a state where all the data areas are showing read_only. I have sent a message to the helpdesk.
-------- Original Message --------
Subject: [EMC #10515]: nomad1 problem
Date: Mon, 09 Feb 2009 11:54:32 -0500
From: EMC Helpdesk
Reply-To: EMC Helpdesk
To: Jordan.Alpert@noaa.gov
References: <200902091453.n19Er0wk030565@mailrt1.ncep.noaa.gov>
Jordan, The disk arrays ran into issues because of an overflow of I/O to the RAIDs, which caused the controller to shut them off. I have rebooted the system and they are now back online. At this time it is functioning normally; however, if this issue occurs again, it will require firmware updates, which we will coordinate at that time. -Kyle
Jordan
--------- 18DEC2008 The testing for the change (see 05DEC 2008) had system/ops taking the development supercomputer for their work on Dec 17. As the message below states, the system has returned, and we have restarted cron. The "transfer" jobs feed the development mirror from which the experimental NOMADS servers nomad[1,3,5] get their data. NOMADS jobs need to be started manually, which Jun and I have already done. Even though the system came back early this morning, it takes a day to get the mirror replenished, so all should use the new 24/7 high availability server at http://nomads.ncep.noaa.gov. Below is the message from the supercomputer system.
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Dew maintenance work
Date: Thu, 18 Dec 2008 05:50:59 -0500
From: SDM
To: _NCEP.List SP-Announce
All, MIST is now available for developers. All upgrading and testing has been done, and all transfer jobs have been restarted. SDM Mike Wooldridge
Correction: Mist is expected to be returned to development approximately 07:30 18 Dec 08.
-Don
-----Original Message-----
From: ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov [mailto:ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov] On Behalf Of Don Avart (sysadmin)
Sent: Wednesday, December 17, 2008 5:54 AM
To: ncep.list.sp-announce@noaa.gov
Subject: [NCEP.List.SP-Announce] 17 Dec 08: Mist Maintenance Wed. Dec 17 2008: 24 hours
Beginning 07:30 on Wednesday 17 Dec 08, Mist will be unavailable for system maintenance. At 06:30, all development LoadLeveler queues will be drained. At 07:30, any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed, Mist will be turned over to production for parallel operations and testing. Mist is expected to be returned to development approximately 07:30 17 Dec 08.
_______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce
--------- 05DEC 2008 {This means that data flow to nomad[1,3,5] may resume. Note also that there will be another data flow interruption when Dew and Mist are interchanged later this month (not yet announced).}
Subject: [NCEP.List.SP-Announce] 5 Dec 08: Dew testing complete. Cron Available
Date: Fri, 05 Dec 2008 05:11:15 -0500
From: Don Avart (sysadmin)
To: NCEP.List.SP-Announce@noaa.gov
Production testing of Dew is complete. Users are now free to restore crontabs.
--------- 01DEC 2008 Beginning 07:30 on Wednesday 3 Dec 08, Dew will be unavailable for system maintenance. At 06:30, all development LoadLeveler queues will be drained. At 07:30, any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed, Dew will be turned over to production for parallel operations and testing.
Dew is expected to be returned to development by 07:30 4 Dec 08. {This means that the data flow to nomad[1,3,5] may be interrupted or late, but the flow to http://nomads.ncep.noaa.gov should be OK.}
--------- 25NOV 2008 I hope everybody is aware of the switch of operations to Dew on 3 December as part of the quarterly OS upgrade. Production will be on Dew for 2 weeks, so please check to be sure you are ready for the switch. {This means that the data flow to nomad[1,3,5] may be interrupted or late, but the flow to http://nomads.ncep.noaa.gov should be OK.}
--------- 21NOV 2008 The tenure of nomads6.ncdc.noaa.gov as a backup server will end beginning December 1! I have been informed by the NCDC group that, due to security limits, the nomads6 backup server will be turned off shortly. It will be unavailable for a time (most of December) but will return with http and ftp service only, at least at first. Some NOMADS applications and GDS might be restored at some point, but it will no longer be the backup server. By the end of this year, NOMADS real-time model files will be on the high availability server at the WOC, so we should not need such a backup. I encourage all to use that server, http://nomads.ncep.noaa.gov; the development servers are also in operation. Over the last week, other applications on the backup server have made it impossible to update the GDS server because the machine is too busy, so GDS is unlikely to return at all. Many of the problems you have encountered this week have been due to competition (high load average) from other applications on the server in anticipation of the renovation. For the time being, the server will continue to run ftp and http services, and pdisp, ftp2u, and GDS are operating, but updating of new files will be erratic and will remain so through the rest of this month. A copy of the reanalysis and other data sets should remain available when the system returns to operation in 2009.
--------- 12NOV 2008 Update #3 All, Dew has returned to service and developers can use DEW. Transfer jobs are currently running and may take some time (overnight) to catch up with all model products. SDM Grant Newby
--------- 12NOV 2008 This just in from action director GWCB: 1600Z: Folks, Power was lost to Dew this morning. IBM will have to fsck the file system once power is restored. I would not expect Dew to become available until very late today or tomorrow morning. DevOnProd will also be unavailable until /com is synced between Mist & Dew. John
Another outage for the supercomputer...
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] DEW down due to power problems
Date: Wed, 12 Nov 2008 09:28:21 -0500
From: SDM
To: _NCEP.List SP-Announce
All, The Dew supercomputer is currently down due to power problems at the Fairmont Facility. It is currently unknown how long Dew will be down. Updates will be provided as more info becomes available. SDM - Joey
--------- 08NOV 2008 Sorry I did not get this out on time. (*j*)
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] DEW is now available to developers
Date: Thu, 06 Nov 2008 17:31:33 -0500
From: SDM
To: _NCEP.List SP-Announce
All, Dew is now available to developers. Please note that over 24 hours of production data needs to be mirrored over from Mist to Dew; this will take a while. Therefore, a full current set of production data in /com on Dew will not be available until sometime tomorrow.
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #5
Date: Wed, 27 Aug 2008 14:13:28 -0400
From: SDM
To: _NCEP.List SP-Announce
All, Dew continues to be unavailable to development; production baseline testing will begin shortly. We expect testing and data syncing will take most of the rest of the day to accomplish, so we expect DEW will be available to developers no earlier than 12z tomorrow morning. Sorry for any inconvenience this may cause.
SDM - Mark Shirey/Grant Newby
--------- 02OCT 2008 We will be shutting down Nomad1 on Monday, October 6th, at 15Z to set up additional storage. It should be down for a few hours. The GFS and NAM will be on the http://nomads6.ncdc.noaa.gov/ncep_data backup. All other data sets should be available during this period.
--------- 08AUG 2008 The following means no data flow to nomad[1,3,5] on 8/26-27; backups are at nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov and ftpprd.
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #4
Date: Wed, 27 Aug 2008 09:38:08 -0400
From: SDM
To: _NCEP.List SP-Announce
All, GPFS is still not available on Dew; the file system check continues to run on Dew, and it is believed that the fsck is running successfully. Once the fsck is done and analyzed, a firmer estimate of when Dew will be available again can be provided. Sorry for any inconvenience this may cause. SDM - Mark Shirey
-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #3
Date: Tue, 26 Aug 2008 21:56:28 -0400
From: SDM
To: _NCEP.List SP-Announce
All, IBM has no estimate of when Dew will be back in service. This is likely to be an extended outage. Also, because NCEP is in a critical weather day, all developers will be taken off of Mist in order to minimize risk to Mist. Sorry for any inconvenience this may cause. SDM - Joe Carr
--------- 18 AUG 2008 Attention NCDC-NOMADS users, The NCDC NOMADS servers will soon undergo a reconfiguration that will change the way users access data.
These changes will simplify and stabilize the Uniform Resource Locators (URLs) used across the NOMADS systems and, most importantly, will remove the need for specific port numbers to access data. This way, future NOMADS system changes will be transparent to users. Users will need to modify any stored URLs for the NCDC NOMADS suite of servers that contain specific port numbers. (Note: these changes will have no impact on the NCEP suite of NOMADS servers.) A transition period will allow users to modify their access scripts. From Tuesday, August 19, 2008 to September 1, 2008, the existing access points will remain available in parallel with the new configuration, which is already in place. On September 2, 2008, all URLs that contain port numbers will be discontinued. We urge users to change their bookmarks, OPeNDAP applications, URL references in upcoming publications, and access scripts of any kind now, removing all port numbers from their links and substituting the following:
Service | Current URL | New URL
Ensemble Probability Tool | http://nomads.ncdc.noaa.gov:9091/EnsProb/ | http://nomads.ncdc.noaa.gov/EnsProb/
GrADS Data Server (GDS) | http://nomads.ncdc.noaa.gov:9090/dods/ or http://nomads.ncdc.noaa.gov:9091/dods/ | http://nomads.ncdc.noaa.gov/dods/
Live Access Server (LAS) | http://nomads.ncdc.noaa.gov:8085/las/servlets/dataset | http://nomads.ncdc.noaa.gov/las/servlets/dataset
SRRS / NCEP Charts | http://nomads.ncdc.noaa.gov:9091/ncep/NCEP | http://nomads.ncdc.noaa.gov/ncep/NCEP
Thredds Data Server (TDS) | http://nomads.ncdc.noaa.gov:8085/thredds/ | http://nomads.ncdc.noaa.gov/thredds/
--------- 07 JUL 2008 The message below means there could be data delays on Wednesday, 7/9 for nomad[1,3,5], and again a week later when the production switch is reversed.
The backup servers http://nomads6.ncdc.noaa.gov/ncep_data and http://nomads.ncdc.noaa.gov are on a separate data flow and should not be affected. (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 9 Jul 08: Production Switch to Dew Date: Mon, 07 Jul 2008 08:40:44 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Production will be switched from Mist to Dew beginning 07:30 local Wednesday 9 July. All non-production classes on Mist will be drained at 06:30. At 07:30 all development users will be logged off of Dew, their LoadLeveler jobs will be cancelled, and their crontabs will be moved out of /var/spool/cron/crontabs and placed into their home directories. Once production has switched to Dew, all non-production classes will be resumed. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 01 JUL 2008 This means that the data flow on nomad[1,3,5] could be delayed. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 2 Jul 08: Dew Maintenance 2 Jul 08: 24 hours Date: Tue, 01 Jul 2008 08:56:41 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 on Wednesday 2 Jul 08, Dew will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed, Dew will be turned over to production for parallel operations and testing. Dew is expected to be returned to development by 07:30 3 Jul 08.
--------- 10 JUN 2008 It appears that the cache on Nomad1 was overloaded and, as a result, cut off communications to the storage array. I have cleared the cache, updated the kernel to prevent a similar situation from occurring, performed disk checks, and rebooted the system, and it appears to be in proper working order now. The system had been online for 145 days, which may have contributed to the issue. -Kyle -------- Original Message -------- Subject: [EMC #8905]: nomad1 problem? Date: Tue, 10 Jun 2008 09:26:52 -0400 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: <200806101217.m5ACH4kx028481@mailrt1.ncep.noaa.gov> Jordan, For some reason, Nomad1 is not recognizing the storage array attached to it. We have checked the connections and the hardware indicates everything is fine. I am going to unmount the drives in a short while and run some disk checks. This will require a reboot of the system. I will perform the unmounts at 9:45 a.m. this morning and the reboot shortly thereafter. -Kyle --------- 23 MAY 2008 This means that NOMADS may not have data flow for all or part of this weekend... All, What: Production will switch from Mist to Dew. When: 2345Z (7:45 PM EDT) Fri May 23. Why: Due to planned power maintenance on the IBM campus in Gaithersburg, Mist will be placed on backup generator at 10 PM Fri May 23 and remain on generator through Sat May 24 at midday. It is anticipated that Mist will remain up and viable through the period. A Critical Weather Day remains in place through 12Z (8 AM EDT) Sat morning. Due to the above factors, production will be switched to Dew. Developer Impact: Developers will be switched from Dew to Mist beginning at 7:45 PM Fri May 23 and remain on Mist through 7:45 AM Tue May 27.
It is anticipated that production will switch back to Mist Tue morning at 7:45 AM. Duration: 84 hours SDM - Joe Carr Senior Duty Meteorologist NCEP Central Operations Production Management Branch -------- Original Message -------- Subject: [Fwd: warning: Production may switchover to dew this weekend.] Date: Fri, 23 May 2008 14:48:01 -0400 From: Tammy Braun Organization: NOAA To: _NCEP All EMC FROM GEOFF DIMEGO: It looks like there may be a switchover between mist and dew this weekend. Eric saw a message while logging in and he confirmed it with Doris Pan. IBM is doing work on the power system in Gaithersburg. We knew this because they are taking haze & hpss down. Apparently, they are worried the power work will affect mist and want to (AT THE LAST MINUTE) move production to dew. I've complained to Don Avart ... Since we are in a Critical Weather Day, they won't be able to do the change until it is lifted - maybe Saturday! I can't change this. I am powerless. If you have critical jobs or crons that have to be switched by hand when there is a switchover, you might want to look in on the machine situation this weekend. --------- 08 MAY 2008 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 8 May 08: Production Bufr Lib Test on Dew 18:00 - 03:00 Date: Wed, 07 May 2008 21:20:30 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 18:00 local 8 May 2008 until 03:00 9 May, production will be conducting a parallel bufr lib test. During this time Dew will be inaccessible to development users. Beginning at 17:00 local, all LoadLeveler classes will be drained. At 18:00 any remaining jobs will be cancelled. Once maintenance has been completed and all systems testing and validation have completed, all LoadLeveler classes will be resumed.
--------- 07 MAY 2008 The following means that there may be some disruption in the data flow for nomad1, 3, 5 on 9MAY2008: -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Production switch to Dew from Mist Date: Wed, 07 May 2008 06:49:23 -0400 From: SDM To: _NCEP.List SP-Announce All, What: Production will switch from Mist to Dew. When: 1045Z (6:45 AM) Wed May 7 through 13Z (9 AM) Fri May 9. Why: NOAA COOP Exercise. Developer Impact: Developers will switch from Dew to Mist for the duration of the period. Duration: ~50 hours NOTE: More information is forthcoming on the bufr library test on Dew, which was scheduled from 6 PM Wed May 7 through 3 AM Thu May 8. SDM - Joe Carr --------- 28 APR 2008 A (premature) switch back to Dew development after the emergency: -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Availability of Dew Date: Mon, 28 Apr 2008 14:23:17 -0400 From: SDM To: _NCEP.List SP-Announce All, What: The mirroring of production data from Mist to Dew continues. When: The mirroring process is expected to last until about 29/0000Z. Why: The process is required as a result of an emergency switch to Mist Sunday morning April 27, and the power down of Dew at that time. Developer Impact: Developers are not expected to have complete access to all data until the mirroring process is complete. SDM - Bill Kneas --------- 27 APR 2008 The following from Central Operations indicates that there will be no data flow for nomad1, 3, 5.
Use nomads6.ncdc.noaa.gov and nomads.ncdc.noaa.gov instead. From: SDM Sent: Sunday, April 27, 2008 12:29 pm To: "_NCEP.List SP-Announce" Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Dew Out of Service All, What: Due to a power problem at the Fairmont Site, the Dew computer has been powered down. When: Power loss was approximately 1130Z (7:30 A.M.) Sunday April 27, 2008. Why: The power interruption caused a loss of cooling to the facility, so Dew was shut down. Developer Impact: Developers will not have access to Dew until further notice. Duration: Unknown SDM - Bill Kneas --------- 09 APR 2008 The message below implies that on 20080409 nomad1, 3, 5 will not receive data. We switched the development and production machines last week (sorry I did not announce this), and switching back will interrupt data access for a day. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Mist maintenance work today Date: Wed, 09 Apr 2008 07:19:45 -0400 From: SDM To: _NCEP.List SP-Announce All, What: Mist maintenance work has begun. IBM began draining the Mist nodes at 06:30 AM and will take the entire system by 08:00 AM. When: 06:30 AM Wed Apr 9 through approximately 08:00 AM Thu Apr 10. Why: Quarterly maintenance on Mist. Developer Impact: Developers will not have access to Mist during the maintenance window. Duration: ~24 hours --------- 25 MAR 2008 nomad3 has returned to service with a grib2 feature for ftp2u called g2sub, which we are testing on GFS output. The grib1 holdings are still present as before. Also: -------- Original Message -------- Subject: [EMC #8090]: stale nfs handle Date: Tue, 25 Mar 2008 12:02:54 -0400 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov Jordan, Nomad5 has been updated and rebooted. -Kyle --------- 21 MAR 2008 nomad5 will be rebooted (after over 285 days of running) on Tuesday, March 25, 2008 at 1300Z.
We did not reboot it last Dec because nomad3 was rebooted and did not come back, and we had to deal with that. (*j*) --------- 18 MAR 2008 All, (The following means that nomad1,3,5 data sets will be delayed/missing 3/20-21. Use nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov.) Dew maintenance work scheduled for Wed Mar 19 is delayed by one day. Dew will not be available to developers during the maintenance window. When: 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21. Why: A Critical Weather Day was declared through 12Z (8 AM) Thu Mar 20. Developer Impact: Developers will not have access to Dew from 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21. Duration: 27 hours SDM - Joe Carr Senior Duty Meteorologist NCEP Central Operations Production Management Branch --------- 29 JAN 2008 After security/firewall problems are worked out (any day now), a new NOMADS server is coming up at this address: http://nomad1.ncep.noaa.gov Unlike nomad5, which pointed to nomad3, nomad1 contains its own independent copy of the NCEP reanalysis. 0.5 degree GFS and SREF are already present and operating, with more data sets to follow. Tests have been completed with these datasets, and we will work to get the rest of the datasets on nomad1 as well as to resolve the security/firewall problems so outside users can use the server. The server should be accessible soon, as it is in the hands of sys admin security. We hope nomad3 will return to service, but we do not know what is keeping it from restructuring to raid5 with new drives. The nomad3 server, which holds 2/3 of NOMADS real time data, has not been working since Dec 24 2007. nomad3 has been "broken" since Christmas, when the power was found off and a subsequent restart showed a bad drive. New disks were placed in the raid5, but the raid would not restructure onto the new disk, meaning that the system was no longer a "raid(5)" and the next disk lost would cause the whole system and its data to be lost.
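Why a RAID5 that cannot rebuild is one disk away from total loss can be illustrated with a toy model of single XOR parity. This is a simplified sketch under stated assumptions, not the controller logic on nomad3: real RAID5 rotates the parity block across disks and operates on fixed-size stripes, and the helper names (xor_blocks, make_stripe, rebuild) are hypothetical. Each stripe stores one parity block equal to the XOR of its data blocks, so any single missing block is the XOR of the survivors; a second missing block leaves two unknowns and cannot be recovered.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Toy RAID5 stripe (assumption: no parity rotation): data blocks plus one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Recover a stripe whose missing blocks are marked None.

    Single parity gives one equation per stripe, so only one missing
    block can be solved for; two or more missing blocks are unrecoverable.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} blocks lost; single parity recovers only one")
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"NOMA", b"D3GF", b"SDAT"])
    degraded = list(stripe)
    degraded[1] = None  # first disk failure: array still readable (degraded)
    print(rebuild(degraded)[1])  # b'D3GF'
    degraded[2] = None  # second failure before any successful rebuild
    try:
        rebuild(degraded)
    except ValueError as err:
        print("unrecoverable:", err)
```

The degraded array still serves every block, which is why such a system can keep running for a while; the restructure step (rebuilding onto a replacement disk) is what restores the ability to survive the next failure.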
We have saved off the code/data and the sys admins are working to report a new system. nomad5 continues to hold most of the data, but reanalysis and some other data sets are not present. It has been running as a lone server since Dec 24. Last June, NCO decreased the bandwidth of all NOMADS servers because of the possibility of NOMADS interfering with operations whenever an IBM-SP swap of prod and dev needs to be done. Even though an IBM-SP swap does not happen often, NCO felt that the increased overall usage of the network required that NOMADS bandwidth remain throttled. This may contribute to users having problems downloading data that is present. Some data, like SREF, is not on the nomads6 backup at NCDC because bandwidth has been decreased. nomad3 continues to be worked on by the EMC sys admins. Efforts to make NOMADS operational and move applications to the WOC/ftpprd, with 24/7 service and improved reliability and bandwidth, continue, and implementation is on schedule for the end of this summer. --------- 11 JAN 2008 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Thu, 10 Jan 2008 23:44:54 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov Jordan, Just an update: after talking to Yinka up in NCO, he agreed that one of the disks should be replaced. The replacement disk that I put into Nomad3 was a used disk that was labeled as a replacement. He and I are going to wait until tomorrow to see if the disks arrive; if they do not, we are going to recompile the driver and reinstall it. -Kyle --------- 07 JAN 2008 Sorry. On Friday PM nomad3 would not answer or allow a login. A message was sent to emc.helpdesk@cerberus.ncep.noaa.gov. --------- 04 JAN 2008 16Z nomad3 is operating. GDS/OPENDAP (DODS) will come up in a few hours (waiting for the data logjam to ease).
--------- 31 DEC 2007 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Mon, 31 Dec 2007 11:11:56 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov We are currently rebuilding the two raid5s on the system, as the disks were reporting as not in use rather than as dead. We are also running another fsck on /raid2. We have shut down all network connections on the machine to ensure that no outside interference occurs. We will keep you updated as we learn more. -Kyle --------- 27 DEC 2007 nomad3 status: From EMC Helpdesk: 15:34 EST: The system was rebooted again, and the root filesystem and the raid1 filesystem checked out as clean, but raid2 is still running the file system check. We will let that run overnight and may need some input tomorrow. Once that has completed, the machine should be back online, so our target time for Nomad3 to be back online is tomorrow afternoon. fsck is still running as of 0800 Thursday on file system #2. Unfortunately, I cannot give you an accurate time frame for the disk repair. However, Kyle should be in by 0930; once he arrives we will make this issue our focal point today. --------- 26 DEC 2007 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Wed, 26 Dec 2007 09:45:58 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: <200712261259.lBQCx78F028930@mailrt1.ncep.noaa.gov> When I arrived this morning, I received warnings regarding nomad3. After inspection in the server room, I noticed that nomad3 was not powered on. Nomad3 is currently powered up; however, disk checks will delay it becoming reachable for now. --------- 28 NOV 2007 28 Nov 07 Mist outage extended 6 hours Due to unforeseen circumstances, development access to Mist will be delayed an additional 6 hours. Upon completion of testing, notification will go out via ncep.list.sp-announce@noaa.gov.
--------- 27 NOV 2007 From ncep.list.sp-announce@noaa.gov: 24 hour scheduled outage on Mist 11/27/07. Beginning 06:30 on 11/27/07, all jobs on Mist will be drained. At 07:30 all users will be logged off and all remaining LoadLeveler jobs will be cancelled. Upon completion of maintenance and testing, a parallel production test will be run. Development access to Mist will be restored at approximately 07:30 11/28/07. Notification will go out via ncep.list.sp-announce@noaa.gov. (This means that on 27NOV2007, the real time NCEP NOMADS servers nomad3 & 5 may have an interruption in data flow.) --------- 06 NOV 2007 This (below) means data flow on nomad5 and 3 may be late on 11/07/2007 for a number of hours before 12z: Dew will be unavailable beginning 04:00 on 11/07/07. All non-production jobs on Dew will be drained beginning 03:00 local. Beginning 04:00, any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, a 6 hour parallel production test will be run for the 12Z cycle. Upon completion of the 12Z test cycle, users will be allowed on Dew. Notification will go out via sp-announce@noaa.gov upon completion of maintenance and testing. --------- 31 OCT 2007 nomad5 has been up and running for 139 days and nomad3 has been up for 98 days. We like to reboot servers every quarter, so at 3 PM today we will reboot nomad3 and then nomad5. --------- 15 OCT 2007 Change of date/time; see below and 09 OCT 2007... ********* UPDATE: System Maintenance on Dew Pushed Back 1 Week *************** A 24 hour maintenance period is scheduled for Dew on 10/23/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00, any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period.
Notification will go out via sp-announce@noaa.gov upon completion of maintenance. --------- 09 OCT 2007 The following means that on 16OCT2007, NCEP NOMADS servers nomad3 & 5 will most likely have an interruption in data flow. http://nomads6.ncdc.noaa.gov and http://nomads.ncdc.noaa.gov should be unaffected: --------- Subject: [NCEP.List.SP-Announce] 16 Oct 07: Dew Scheduled Maintenance 16 Oct 2007 Date: Tue, 09 Oct 2007 08:45:58 -0400 --------- A 24 hour maintenance period is scheduled for Dew on 10/16/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00, any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period. Notification will go out via sp-announce@noaa.gov upon completion of maintenance. --------- 11 SEP 2007 There are changes to the GFS post on 9/25. See http://www.nws.noaa.gov/om/notification/tin07-59gfs_upgrade_unifiedpost.txt for an official statement. Starting 9/25, the GFS 0.5 degree "master" file, which was a GRIB1 file, will no longer be made in the same way, but there will be a new file to replace it on the IBM-SP. The file that is currently copied from the dev IBM-SP machine, known as the "0.5 degree master", with 48 levels plus land surface and other fields, will not be there any more... but there will be a replacement. There will be a feed of a 0.5 degree file to NCDC through ftpprd: The file on ftpprd will be composed of the ...0p5...
file (sometimes called the "military" file), which has 28 layers compared to the 48 layers of the nomad3 master file (plus some land surface fields), combined with the difference between these two files in one ftpprd file, so it will (or should) be the same, except that the new file is in GRIB2. NCDC should get this on their ingest system, along with the potential for, and planning for, a 0.5 degree archive of this data set there, the first of its kind. In addition, I hope to have a copy in GRIB1 on the real time backup server, nomads6.ncdc.noaa.gov, so that ftp2/4u and DODS work, as well as for real time backup of nomad3. These files will not be available to the public from ftpprd. On nomad3 & 5, starting 9/25, the old 0.5 degree master file will be replaced by the ...pgrb2... (the "military" 0.5 degree) file, and the difference between this file and the old master will come from a separate file, .....pgrb2b...., which is being placed on the IBM-SP. Our plan is to get both GRIB2 files, convert them to GRIB1, append them, and name the result so the same "master" file data set will continue on nomad3. The name "master" now refers to an internal GFS file on the native model grid (Gaussian horizontal grid, hybrid vertical coordinate), the GFS "physics grid" file (this is not a lon/lat pressure GRIB1 file!). It is unfortunate that we also used that name for the 0.5 degree pressure lon/lat grid. Ultimately, all GFS files/products will be posted/made from this master, as part of unifying the post processing code for all NCEP models. The NOMADS goal here is to make this transition transparent. We will keep our 0.5 degree master file name the same, and the contents should also be the same. --------- 12 July 2007 Recalling the 03 July 2007 announcement from the SP: > 16 July 07:30 local, production will switch from Dew Back to Mist. This means that on July 16, at 0730, the data flow will be interrupted and data may be delayed or unavailable for a time on nomad3 and nomad5.
--------- 03 July 2007 This means data may not be present on NOMADS for this period! The http://nomads6.ncdc.noaa.gov/ncep_data backup server will continue to operate. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 10 Jul 07: Updated CCS Maintenance Schedule Date: Tue, 03 Jul 2007 10:38:13 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 local Tuesday July 10 through 07:30 local Thursday July 12, Mist will be unavailable for scheduled system maintenance. In the event that work concludes early, the system will be returned to the users and notification will go out via ncep.list.sp-announce@noaa.gov. Upon completion of this maintenance, Mist will continue to operate as the development cluster. 16 July 07:30 local, production will switch from Dew Back to Mist. 23 July 07:30 local, LSI patch will be applied to Dew Storage. This work is concurrent and should not impact users on Dew. --------- 21 JUN 2007: Updated ftp2u to 0.8.0 beta: (1) reduced the incidence of premature "done" on web pages (I think the server should be updated to fix this problem); (2) code cleanup; (3) removed the option to send files to the user. Updated reanalyses and gdas ftp2u only. Other updated code is in !wd23ja/cgi/ Wesley --------- 15 JUN 2007 Subject: Re: NOMADS Network Usage Date: Thu, 14 Jun 2007 16:29:21 -0400 From: Luis Cano Organization: DOC/NOAA/NWS/NCEP Louis: Here is our current status. We have implemented the rate-limiting between the WWB NOMADS and the NOAA NOC (Internet). We see relief with the infrastructure, and this component of the infrastructure is now better configured to allow proper sharing of resources. Jordon, Please let me know if there is any feedback from customers of degraded services.
Thank you, Lou Luis Cano wrote: > Louis et al.: > > We are experiencing a two-fold increase of WWB NOMADS traffic to the > Internet that started two weeks ago. This usage is placing other > requirements that share the same networks to NOAA NOC at risk. In > addition, we are also experiencing higher-than-expected latencies with the > CCS production dataflow to the TOC. > > Here is our plan: > > 1. Today at 11:00 Eastern, we will conduct a test of rate-limiting the > NOMADS (DMZ) to an acceptable rate. This will allow NOMADS to better > share common infrastructure with other requirements. This change has the > potential of increasing transfer times to NOMADS customers. The change > will become permanent assuming a valid solution. > > 2. In parallel, we are investigating the lower latency issues with the > TOC. We will have a better understanding of this problem by this afternoon. > > I'll send a follow-up status Email by 3:00. Please call my cell if there > are questions: 202-345-7384. > > Thank you, > > Lou > --------- 31 MAY 2007 20070531: The nomad3 and nomad5 servers are back online; that is, access to the servers has been restored. Data was being transmitted to nomad3 and nomad5 during the outage period. Most of the model data is present, back to (and before) May 14, except for a few missing days, and these gaps appear to be from external problems with the IBM-SP when operations had to move to the development system, or when system administration had taken the servers. In all, the problem appears to have been caused by conflicting firewall settings. Some items are still not operating: the network communications between servers nomad5 and nomad3 are not yet up, so a few data sets shared between the servers, like the 0.5 degree master, are not available. Please check both nomad3 and nomad5 for data sets until this can be resolved.
Join the NOMADS (NCEP) list server https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.nomads-announce to get updates about problems and changes. --------- 01 MAY 2007 GFS implementation day, 01MAY2007, in case you forgot. GFS native history file changes: The GFS restart or history file (also called sigma or sigma spectral) is changing due to changes in the vertical coordinate, as described below. This file is considered an internal file and is not recommended for public use. These changes should not impact most NOMADS users. This implementation of the GFS goes into operations 01MAY2007 12Z. An excerpt from "A guide to using the new GFS history file", located at http://wwwt.emc.ncep.noaa.gov/gmb/para/guidehistory/ , is below: The vertical coordinate of the operational GFS forecast model will become a hybrid sigma-pressure coordinate in 2007. This will affect the file structure of the native GFS history files used by many other applications. In addition, the GRIB surface flux files will have several more fields. The implementation will not affect the GFS surface files or the posted pressure files {pressure-GRIB files}. The GFS restart files will be in an even newer format to accommodate anticipated changes to the GFS in succeeding implementations. No application outside of the global system needs to read GFS restart files. In the near future, the GFS will output Gaussian grid files as the history files. Unfortunately, we are not ready yet to make them operational, so yet another conversion will be necessary when these files are implemented. --------- 23 APR 2007 There was an unannounced (power) outage in our central computer, and all development and data flow was down most of the day. The following day some model output files were also missing. It caused a gap in some data on NOMADS. We mention it here, a week later, for completeness.
--------- 21 FEB 2007 All on the list: NCEP Operations switched to the development system and is having a problem resetting the firewall access for NOMADS data flow. NCO is working on the problem. I have shifted into a backup mode (ftpprd) and will try to get the 0.5 and nam fields operating tomorrow (2/22). The 1x1 should be OK on nomad3 and 5. Jordan --------- 25 JAN 2007 Large scale supercomputer changes are taking place at NCEP. As you can see from the message below (date stamp included), it is out with the old supercomputer system and in with the new. The new Dew supercomputer, as it is called, did not have access to NOMADS servers until Jan 24, so we are working to get the data flow moving again. It would have been better to have the data flow running on the old Blue supercomputer for a few days of overlap with the new system so we could make the move transparent, but as NOMADS is an experimental prototype, this was not to be. I can report that there has been progress on giving the NCEP Real Time NOMADS servers operational data flow and operational user client applications. This may happen by 2008. NOMADS has tried to keep the most used data sets, like GFS (1x1) (0.5) and NAM, up to date first, but some of the less used data sets, like the MRF (legacy) 2.5 degree data set, will not get updated for a while. -------- Original Message -------- Subject: nomads Date: Wed, 24 Jan 2007 14:43:57 -0500 From: Joe Carr To: Jordan Alpert CC: Brent Gordon, John Ward Jordan, NOMADS has been turned off on both Blue and Mist. It is allowed on Dew. If you have any problems, please contact Matt Springer or Cameron Shelton. Thanks, Joey --------- 06 Dec 06 NOMADS issues. > 1. NCO will switch to Blue for operations tomorrow [06 Dec, see below] and when that > happens we will not have enough bandwidth to support both operations > and NOMADS traffic. NOMADS will be out of service until Friday.
Even > when NOMADS is back online, only a little more than 50% of the NOMADS > data is available [on alternate official servers]. > > 2. This will be an ongoing problem until the "new" TOC is up and > running. They are currently using their old system. Once the TOC is > up and running, the NOMADS data will be stored at the TOC's Web > Operations Center and NCDC can pull nearly 100% of the NOMADS data > from the Web Operations Center. > > 3. The CIO is having major problems getting the new system fully > online. If I recall correctly, the new system was supposed to be fully > operational in Jan. 06. Ben (NCO) has agreed to send people to the > TOC to help them resolve their problems. > ----------------------------------------------------------------------- --------- 05 Dec 2006 -- NOMADS DATA FLOW OUTAGE -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Blue/White: Production Switch to Blue Date: Tue, 05 Dec 2006 15:22:18 -0500 From: Don Avart To: NCEP.List.SP-Announce@noaa.gov Due to a required network outage in Fairmont between midnight and 6 am on December 8, production will be switched from white to blue beginning 7 am local on 6 Dec until 7 am 8 Dec. Beginning at 5 am 6 Dec, LoadLeveler queues will be drained, and blue will be rebooted at 6 am. Only NCEP Production and operational accounts for the NCEP Service Centers will be permitted on the system. All user accounts, cron, and interactive access will be denied. NFS will only be mounted on interactive nodes. White will remain available for user access except during the network outage. --------- 11 Nov 06 IBM-SP (Blue) data flow returns to nomad3 and nomad5. Many of the data sets you need are now available on nomad3 and nomad5. Data flow ramped up to almost normal for nomad3 and nomad5 on 6NOV. All parties have agreed to a long term plan for making NOMADS Operational.
Some data sets, such as olr, sst, rtofs and sref, are not yet transmitted, and we are working to get these back to normal, perhaps in a week. Check the data on nomad3 or nomad5 before giving up. I cannot promise that missed data will be replenished in all cases, but we will see what we can do. In the short term, the data flow to the backup server at the National Climatic Data Center (NCDC) will not resume from the "dev" machine, but data can be pulled from the ftpprd service. nomads6.ncdc.noaa.gov will still operate for archived data. (We hope that) ftpprd holdings will be improved to have more complete data sets, with the goal of duplicating the content of variables, levels and times that NOMADS presented before the outage. We will write programs and attempt to populate nomads6.ncdc... from ftpprd, but it will take a little more time. Having data in real time at the backup server, nomads6.ncdc..., as well as at the NCEP servers, nomad3 and 5, kept these systems from becoming overextended. Thank you all for your support and patience. The message I want to send is that NOAA management recognizes the importance of getting data out to users of all categories and is committed to making NOMADS Operational, 24/7/365. It is your requirements that are driving this process. ---------------------------------------------------------------------- --------- 26 Oct 06 Blue/White: Production Switch to Blue A production switch from White to Blue will occur beginning 6 am Thursday October 26 and ending 2 pm Thursday October 26 (8 hours). During this time period, only NCEP production and operational accounts for the NCEP Service Centers will be permitted on the system. All NFS-mounted filesystems will be dismounted from compute nodes, including /u (user home directories). NFS will be available on Interactive and Class 1 nodes. White will not be available for development use during this time period. 
_______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce -------------------------------------------------------------------- announce.txt : 20061016 -------- Original Message -------- Subject: Production Switch to White Date: Sun, 15 Oct 2006 17:46:02 -0400 From: Susan.Fenwick@noaa.gov To: ncep.all.hands@noaa.gov CC: John.Ward@noaa.gov > Corruption of the GPFS file system on White prevented the scheduled > switch of Production to White on Saturday. GPFS has been > restored, but > the entire Production file system was lost. The file system is > currently being mirrored from Blue. > > Production is expected to be switched to White by 12Z on Tuesday, 17 > October. > --------- announce.txt : 20061012 FYI SJL -------- Original Message -------- Subject: Access To Blue Date: Thu, 12 Oct 2006 06:57:02 -0400 From: John Ward Organization: NCEP/NCO/Production Management Branch To: Stephen Lord , Jim Laver Steve & Jim, We were not able to turn on the limited list of users on Blue yesterday. We have been pushing the limit on the system this week, with on-time delivery at only 94%. In addition, we have had unexpected network contention with Mist, which caused lengthy delays in delivering products. We feel that adding any additional load to Blue will cause additional delays in production and on the network. The good news is that work is ahead of schedule in Fairmont. There is a chance we will have White back on line 24 hours earlier than expected. We'll have a better estimate later this morning. John --------- announce.txt : 20061009 All: Dave Michaud has informed me that the earliest date when EMC jobs will be turned on is Tuesday 10 October. Earlier dates proposed by Dave were rejected by the NCO Configuration Board. Dave will contact you individually regarding turning on your jobs. If he doesn't contact you, your stuff will not run. 
I'm sorry for this situation. It is out of my control. Please pass the word if I have left someone off this email list. SJL