NERSC: Powering Scientific Discovery Since 1974

Franklin Timeline

This page records a brief timeline of significant events and user environment changes on Franklin. Franklin compute nodes were upgraded from dual core to quad core between July and October 2008.

Apr 30, 2012
Franklin is retired.
Apr 5, 19, 27, 2012
Reminder announcements on Franklin retirement schedule.
-- Thurs Apr 26, 23:59: Batch system is drained, batch queues are stopped (no jobs will be running at this point)
-- Mon Apr 30: Last day to retrieve files from Franklin scratch file systems
-- Mon Apr 30, 23:59: User logins are disabled
Mar 6, 2012
Announcement on Franklin retirement date set: 04/30/2012.
Feb 22, 2012
HW and SW maintenance. Set to default versions: pgi/11.10.0 and xt-mpt/5.3.5.
Feb 21, 2012
Update on Franklin earliest retirement date: 4/30/2012.
Dec 20, 2011
First announcement: Franklin earliest retirement date: 03/31/2012.
Oct 3, 2011
Set to default versions: pathscale/4.0.9.
Sep 14, 2011
HW and SW maintenance.  Installed field notices and software patches.  OS upgraded to CLE2.2UP03B.  GPFS upgraded to 3.3. Set to default versions: xt-asyncpe/5.0, xt-mpt/5.3.2, pgi/11.7.0, cce/7.4.2, gcc/4.5.3, xt-libsci/10.5.02, petsc/3.1.08, petsc-complex/3.1.08, tpsl/1.1.0, atp/1.3.0, xt-lgdb/1.4, perftools/5.2.2, papi/4.1.3, libfast/1.0.9.
Jun 15, 2011
HW and SW maintenance.  Installed field notices and software patches. Moab updated to 6.0.4.  Converted to new LDAP server. Set to default versions: pgi/11.3.0, perftools/5.2.0. CPU limit on Freedom set to 24 hrs. 
Apr 20, 2011
HW and SW maintenance. Installed field notices.  Set to default versions: xt-asyncpe/4.9, pgi/11.2.0, xt-mpt/5.2.1, cce/7.3.3, xt-libsci/10.5.01, petsc/3.1.05, petsc-complex/3.1.05, gcc/4.5.2, gdb/7.2.
Feb 16, 2011
HW and SW maintenance. Installed field notices. Updated system software: Moab, GPFS, myquota, chkmyqta. Deconfigured RDAC, configured GRES for /project. Set to default versions: xt-asyncpe/4.6, pgi/10.9.0, xt-mpt/5.1.3, cce/7.3.0, xt-libsci/10.5.0, petsc/3.1.04, petsc-complex/3.1.04, perftools/5.1.3 (includes xt-craypat/5.1.3, apprentice2/5.1.3), trilinos/10.6.0, chapel/1.2.1.
Nov 10, 2010
HW and SW maintenance. Installed field notices. Set to default versions: xt-asyncpe/4.3, pgi/10.8.0, xt-mpt/5.1.0, cce/7.2.7, xt-libsci/10.4.8, petsc/3.1.02, petsc-complex/3.1.02, hdf5/1.8.5, hdf5-parallel/1.8.5, netcdf/4.1.1, perftools/5.1.2 (includes xt-craypat/5.1.2, apprentice2/5.1.2, and xt-papi/4.1.0), modules/3.1.6.6, atp/1.0.3.
Sep 8, 2010
HW and SW maintenance. Installed field notices. Set to default versions: xt-asyncpe/4.0, pgi/10.5.0, xt-mpt/5.0.0, acml/4.4.0, xt-craypat/5.1.0, apprentice2/5.1.0, gcc/4.4.4, cce/7.2.4, petsc/3.1.00, petsc-complex/3.1.00, pathscale/3.2.99, java/6.0_20.
Jul 27-31, 2010
Franklin down due to NERSC Center Power upgrade and system stress test afterwards.
Jun 23, 2010
HW and SW maintenance. Installed field notices. Installed DVS patch. Upgraded SSH, Moab, and nscd. Changed Torque configuration so that batch job stdout/stderr spool files appear and progress in user job submit directory while the job is running.
May 12, 2010
HW and SW maintenance. Installed field notices. Installed ldap, gpfs, and nscd patches. Added Torque "SERVERHOST" setting.
May 3, 2010
Decreased the boundary between the reg_small and reg_med queues from 512 to 256 nodes, increased the boundary between the reg_med and reg_big queues from 1024 to 2000 nodes, and changed the queue charge factor for reg_med from 0.8 to 0.75 (more discount).
Apr 21, 2010
HW and SW maintenance. Installed field notices. Installed torque/2.4.7. Lustre configuration (disable statahead) changes. Set to default versions: pgi/10.0.0, xt-libsci/10.4.3, xt-mpt/4.0.3, xt-asyncpe/3.7, fftw/2.1.5.2, xt-craypat/5.0.2, apprentice2/5.0.2, gcc/4.4.3, cce/7.2.1, netcdf/4.0.1.3, netcdf-hdf5parallel/4.0.1.3, hdf5/1.8.4.1, and hdf5-parallel/1.8.4.1.
Apr 1, 2010
The Franklin external login node "Freedom" entered into production.
Mar 24, 2010
OS upgraded to CLE 2.2UP02. OS version is 2.2.48B.
Feb 10, 2010
Hardware maintenance. Virtual Channel 2 (VC2) enabled. Idle loop set to halt.
Jan 21, 2010
Hardware maintenance. Franklin home converted to global homes.
Dec 9, 2009
Hardware maintenance. Installed field notices, including new nscd. Preparation work for Global Home conversion. Installed new modules/3.1.6.5, which has options for "module avail".
Nov 17, 2009
Installed field notice (Linux security fix). Set hdf5 and hdf5-parallel 1.8.3.1 to default. Installed portals bug fix; OS version is now 2.1.50HDB_PS13A.
Nov 4, 2009
Hardware maintenance. Installed field notices. New nscd and glibc. Set to default versions: pgi/9.0.4, xt-mpt/3.5.0, xt-asyncpe/3.3, xt-libsci/10.4.0, xt-craypat/apprentice2 5.0.0, gcc/4.4.1, cce/7.1.4, chapel/1.0, Cray hdf5_netcdf-1.5. Implemented Pathscale no-license checking. GPFS upgrade. Configured to prepare for future /global/home conversion.
Oct 1, 2009
Charging discount for the reg_med queue decreased from 25% to 20% (charging factor is 0.8).
Sept 30, 2009
Hardware maintenance. Installed DVS and portals patches. Connected the eslogin node (freedom) to Franklin (routing and network changes needed).
Sept 16, 2009
Hardware maintenance. Installed field notices. Retired "feature=quad" for batch script. Installed new NERSC SSH. /usr/common mounted on NGF as /global/common/franklin. Set to default versions: pgi/9.0.2, xt-mpt/3.4.1, xt-asyncpe/3.2, xt-libsci/10.3.8, acml/4.3.0, cce/7.1.2, Cray hdf5 and hdf5-parallel/1.8.3.0, and Cray netcdf/4.0.1.0.
Sept 4-6, 2009
Linux kernel, RDAC, GPFS, DVS, portals, and Lustre patches were upgraded on Franklin due to a security vulnerability in the Linux kernel. OS level was upgraded from 2.1.50HD to 2.1.50_HDA_PS12A.
Sept 2, 2009
Hardware maintenance. Installed field notices. Restarted LDAP server.
Aug 19, 2009
Hardware maintenance. Installed field notices. Installed DVS patch and software portals patch for MPICH PtlEQPoll "unresponsive nodes in the system" error. Added new batch queue execution class reg_short with max user run limit of 12 and max user idle limit of 8. xt-asyncpe/3.0, xt-mpt/3.3.0, and xt-libsci/10.3.6 are set to default.
Aug 5, 2009
Hardware maintenance. Installed field notices. Franklin internal networking changed to free up two network nodes for eslogin.
Jul 22, 2009
Hardware maintenance. Installed field notices. Installed DVS and HSS patches. Installed RDAC. Upgraded Torque/Moab to use node count (instead of mppwidth) as the basis for execution class routing.
Jul 8, 2009
Hardware maintenance. Installed field notices. GPFS upgraded to PTF12. Torque upgraded to resolve the core dump at reservation problem.
Jun 19, 2009
xt-asyncpe/2.5, pgi/8.0.6, xt-mpt/3.2.0, xt-libsci/10.3.5, and gcc/4.3.3 are set to default.
Jun 18, 2009
Daily purging on scratch file systems starts. Cutoff is 12 weeks, as defined by last access time.
Jun 2, 2009
Hardware maintenance. DDN upgrade for one cabinet. Installed software patch for UPC.
May 12, 2009
Introduced reg_med queue (using 2,045-4,092 cores) with 25% charging discount. Jobs running in reg_big (using 4,093-24,572 cores) and reg_xbig (using >=24,573 cores) queues now have 50% charging discount.
Apr 27, 2009
Made some changes to LDAP parameter settings to mitigate the "identifier removed" problem. Installed new security patch for udev. Changes in ports reservation for GPFS.
Apr 21, 2009
Set maxproc limit on login/MOM nodes to 150 soft and 200 hard.
Apr 16, 2009
libfast/1.0.2, xt-libsci/10.3.3, xt-mpt/3.1.2, pgi/8.0.4 and xt-asyncpe/2.3 are set to default. Installed security patch for udev.
Mar 31 - Apr 1, 2009
Hardware maintenance. New IO hardware added. /scratch reformatted. /scratch2 created. Service nodes redistributed. High Speed Network recabled.
Mar 24, 2009
Hardware maintenance. Upgraded login nodes, MOM nodes, and network nodes to PCI-Express. Redistributed login nodes, MOM nodes, and network nodes in the torus. Changed settings for two HSN tuning parameters: dateline and aging.
Mar 17, 2009
OS upgraded from CLE 2.1 to CLE 2.1UP01 with patch sets 01, 01A and 02. Hardware repairs. PAM/LDAP related changes. Network configuration change (from 42 net to 41 net). Reduced max wallclock time for regular queues from 36 hrs to 24 hrs. Inserted new SIO modules (converted from compute modules). Franklin has 9,532 compute nodes (with 60 spare nodes).
Mar 10, 2009
Hardware maintenance. Upgraded Moab to version 5.2.5-s12920. Updated a Lustre timeout parameter (kptllnd) from 50 sec to 250 sec to reduce the chances of user jobs being terminated by Lustre timeout.
Mar 9, 2009
Free charging (due to /scratch reformat) ends.
Mar 3, 2009
Franklin rebooted to clear out-of-memory conditions on several login nodes and a lost connection to /scratch, which also caused some jobs to hang while exiting. Also made the following changes: separated login and MOM nodes (10 login nodes and 6 MOM nodes); set a 512 MB limit on /tmp for login and MOM nodes; and set the following new memory limits (unit in bytes) on the login nodes:
    Type  Item        Current    New
    Soft  stack       2097152    131072
    Hard  stack       2097152    262144
    Soft  data        2097152    1048576
    Soft  memoryuse   2097152    1048576
    Soft  vmemoryuse  unlimited  2097152
    Hard  vmemoryuse  unlimited  2097152
Feb 25, 2009
A disk controller problem caused Franklin /scratch to be reformatted. Charging is free until further notice.
Feb 22, 2009
Installed the new firmware patch for SWOs caused by CAM overflow condition.
Feb 21, 2009
Installed firmware patch for SWOs caused by CAM overflow condition.
Feb 7, 2009
Turned off virtual Channel 2 (VC2) setting.
Feb 4, 2009
Hardware and software maintenance. Installed security patches. Enabled VC2. Installed new torque/2.4.0b1. Set usecp for /project. Started to enforce job submission filter by checking user quota.
Jan 29, 2009
Reinstalled a Moab scheduler version 5.2.4-s11143 with 10k group entries.
Jan 27, 2009
xt-craypat/4.4.0 and apprentice2/4.4.0 are set to default.
Jan 26, 2009
Installed two security patches. Installed the corrected patch for LBUG. Hardware maintenance.
Jan 18, 2009
Removed the software patch installed for LBUG on 01/15/09.
Jan 16, 2009
Installed a revised patch for the ssnal bug.
Jan 15, 2009
Installed two software patches, one to fix an ssnal bug that caused login/network nodes to become unresponsive, and the other to fix an LBUG. Set parameter max_rw_chunka to zero as a workaround for slower single stream IO performance.
Jan 13, 2009
Allocation Year 2009 starts. Charging factor is now based on Franklin hours. 50% quad core discount ended (each quad core node is now charged fully). reg_big discount is reduced from 50% to 25%. pgi/8.0.1, xt-mpt/3.1.0, acml/4.2.0, and fftw/2.1.5.1 are set to default.
Jan 12, 2009
Allocation Year 2008 ends. Hardware maintenance.
Jan 6, 2009
Franklin software maintenance. Installed the lustre portals patches removed on Dec 25, plus more software patches. Put into place the aprun wrapper to reject MPT2 parallel executables.
Dec 29, 2008
xt-mpt/2.x.x versions removed from the system. Installed xt-mpt/3.0.4, xt-mpt/3.1.0, acml/4.2.0, xt-libsci/10.3.1, fftw/2.1.5.1, pgi/8.0.1, xt-asyncpe/2.0, fast_mv/1.0.1.
Dec 25, 2008
Software patches installed on Dec 16 were removed to help diagnose system crashes.
Dec 21, 2008
Turned off virtual Channel 2 (VC2) setting.
Dec 16, 2008
pgi/7.2.5, xt-mpt/3.0.3, and acml/4.1.0 set to default. Turned on Virtual Channel 2 (VC2) setting. Installed several software bug-fix patches (2 Lustre and 1 portals), addressing warmbooting down nodes, MDS node panic related to quota enforcement, and nodes marked down due to Lustre apps hanging.
Dec 8, 2008
Changed the SeaStar netmask from 255.0.0.0 to 255.255.0.0 to fix a communication problem (for example login and ssh to Franklin) from IP address 192.x.x.x. Changed the nfsserver services on the service nodes. Installed new sshd.
Dec 3-4, 2008
Franklin OS was upgraded from 2.0 to 2.1. Service nodes now run SLES 10 Service Pack 1 (upgraded from SLES 9.2). Lustre upgraded from 1.4.12 to 1.6.5. New SSHD installed.
Nov 17, 2008
Franklin was down to fix hardware for a Link Inactive condition, upgrade SMW software, and perform Lustre software conversions.
Oct 29, 2008
Franklin quad core upgrade final configuration. The completely upgraded Franklin has 9,660 quad core compute nodes (38,640 cores).
Oct 17-19, 2008
Franklin quad core upgrade phase 4 started. Hardware maintenance. 7 columns of quad core modules from the test system merged to the production system. Modules from columns 14 and 16 swapped with those in columns 0 and 1. Franklin production environment is now a pure quad core system: columns 0-13 and 15 quad core, total of 34,032 compute cores. Torque upgraded to version 2.3.4-snap.20080901135 and Moab upgraded to version 5.2.4-s11143.
Oct 16, 2008
Started to implement Franklin /scratch purge policy. Purging starts with files older than 24 weeks, as defined by last access date, and goes down to files older than 16 weeks if necessary.
Oct 15, 2008
Started to implement the policy to kill jobs with stdout/stderr file sizes bigger than 100 MB.
Oct 13, 2008
pathscale/3.2 set to default.
Oct 6, 2008
pgi/7.2.4 set to default.
Sept 30, 2008
Hardware maintenance. Also installed security patches from field notices.
Sept 19, 2008
License server moved from login01 to login11.
Sept 17, 2008
Franklin quad core upgrade phase 3b started. Hardware maintenance. Franklin full configuration in production: even columns 2-16 quad core, columns 0 and 1 dual core, total of 20,392 compute cores. pgi/7.2.4 installed.
Sept 10, 2008
Franklin quad core upgrade phase 3a started. Hardware maintenance. Franklin full configuration in production: even columns 2-16 quad core, column 0 and odd columns 1-15 dual core, total of 28,456 compute cores. Default environment set to quad core. Franklin quad core started charging (at the dual core rate). Queue structure adjusted by doubling the min and max cores for each queue class. 50% discount for reg_big with >= 4829 cores (instead of 2415 cores).
Aug 21, 2008
Franklin quad core upgrade phase 2b started. Hardware maintenance. Franklin full configuration in production: cols 12,14,16 quad core, cols 0-10,11,13,15 dual core, total of 17,016 compute cores. Installed new SSH. Set xt-mpt/3.0.2 as default.
Aug 13, 2008
Franklin quad core upgrade phase 2a started. Hardware maintenance. Franklin full configuration in production: cols 12,14,16 quad core, cols 0-10,11,13,15 dual core, total of 22,776 compute cores.
Aug 11, 2008
Installed xtpe-quadcore, xt-async/1.0, gcc/4.2.0.quadcore, libsci/10.2.1. Also set gcc/4.2.0.quadcore and libsci/10.2.1 to be default. Set xt-craypat/4.3.1 and apprentice2/4.3.0 to be default. Reduced the process memory hard limit on the service nodes from 4 GB to 2 GB.
Jul 15, 2008
Franklin quad core upgrade started. 4 columns were removed from Franklin production system. A total of 14,712 compute cores are available (cols 0-11,13 dual core). Installed new portals firmware patch. Installed ALPS with new patches. Installed Field Notice. Performed hardware maintenance for failed nodes.
Jun 28, 2008
Torque/2.2.2 rolled back to torque/2.2.0.
Jun 22-24, 2008
/var mount options added; NERSC sshd installed on service nodes. We have 10 login nodes and 16 MOM nodes. Upgraded DDN firmware to 3.11.h and rolled back to 3.11.i due to potential data corruption.
Jun 18, 2008
Hardware maintenance (nodes and DDN controller). Installed portals fix for SPR 743158.
Jun 12, 2008
Added two more pathscale compiler licenses (total of 4 now).
Jun 11, 2008
OS upgraded from 2.0.44a2 to 2.0.53. Lustre upgraded from 1.4.9 to 1.4.12+ with bug fixes. Torque upgraded to a newer version 2.2.2-200805051805. Login nodes and MOM nodes are separated (later found to be ineffective). We now have 10 login nodes and 10 MOM nodes.
May 30, 2008
Installed a DDN bug fix to correct /scratch IO problem.
May 28, 2008
Set pgi/7.1.6 to be the default pgi compiler.
May 27, 2008
Upgraded DDN controller firmware to correct /scratch IO problem. Performed other hardware maintenance actions.
May 22, 2008
Increased regular queue wall clock limit from 24 to 36 hours. Increased low queue wall clock limit from 12 to 24 hours.
May 21, 2008
Started to enforce 2 GB soft memory limit and 4 GB hard memory limit on processes running on login/service nodes. Started to enforce 8-hour idle session logout. The 60 min CPU time limit was put onto batch jobs in the morning and backed out in the afternoon.
May 19, 2008
nscd (name service caching daemon) time-to-live increased to 600 sec for password, and 3600 sec for group change.
May 12, 2008
Implemented automatic login node failure clearing after Franklin comes back up from either a crash or a scheduled down.
Apr 28, 2008
Hardware and software maintenance. Installed Field Notice. Installed additional patch for SeaStar HB fault. Reconfigured Lustre to allow access from 4 network nodes. Starting today, NERSC reserves the right to kill batch jobs that exceed the limit of 8 simultaneous apruns.
Apr 22, 2008
Started to enforce 60-min CPU time limit on processes running on login/service nodes from interactive jobs.
Apr 21, 2008
Fixed the firmware change defect for SPR 741141. Set a limit of 8 simultaneous apruns in a batch job, to be enforced from 04/28/08.
Apr 2, 2008
OS upgrade to 2.0.44a2. Installed a portals patch (firmware change) for SPR 741141 to increase system stability. Installed patches for two Lustre quota bugs.
Mar 26, 2008
Hardware and software maintenance. More power modules swapped. Updated PIC code for voltage settings. Set pgi/7.1.4, craypat/4.1.1, Apprentice2/4.1, and papi/3.5.99b to default.
Mar 22, 2008
Set CNL kernel boot parameter idle=poll.
Mar 19, 2008
gcc/4.2.3 and acml/4.0.1a set to default. Torque rolled back to version 2.2.0.
Mar 12, 2008
Hardware and software maintenance. More power modules swapped. Torque upgraded to version 2.2.2-200712171618. Security field notice FN5509 applied.
Feb 27, 2008
Hardware maintenance. More power modules swapped.
Feb 13, 2008
OS upgraded to CNL version 2.0.39, which supports multiple apruns in the background. ACML is no longer automatically invoked by the compiler driver. Fixed the problem of Lustre falsely reporting user inodes as over quota. Installed patch for degraded IOR read and write performance when quota is enabled. "2s" firmware setting in place to fix a portals issue related to SeaStar non-deadbeef hang. Set CNL kernel boot parameter idle=halt.
Feb 7, 2008
Hardware maintenance. Swapped some power modules.
Jan 22, 2008
MPT portals fix for the MPI send-to-self deadlock problem; also solved the problem of MPI_Allreduce calls needing an explicit barrier.
Jan 9, 2008
xt-craypat/4.1 and apprentice/4.1 installed as non-default.
Jan 8, 2008
FY08 production year starts. Production queues max wall clock limit increased from 12 hours to 24 hours. Reduced the reg_small maximum tasks to 2416. Enabled premium queue. Franklin starts official charging with charging factor of 6.5 and 50% discount for jobs using 1208+ nodes.
Dec 13, 2007
Moab/5.1.0p8 installed and set to default.
Dec 10, 2007
GCC/4.1.2 installed.
Dec 5, 2007
UPC/1.0.2 installed.
Dec 3, 2007
Set apprentice/4.0 as default.
Nov 29, 2007
Moab configure changes. Max user idle limit for reg_big increased from 1 to 2.
Nov 16, 2007
Installed CrayPat 4.0.0 and made module xt-craypat/4.0 the default. Installed Apprentice 4.0.0; apprentice2/3.2.3 remains the default.
Nov 7, 2007
Moab configuration changes for enforcing global limit, and for settings per CLASS.
Nov 2, 2007
Set Pathscale default version from 3.0 to 3.1.
Oct 31, 2007
Libsci default changed from 10.0.1 to 10.2.0.
Oct 26, 2007
Franklin officially accepted by NERSC from Cray.
Oct 23, 2007
Fix for NWCHEM, GAMESS and Global Arrays codes: The default xt-mpt module has been set from 2.0.24b to 2.0.24d, with a new environment variable SHMEM_SWAP_BACKOFF set to 100.
Oct 19, 2007
Default compiler versions upgraded: pgi from 6.2.5 to 7.0.7, and gcc from 3.2.3 to 4.2.1.
Oct 17, 2007
Some Moab configuration changes on Franklin.
Oct 15-22, 2007
Production CNL benchmark runs.
Oct 14, 2007
User quota enforced. Franklin queue structure regarding regular batch classes was simplified to have only "reg_small", "reg_big" and "reg_xbig" classes to replace the original 10+ buckets.
Oct 13-14, 2007
Dedicated CNL witness benchmark runs.
Oct 8, 2007
OS upgrade to OS 2.0.24b+ (GA version). Installed ALPS 9/5 RPM, which includes fix for MPMD aprun command line issues, and aprun -L issues.
Oct 5, 2007
Installed patch on Franklin for SHMEM portals problem.
Sept 24-26, 2007
All NERSC users enabled on Franklin.
Sept 24, 2007
Trap installed for SHMEM atomic problem on Franklin.
Sept 14, 2007
Installed new Moab version 5.1.0p7.
Sept 12, 2007
Installed pgi/7.0.7 and gcc/4.2.1. Not set as default.
Sept 10, 2007
OS 2.0.14 patched for enabling "aprun -L" option.
Sept 7, 2007
OS 2.0.14 patched for enabling aprun MPMD mode.
Aug 3, 2007
Acceptance period begins. The system is tested and tuned and is required to pass strict performance and functionality tests.
Early July 2007
Early users started to have access to Franklin.
June to August, 2007
Assessed CNL with initial testing, testing at scale and improvements.
First two weeks in June 2007
Franklin CNL evaluation. Installed CNL the week it was released from Cray Development to Cray Testing.
March to May, 2007
System integration and partial acceptance testing with CVN.
Early March, 2007
Very-early staff users started to have access to Franklin.
Feb 10-12, 2007
Franklin phase 2a (30 cabinets) and phase 2b (36 cabinets) delivered.
Jan 29-31, 2007
Franklin Isobase installed.
Jan 16, 2007
Franklin phase 1 delivered (36 XT4 cabinets, 11 DDN cabinets).
Dec 7, 2006
Silence upgraded to XT4.
Aug 21, 2006
Test System "Silence" (1 XT3 cabinet) arrived at OSF.
Aug 10, 2006
Franklin contract awarded to Cray by DOE.