April 17, 2017 – Talon 3's launch began in earnest on April 5, 2017, with a workshop to unveil a new batch-queuing system and user-support portal. A small group has been beta testing the new cluster, said DaMiri Young, manager of High-Performance Computing.
The HPC team has implemented a new job-control system for Talon 3 called Slurm, an open source, fault-tolerant, and highly scalable cluster management and job-scheduling system for large and small Linux clusters.
"It is advisable to begin familiarizing yourself if you haven't already. Unfortunately, currently-running jobs and job scripts are not compatible with the new system. The plan is to close the UGE queues on Talon 2 next week and let those jobs in execution run to completion. You will need to rewrite/resubmit your job to the SLURM system unless you utilize the run script," Young said.
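For users rewriting UGE job scripts, a minimal Slurm batch script looks like the sketch below. The partition name, resource requests, and job name here are hypothetical placeholders, not Talon 3 defaults; check the UIT HPC documentation or `sinfo` for the actual queue names and limits.

```shell
#!/bin/bash
#SBATCH --job-name=example_job     # job name shown in squeue (placeholder)
#SBATCH --partition=public         # hypothetical partition; list real ones with `sinfo`
#SBATCH --nodes=1                  # request a single node
#SBATCH --ntasks=14                # e.g. one task per core of a 14-core CPU
#SBATCH --time=01:00:00            # wall-clock limit (hh:mm:ss)
#SBATCH --output=job_%j.out        # stdout file; %j expands to the job ID

# The body is ordinary shell; Slurm reads the #SBATCH lines at submit time
# and ignores them as comments at run time.
echo "Running on $(hostname) with ${SLURM_NTASKS:-1} tasks"
```

Submit with `sbatch myjob.slurm`, monitor with `squeue -u $USER`, and cancel with `scancel <jobid>`. The rough UGE-to-Slurm command correspondence is `qsub` → `sbatch`, `qstat` → `squeue`, and `qdel` → `scancel`, though script directives must be rewritten rather than mechanically translated.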
In case you were wondering... Talon 3 has 8,304 CPU cores and 160,000 graphics processing unit (GPU) cores available for research computation.
Each of the Intel Xeon E5-2680 CPUs in the 208 new Dell servers has 14 cores.
It has 355 servers in 13 racks, comprising the 208 new Dell servers and 147 Talon 2 servers that will be redeployed in Talon 3.
The new servers' calculation rate is 280 teraflops (trillion calculations per second), or 0.280 petaflops. The total Talon 3 theoretical peak performance is over 320 teraflops.
For more information about Talon 3, please refer to the article in the February 2017 issue of Benchmarks Online or visit the UIT HPC home page.