April 23-26, 2018
Theme: Maximizing ROI for HPC in a Changing Computing Landscape
History: The Salishan Conference on High Speed Computing was first established in 1981 to be a forum for the exchange of technical information and lessons learned among the DOE Weapons Labs regarding how to make effective use of the new Cray X-MP vector supercomputers. While these halcyon days were well before the time of any members of the organizing committee, we can envision how the development of capabilities like the Cray Time Sharing System (CTSS) benefited from the interdisciplinary discussions among our predecessors in the Long and Council Houses.
The ASCI/ASC Program has been investing in applications, software environments, and platforms for over two decades. These investments have always been guided by mission impact on the DOE/NNSA Defense Programs. The cumulative Tri-Lab ASC investment in our application portfolio is well over $1B at each of the Tri-Labs. This investment supports efforts to develop our integrated multi-physics and engineering analysis codes and our physics and engineering models, and to address confidence in our mission-critical applications through independent validation and verification of the ASC application portfolio. Mission impact led to the decision to move away from dependence on custom Cray vector supercomputers toward a strategy based on integrating commodity computing components into large-scale massively parallel processor systems. During the heyday of Moore’s Law, this strategy paid off well for ASCI/ASC.
The Changing Computing Landscape: The end of Moore’s Law has two components. Over a decade ago, the end of Dennard Scaling imposed a constraint on the maximum clock rate, which led to the advent of multi-core CPUs and many-core designs like IBM’s BlueGene line and GP-GPU accelerators. The main implication of this change is that realizing commodity performance improvements requires increasingly disruptive changes to application codes. The second component of the end of Moore’s Law is that there are only a couple more generations of feature size shrink left. While there may still be additional transistors for computer architects to work with, the cost per transistor is no longer significantly lower with each of these last feature size generations.
ROI: Maximizing return on investment requires targeting areas where impact can be best realized and then optimizing across several distinct areas (e.g., data movement, memory-to-flops ratios, and energy/power/water requirements, to name but a few). These, in turn, are affected by critical cost factors, including the cost of open source system software; hardware costs and capability tradeoffs (for both commodity and custom/semi-custom HW design/development options); operating costs (e.g., energy, water, infrastructure); and the cost of refactoring applications for new architectures. The benefits of different approaches must also be considered, since the lowest-cost option often costs more in the long run or when viewed from a bigger-picture perspective. Another important framework consideration is how far the HPC community should go with Total Cost of Ownership (TCO), as there are indications that European and Japanese HPC sites are increasingly focused on developing TCO models. The five sessions of the Salishan 2018 conference will explore different perspectives on and dimensions of ROI for future HPC directions.