April 25-28, 2016
Theme: Data Movement for Computing at Scale
The Salishan Conference on High-Speed Computing was founded by the three National Nuclear Security Administration (NNSA) laboratories to pursue collaborative discussions with academia and industry in the areas of algorithms, architectures, and languages, and to maximize the nation’s return on investment in high performance computing (HPC). In particular, balancing the performance and cost of scientific simulations on current and future platforms has been a primary objective. Optimizing data movement for speed, cost, or space has been a compelling research topic for many years. This year’s conference examines the decadal challenges the laboratories face in the context of data movement for computing at scale.
As HPC approaches exascale, there are new and changing pressures and drivers affecting data movement, including but not limited to the explosion of cores, increase in heterogeneity, increasingly complex memory hierarchies, and power management concerns. Careful workflow analysis informs data movement decisions at every level of the system. Data movement must be reviewed from multiple perspectives within the overall context of an integrated environment. At this year’s sessions, participants will examine data movement challenges and opportunities from the hardware, system software, applications, storage, data analysis, workflow, and visualization perspectives. Invited talks will focus on recent research in areas that are particularly important to facilitate data movement optimized for computing at scale. The main conference goal, as always, is to provide ample forums for discussion, where participants can give feedback, discuss issues, develop collaborations, and recommend solutions.
Session 1: Hardware Architecture Data Movement Capabilities
From a hardware perspective, the balance of data movement capabilities has a primary impact on both realized application performance and energy to solution. For the HPC community, concepts of balance include both on-node data movement up and down the local memory hierarchy and off-node data movement across the interconnection network fabric. New architectural capabilities, such as heterogeneous multicore and many-core node architectures and multilevel main memory combining stacked dynamic random access memory (DRAM), conventional DRAM, and non-volatile random access memory (NVRAM), complicate the analysis of system balance. From an interconnection network perspective, the system includes compute nodes that may support in-situ visualization/analysis, input/output (I/O) storage nodes, and perhaps new node categories such as NVRAM nodes for burst-buffer functions or in-transit data analysis and visualization, all of which add further complexity to the analysis of system-level data movement balance. This session examines data movement hardware capabilities at both the local node architecture level and the global system architecture level. Key questions include: What data movement capabilities have the greatest potential for synergy between HPC and high-performance data analysis (HPDA)? What hardware technology demonstrators are needed in the next few years to create an on-ramp for high-performance, energy-efficient, resilient data movement capabilities as candidate technologies for integration into future Department of Energy (DOE) exascale platforms?
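As an informal illustration of node-level data movement balance, the sketch below uses a STREAM-style triad kernel to estimate sustained memory bandwidth and relate it to arithmetic intensity; the array size, OpenMP parallelization, and reporting format are illustrative assumptions rather than a prescribed benchmark.

/* Minimal STREAM-style triad to gauge sustained memory bandwidth on a node.
 * Array sizes and iteration structure are illustrative; a real balance study
 * would sweep working-set sizes across cache, DRAM, and stacked-memory tiers. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 26)   /* ~64M doubles per array, well beyond last-level cache */

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double scalar = 3.0;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; a[i] = 0.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];      /* triad: 2 flops, 24 bytes moved per element */
    double t1 = omp_get_wtime();

    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("triad: %.2f GB/s, arithmetic intensity ~%.3f flops/byte\n",
           gbytes / (t1 - t0), 2.0 / 24.0);
    free(a); free(b); free(c);
    return 0;
}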
Session 2: System Software Data Movement Capabilities
As new capabilities are introduced into memory, compute, and interconnect hardware, system software must continue to evolve to manage an increasing amount of complexity in HPC systems. New and alternative approaches to programming HPC systems and developing scalable applications bring about new requirements and challenges for operating and runtime systems and for the programming interfaces between layers of the software stack involved in moving data. The desire to support a broader set of applications, including integrated applications composed of many components and more sophisticated application workflows, also creates challenges for system software. The need to reduce the cost of data movement within endpoints, as well as across all of the endpoints in the system, motivates more sophisticated resource allocation and management strategies. This session examines the data movement challenge from a system software perspective, at both the node and system level. Key topic areas for discussion include: programming interfaces for data movement, resource management strategies for reducing data movement, requirements and challenges introduced by alternative programming models and runtime systems, and software support for implicit and explicit communication mechanisms.
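As one concrete example of an explicit data movement interface, the hedged sketch below uses nonblocking MPI calls to overlap a halo exchange with independent computation; the ring neighbor pattern, buffer sizes, and placeholder compute loop are assumptions for illustration only.

/* Sketch: overlapping a halo exchange (explicit data movement) with interior
 * computation using nonblocking MPI. Ranks, buffer sizes, and the compute
 * loop are placeholders for illustration. */
#include <mpi.h>
#include <stdlib.h>

#define HALO 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double *send = calloc(HALO, sizeof(double));
    double *recv = calloc(HALO, sizeof(double));
    MPI_Request reqs[2];

    /* Post the data movement first ... */
    MPI_Irecv(recv, HALO, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send, HALO, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... then do interior work that does not depend on the halo,
       so communication and computation overlap. */
    double local = 0.0;
    for (int i = 0; i < HALO; i++) local += send[i];

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    free(send); free(recv);
    MPI_Finalize();
    return 0;
}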
Session 3: Data Movement from the Applications Perspective
This session will illustrate the problems and solutions of data movement for computing at scale from the applications side. With the growing size of machines and increasingly deep memory hierarchies, efficient data movement will become a greater burden for the application programmer, and new algorithms and programming techniques will certainly be required. This session will address the following questions: How does data get moved through the different cache levels and to memory while the application executes? How can applications be written to minimize unnecessary data movement? How can this data management be abstracted away from the application designer using portable frameworks? Can data movement be modeled using analytical or semi-analytic methods at small scale to predict results at larger scale? How can the above optimizations be done in a portable way?
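One familiar application-level technique for reducing unnecessary data movement is cache blocking (loop tiling), sketched below for a dense matrix multiply; the matrix and tile sizes are illustrative assumptions, since effective tile sizes depend on the cache hierarchy of the target node.

/* Sketch: loop tiling (cache blocking) to cut data movement between cache and
 * DRAM in a dense matrix multiply. N and TILE are illustrative choices. */
#include <stdlib.h>

#define N    1024
#define TILE 64

void matmul_tiled(const double *A, const double *B, double *C) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                /* Each TILE x TILE block is reused many times while it is
                   still resident in cache, reducing traffic to main memory. */
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++)
                        for (int j = jj; j < jj + TILE; j++)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
}

int main(void) {
    double *A = calloc(N * N, sizeof(double));
    double *B = calloc(N * N, sizeof(double));
    double *C = calloc(N * N, sizeof(double));
    matmul_tiled(A, B, C);
    free(A); free(B); free(C);
    return 0;
}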
Session 4: Data Movement from the Data Analysis, Workflow, and Visualization Perspectives
In the search for high performance with improved energy efficiency, large-scale systems over the next decade will increasingly rely on alternatives to traditional disk storage. Technologies such as NVRAM or solid-state disk can be placed on the compute node, or close by, to provide higher bandwidth and faster access, but at the expense of reduced capacity. One such use case is the burst-buffer concept that will be present on flagship machines delivered to several DOE and DOE/NNSA laboratories over the next four years. These technologies present opportunities and challenges for data analysis, workflow, and visualization. The key will be effectively handling the large amounts of data generated by a simulation. Key questions include: How might the workflow change to achieve better data efficiency? How well can in-situ analysis and visualization techniques provide understanding equivalent to today’s traditional methods? Can partial analysis techniques reduce the data set sufficiently to fit in a burst buffer while retaining enough features to enable data exploration? What might an efficient framework look like for supporting this tiered analysis? Could data analytics techniques (from the “big data” space) be useful?
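A minimal sketch of partial, in-situ analysis appears below: rather than writing a full simulation field to storage, the code reduces it to a small histogram that could fit comfortably in a burst buffer; the field contents, array size, and bin count are placeholders for illustration.

/* Sketch of a partial (in situ) analysis step: the simulation field is reduced
 * to a histogram instead of being written in full, shrinking the data that
 * must move to storage or to an analysis node. Sizes are placeholders. */
#include <stdio.h>
#include <stdlib.h>

#define N_CELLS 10000000
#define N_BINS  64

int main(void) {
    double *field = malloc(N_CELLS * sizeof(double));
    for (long i = 0; i < N_CELLS; i++)
        field[i] = (double)rand() / RAND_MAX;   /* stand-in for simulation data */

    long hist[N_BINS] = {0};
    for (long i = 0; i < N_CELLS; i++) {
        int bin = (int)(field[i] * N_BINS);
        if (bin >= N_BINS) bin = N_BINS - 1;
        hist[bin]++;
    }

    /* ~80 MB of raw field data reduced to 64 counts (512 bytes). */
    for (int b = 0; b < N_BINS; b++)
        printf("bin %2d: %ld\n", b, hist[b]);

    free(field);
    return 0;
}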
Session 5: Input/Output, File Systems, and Data Storage Data Movement Challenges
This session examines the decadal challenges facing data storage, file systems, and I/O in the context of data movement for computing at scale, including but not limited to new and changing pressures and drivers, both technical and economic. In concert, the session explores game-changing ideas and technologies and/or radical changes in perspective or approach to meet the identified challenges. Key questions include: What are the consequences in an ecosystem in which the economics of storage, data movement, and data analysis are fundamentally changing and where the simple memory/disk/tape model is insufficient to meet minimum technical requirements for computing at scale? What complexities, challenges, and potential solutions arise at the interface where parallel data movement meets the wide area network? Are opportunities hidden in the melding of cloud- and HPC-based technologies and approaches? Is there a role for peer-to-peer storage solutions?
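The following sketch illustrates burst-buffer style staging in its simplest form: a checkpoint is written to a fast node-local tier and then drained to the parallel file system. The /bb and /pfs paths are hypothetical mount points, not a specific facility's layout, and a production implementation would drain asynchronously rather than blocking as shown here.

/* Sketch of burst-buffer style staging: absorb a checkpoint write on a fast
 * node-local tier, then drain it to the parallel file system (synchronously
 * here for simplicity). Paths and checkpoint contents are placeholders. */
#include <stdio.h>
#include <stdlib.h>

static int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    if (!in || !out) return -1;
    char buf[1 << 20];                 /* 1 MiB staging buffer */
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

int main(void) {
    const char *fast_tier = "/bb/checkpoint.dat";    /* hypothetical node-local NVRAM/SSD */
    const char *pfs_tier  = "/pfs/checkpoint.dat";   /* hypothetical parallel file system */

    /* Step 1: absorb the write burst on the fast tier. */
    FILE *ckpt = fopen(fast_tier, "wb");
    if (!ckpt) { perror("burst buffer write"); return 1; }
    double state[1024] = {0};
    fwrite(state, sizeof(double), 1024, ckpt);
    fclose(ckpt);

    /* Step 2: drain to durable capacity storage while the simulation resumes. */
    if (copy_file(fast_tier, pfs_tier) != 0)
        perror("drain to parallel file system");
    return 0;
}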