DoD HPC Modernization Program Insights on Maximizing Impact

An Investment Approach Tailored to Mission Requirements

Dr. Roy Campbell
24 April 2018
Air Force HPC Initiatives

Interactome Sciences in Genomic High Throughput Studies
Examination of physical interactions among molecules via machine learning on HPC systems; speeds up successful drug treatment programs and CBR threat analysis

Hypersonics
Evaluation of aerodynamic structure, advanced materials, combustion, and maneuverability; prediction of experimental results to streamline test and evaluation processes and reduce the frequency and cost of live experiments

GOTCHA Monitoring System
Transformation of synthetic aperture radar ground images (observed from a single UAV) into 3-D holograms via real-time computation using a combination of onboard and ground-based processing

Improved Methodologies for Fighter Aircraft Stability and Control
Simulation of aircraft maneuvers in lieu of live experiments; support for modeling and simulation research at the United States Air Force Academy (USAFA)

Target Recognition and Adaptation in Contested Environments
Generation of synthetic data based on known signatures to understand possible intentional permutations; development of detection methods via deep learning on HPC systems

Counter-electronics High Power Microwave Advanced Missile Project (CHAMP)
Development of directed energy strikes from small UAVs; challenging design due to space, weight, and power limitations

Distribution A: Approved for public release: distribution unlimited.
Army HPC Initiatives

Terminal Ballistics for Lethality and Protection
Evaluation of armor and anti-armor technologies to improve the effectiveness of weapons and defenses (e.g., ceramics, composites, electromagnetic armors)

Advanced Methods for Weapons Effects M&S
Development, verification, and validation of blast modeling codes for hull analysis

Gray Eagle Flight Performance
Prediction of flight performance (e.g., climb, cruise, descent) for an Army UAV

Rapid Data Analysis for Army C4 Systems
Rapid analysis of network integration evaluation (NIE) test and evaluation data; 12x reduction in time, enabling improved event fidelity

Design and Analysis of Next-Generation Rotorcraft Systems
Aerodynamic and aeromechanic assessment of vehicles in the Joint Multirole Technology Demonstrator (JMR-TD) program; supports wind tunnel testing and proposal evaluation

Ground Vehicle Acquisition
Trade-space assessment based on advanced mobility, transportability, payload, and sustainability concepts for the Joint Light Tactical Vehicle (JLTV)
Navy HPC Initiatives

Pacific Missile Range Facility
Rapid/accurate range risk and safety analysis; cost avoidance of ~$500K per antiballistic missile test event

COAMPS Tropical Cyclone
Accurate tropical storm forecasts; better sortie decisions and avoidance of damaging winds/seas

V-22 Osprey PMA 275
Analysis of engine exhaust airflow path; enhanced durability of exhaust nozzle components and reduction in operational costs

Advanced Arresting Gear (AAG)
Structural validation and life-cycle analysis of fighter jet stopping mechanisms for Ford class carriers

Columbia Class Submarine
Trade-space analysis and design assessment for new class of ballistic missile submarines

T-45 Goshawk Trainer
Virtual evaluation of a proposed air inlet modification to reduce the number of potentially unsafe flight tests

Mission Requirements

Primary Use Cases

- High-end, physics-based calculations
  - Hypersonic vehicles
    - Aerodynamics
    - Advanced materials
    - Combustion
    - Maneuverability
  - Electromagnetic railgun
  - Armor and anti-armor

- Trade space analysis, technical issue resolution, and life-cycle management
  - Airplanes
  - Ships
  - Rotorcraft
  - Ground vehicles
  - Submarines
  - Missiles

- Machine and deep learning
  - Target recognition
  - Vaccine discovery
  - Treatment after chemical, biological, and/or radiological events

- Data processing
  - Monitoring dismounts and vehicles in an area of interest using a UAV
  - Determining key measures during a test event

- Scenario planning
  - Prediction of cyclone location and intensity; optimization of military asset placement and logistics
  - Identification/mitigation of dangerous aspects of a test event (e.g., blast range for antiballistic missile launch)
A New Approach
Primary Considerations

- Growing number of calculation types
  - Tightly-coupled, large-scale, high-end
  - Loosely-coupled, high-capacity, rapid turnaround
  - Training *(model and data parallelism)*
  - Inference *(large-scale and tactical)*
  - Data-intensive *(graph-based and simple processing)*
  - Combination of types *(in series or parallel)*

- Growing number of sensitivity levels
  - Open
  - Unclassified
  - Multiple levels of classified
  - Numerous caveats

- Growing number of deployment modes
  - CONUS supercomputing center
  - Forward operating base
  - On-the-move

- New objectives *(assumes narrow user base)*
  - Maximize impact
  - Build end-to-end solutions
  - Accommodate use case workflows
  - Implement pragmatic approaches based on realities of technology roadmaps
    - Slowing CPU capacity growth per $*
    - Slowing CPU capability growth per $*
    - Significant paradigm shifts
      - Away from traditional HPC and toward artificial intelligence
      - Away from homogeneous architectures which satisfy a broad range of workloads

- Old objectives *(assumes broad user base)*
  - Maximize utilization
  - Minimize wait-time
  - Maximize application performance per $ for a broad array of codes
## Technology Insertion 2017 (TI-17)

First Acquisition Under New Approach

<table>
<thead>
<tr>
<th>DSRC</th>
<th>#</th>
<th>Class</th>
<th>Make</th>
<th>System Cost ($M)</th>
<th>Compute Cores</th>
<th>Nodes</th>
<th># of MICs</th>
<th># of GPUs</th>
<th>Memory (TiB)</th>
<th>Memory per Core (GiB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>AFRL</td>
<td>1</td>
<td>U</td>
<td>HPE</td>
<td>13.7</td>
<td>56448</td>
<td>1176</td>
<td>24</td>
<td>0</td>
<td>245</td>
<td>4.3</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>AS</td>
<td></td>
<td>4.8</td>
<td>13824</td>
<td>288</td>
<td>0</td>
<td>0</td>
<td>58</td>
<td>4.2</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>AS</td>
<td></td>
<td>2.4</td>
<td>6912</td>
<td>144</td>
<td>0</td>
<td>0</td>
<td>30</td>
<td>4.3</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>AS</td>
<td></td>
<td>2.4</td>
<td>6912</td>
<td>144</td>
<td>0</td>
<td>0</td>
<td>30</td>
<td>4.3</td>
</tr>
<tr>
<td>NAVY</td>
<td>5</td>
<td>U</td>
<td></td>
<td>10.0</td>
<td>35328</td>
<td>736</td>
<td>16</td>
<td>16</td>
<td>154</td>
<td>4.4</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>U</td>
<td></td>
<td>10.0</td>
<td>35328</td>
<td>736</td>
<td>16</td>
<td>16</td>
<td>154</td>
<td>4.4</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>S</td>
<td></td>
<td>2.6</td>
<td>7104</td>
<td>148</td>
<td>4</td>
<td>32</td>
<td>32</td>
<td>4.4</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>DSRC</th>
<th>#</th>
<th>Memory BW per Socket (GiB/s)</th>
<th>Interconnect</th>
<th>Storage (PB)</th>
<th>Storage Per Core (GB)</th>
<th>I/O Full Duplex BW (GB/s)</th>
<th>File System</th>
<th>DP Compute (PF)</th>
<th>Exit</th>
</tr>
</thead>
<tbody>
<tr>
<td>AFRL</td>
<td>1</td>
<td>128</td>
<td>Intel OmniPath Gen 1 <em>(non-blocking fat tree)</em></td>
<td>13.1</td>
<td>232</td>
<td>264</td>
<td>Lustre</td>
<td>5.0</td>
<td>Q4 FY23</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td></td>
<td></td>
<td>2.2</td>
<td>162</td>
<td>54</td>
<td></td>
<td>1.2</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td></td>
<td></td>
<td>1.5</td>
<td>214</td>
<td>26</td>
<td></td>
<td>0.6</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td></td>
<td></td>
<td>1.5</td>
<td>214</td>
<td>26</td>
<td></td>
<td>0.6</td>
<td></td>
</tr>
<tr>
<td>NAVY</td>
<td>5</td>
<td></td>
<td></td>
<td>7.9</td>
<td>224</td>
<td>178</td>
<td></td>
<td>3.1</td>
<td></td>
</tr>
<tr>
<td></td>
<td>6</td>
<td></td>
<td></td>
<td>7.9</td>
<td>224</td>
<td>178</td>
<td></td>
<td>3.1</td>
<td></td>
</tr>
<tr>
<td></td>
<td>7</td>
<td></td>
<td></td>
<td>1.5</td>
<td>208</td>
<td>26</td>
<td></td>
<td>0.6</td>
<td></td>
</tr>
</tbody>
</table>
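
The per-core and per-node figures in the TI-17 tables can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes decimal units (1 PB = 10^6 GB), and the names `systems`, `cores_per_node`, and `storage_per_core_gb` are hypothetical helpers, not part of any HPCMP tool. Input values are transcribed from the tables above.

```python
# Values transcribed from the TI-17 tables: (system #, compute cores, nodes, storage in PB).
systems = [
    (1, 56448, 1176, 13.1),
    (2, 13824, 288, 2.2),
    (3, 6912, 144, 1.5),
    (4, 6912, 144, 1.5),
    (5, 35328, 736, 7.9),
    (6, 35328, 736, 7.9),
    (7, 7104, 148, 1.5),
]


def cores_per_node(cores: int, nodes: int) -> float:
    """Average number of compute cores hosted by each node."""
    return cores / nodes


def storage_per_core_gb(storage_pb: float, cores: int) -> float:
    """Storage per core in GB, assuming decimal units (1 PB = 1e6 GB)."""
    return storage_pb * 1e6 / cores


for number, cores, nodes, storage_pb in systems:
    print(
        f"System {number}: {cores_per_node(cores, nodes):.0f} cores/node, "
        f"{storage_per_core_gb(storage_pb, cores):.0f} GB storage/core"
    )
```

Every system works out to the same number of cores per node. The storage-per-core results for some systems differ by a few GB from the published column, presumably because the storage figures in the table are rounded.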

Tightly-Coupled, Large-Scale Workloads
Challenges for Future High-End HPCMP Systems

<table>
<thead>
<tr>
<th>FY</th>
<th>Capital Invest. ($M)</th>
<th>Compute Cores (1K)</th>
<th>Nodes (1K)</th>
<th># of MICs</th>
<th># of GPUs</th>
<th>Memory (PB)</th>
<th>Memory BW (PB/s)</th>
<th>Storage (PB)</th>
<th>I/O BW (TB/s)</th>
<th>DP Compute (PF)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Absolute</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>17</td>
<td>239</td>
<td>912</td>
<td>35</td>
<td>964</td>
<td>484</td>
<td>2.8</td>
<td>4.1</td>
<td>76</td>
<td>3.3</td>
<td>26</td>
</tr>
<tr>
<td>18</td>
<td>227</td>
<td>865</td>
<td>28</td>
<td>1508</td>
<td>580</td>
<td>3.1</td>
<td>3.7</td>
<td>104</td>
<td>3.4</td>
<td>32</td>
</tr>
<tr>
<td>19</td>
<td>231</td>
<td>909</td>
<td>26</td>
<td>1260</td>
<td>576</td>
<td>3.5</td>
<td>4.0</td>
<td>131</td>
<td>3.6</td>
<td>43</td>
</tr>
<tr>
<td>Relative to FY17</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>17</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
</tr>
<tr>
<td>18</td>
<td>-5%</td>
<td>-5%</td>
<td>-20%</td>
<td>56%</td>
<td>20%</td>
<td>11%</td>
<td>-9%</td>
<td>37%</td>
<td>4%</td>
<td>22%</td>
</tr>
<tr>
<td>19</td>
<td>-3%</td>
<td>0%</td>
<td>-25%</td>
<td>31%</td>
<td>19%</td>
<td>25%</td>
<td>-3%</td>
<td>73%</td>
<td>12%</td>
<td>65%</td>
</tr>
<tr>
<td>Relative to FY17 &amp; PF Increase</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>17</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
<td>0%</td>
</tr>
<tr>
<td>18</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>-11%</td>
<td>-31%</td>
<td>15%</td>
<td>-18%</td>
<td>0%</td>
</tr>
<tr>
<td>19</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td><strong>-40%</strong></td>
<td><strong>-68%</strong></td>
<td>7%</td>
<td><strong>-53%</strong></td>
<td>0%</td>
</tr>
</tbody>
</table>

- Balance increasingly difficult to maintain
- Closing the gap
  - **Memory capacity**: add non-volatile memory to nodes
  - **Memory bandwidth**: select components with on-package high bandwidth memory (HBM)
  - **I/O bandwidth**: add SSD-based tier to buffer I/O requests; place spinning disks in second tier
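
The "Relative to FY17 & PF Increase" rows appear to be the growth of each resource minus the growth of DP compute, expressed in percentage points. That reading is an inference from the numbers, not something the slide states. A minimal sketch under that assumption, with hypothetical names `growth_vs_fy17` and `balance_gap`:

```python
# Growth relative to FY17, in percent, as listed in the table above.
growth_vs_fy17 = {
    18: {"Memory": 11, "Memory BW": -9, "Storage": 37, "I/O BW": 4, "DP Compute": 22},
    19: {"Memory": 25, "Memory BW": -3, "Storage": 73, "I/O BW": 12, "DP Compute": 65},
}


def balance_gap(row: dict) -> dict:
    """Percentage-point gap between each resource's growth and DP compute growth."""
    compute = row["DP Compute"]
    return {name: pct - compute for name, pct in row.items() if name != "DP Compute"}


gaps = {fy: balance_gap(row) for fy, row in growth_vs_fy17.items()}
for fy, gap in gaps.items():
    print(f"FY{fy}: {gap}")
```

Under this assumption the sketch reproduces the table's FY18 row and the bolded FY19 shortfalls for memory, memory bandwidth, and I/O bandwidth. The FY19 storage entry comes out one point higher than the table's value, presumably because the published percentages were rounded from unrounded data.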
High-End System Comparison
Balanced Versus Unbalanced

- PetaFLOP/s
- Job duration (hours)
- Memory capacity (petabytes)
- Memory BW (petabytes/s)
- Interconnect bisection BW (petabits/s)
- Disk capacity (petabytes)
- I/O bandwidth (terabytes/s)
- 1/(interconnect latency) (1/milliseconds)

- HPCMP Onyx (2017)
- China TaihuLight (2016)
- ORNL Summit (2018)
- Exascale Reference (2023)
U.S. Productivity Growth★

Headwind: End of CMOS Shrinking

★ Bureau of Labor Statistics (8 Mar 2017): Change in Non-Farm Output Per Labor Hour
Chip Fabrication Challenges
Headwind: End of General Purpose Computing

Reference 1: silicon $\rightarrow$ 0.2nm covalent diameter

Reference 2: current lithography $\rightarrow$ 193nm wavelength

Feature Size
- 32nm – planar (TI-11/12)
- 22nm – tri-gate (TI-13/14/15)
- 14nm – tri-gate (TI-16/17)

Future fabrication methods: X-gate, $\Omega$-FET, gate-all-around, 3D stacking

Future lithography method: extreme UV (EUV) with 13.5nm wavelength

Future material: indium gallium arsenide (InGaAs) + indium phosphide (InP)
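
The two reference lengths on this slide can be put in proportion to the listed feature sizes. The sketch below is a rough, illustrative calculation only: it treats the 0.2nm covalent diameter as the width of one silicon atom and compares each feature size with the lithography wavelengths. The constant and function names are hypothetical.

```python
SILICON_COVALENT_DIAMETER_NM = 0.2   # Reference 1 on the slide
CURRENT_LITHO_WAVELENGTH_NM = 193.0  # Reference 2 on the slide
EUV_WAVELENGTH_NM = 13.5             # future lithography method on the slide
FEATURE_SIZES_NM = [32, 22, 14]      # feature sizes listed on the slide


def atoms_across(feature_nm: float) -> float:
    """Approximate number of silicon atom diameters spanning one feature."""
    return feature_nm / SILICON_COVALENT_DIAMETER_NM


def wavelength_ratio(wavelength_nm: float, feature_nm: float) -> float:
    """How many times longer the lithography wavelength is than the feature."""
    return wavelength_nm / feature_nm


for feature in FEATURE_SIZES_NM:
    print(
        f"{feature}nm: ~{atoms_across(feature):.0f} atoms across, "
        f"current wavelength is {wavelength_ratio(CURRENT_LITHO_WAVELENGTH_NM, feature):.1f}x "
        f"the feature, EUV is {wavelength_ratio(EUV_WAVELENGTH_NM, feature):.2f}x"
    )
```

At 14nm a feature is only about 70 atom diameters wide while the current lithography wavelength is more than an order of magnitude longer than the feature, which illustrates why the slide points to EUV and new device structures.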
U.S. Productivity Growth (10Y-MA)

Headwind: Transition to New Productivity Driver

Post WWII Innovations

Proliferation of Computing, Internet, Smart Phones

Proliferation of Automation, Robots, Self Driving Vehicles, Additive Manufacturing

* Bureau of Labor Statistics (8 Mar 2017): Change in Non-Farm Output Per Labor Hour
HPCMP Architectures
Forecast for 2022

- **x86 (strong candidate)**
  - New competition from AMD (+)

- **POWER (weak candidate)**
  - Low volume expected (-)
  - Mainly used to feed GPGPUs (+/-)

- **GPGPU (strong candidate)**
  - Increasing interest in machine/deep learning within DoD (+)
  - Increasing number of libraries; otherwise, hard to program for physics-based problems (+/-)
  - Continued innovation from gaming space (+)

- **Many-core (weak candidate)**
  - Knights Hill cancelled (-)
  - Distrust of claims regarding many-core (-)

- **ARM64 (strong candidate)**
  - Precision and external BW/latency now competitive (+)
  - Floating point intensity expected to become competitive (+)
  - Continued innovation from smart phone space (+)
  - Strong interest from U.S., Europe, Japan, and China (+)
  - High volume expected (+)