

## Coherent Accelerator Processor Interface (CAPI)

## Overview Package

**CAPI** Developer Kit








#### FPGA as an Accelerator





- FPGA: Field Programmable Gate Array
  - It's a re-programmable chip
  - It can run fast (clock frequencies of 250 to 500 MHz or more)
  - It has industry-standard interfaces such as PCIe Gen3
  - The Major FPGA Suppliers, Altera and Xilinx, are OpenPOWER Foundation members

Source code for FPGAs has traditionally been written in RTL (VHDL or Verilog).

Now, we also have OpenCL, a more programmer-friendly language.










#### When to Use FPGAs

- Transistor Efficiency & Extreme Parallelism
  - Bit-level operations
  - Variable-precision floating point
- Power-Performance Advantage
  - >2x compared to Multicore (MIC) or GPGPU
  - Unused LUTs are powered off
- Technology Scaling better than CPU/GPU
  - FPGAs are not frequency or power limited yet
  - 3D has great potential
- Dynamic reconfiguration
  - Flexibility for application tuning at run-time vs. compile-time
- Additional advantages when FPGAs are network connected ...
  - allows network as well as compute specialization



#### Why is an Accelerator Faster?







**Question:** The POWER8 processor runs at ~3 GHz while our FPGA runs at 250 MHz. So why would an accelerator be better?

**Answer:** The FPGA is better for certain algorithms, such as those that are numerically intensive or highly parallel.

The POWER8 processor has a finite set of instructions with which to implement the algorithm in software.

The FPGA is customized logic built specifically to process the algorithm.
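The answer above can be made concrete with a back-of-the-envelope model. This is only a sketch: the two clock rates come from the slide, while the instruction count per result and the number of parallel pipelines are hypothetical values chosen for illustration, not measurements.

```python
# Back-of-the-envelope throughput model. Only the two clock rates come from
# the slide; every other number below is a hypothetical assumption.

def results_per_second(clock_hz, cycles_per_result, parallel_units=1):
    """Results produced per second by `parallel_units` identical units."""
    return clock_hz / cycles_per_result * parallel_units

CPU_CLOCK_HZ = 3_000_000_000   # ~3 GHz POWER8 core (from the slide)
FPGA_CLOCK_HZ = 250_000_000    # 250 MHz FPGA (from the slide)

# Assumption: software needs 100 instructions (one per cycle) per result.
cpu = results_per_second(CPU_CLOCK_HZ, cycles_per_result=100)

# Assumption: a custom pipeline emits one result per cycle, replicated 8 times.
fpga = results_per_second(FPGA_CLOCK_HZ, cycles_per_result=1, parallel_units=8)

speedup = fpga / cpu
print(f"CPU:  {cpu:,.0f} results/s")
print(f"FPGA: {fpga:,.0f} results/s")
print(f"Speedup under these assumptions: {speedup:.1f}x")
```

The point of the model is that the clock ratio (12x in favor of the core) can be outweighed when custom logic needs far fewer cycles per result and can be replicated.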










#### **Example 1: Numerically Intensive Algorithm**

$$\int P(x)dx = a_0 + \sum_{n=1}^k \left( a_n \cos \frac{n\pi v}{L} + a_n \sin \frac{n\pi w}{L} \right)$$
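To see why such a formula is expensive in software, here is a minimal sketch that evaluates the right-hand side as written on the slide. The coefficient values and the inputs `v`, `w`, and `L` are hypothetical; the function name `trig_series` is an illustration only.

```python
import math

def trig_series(a, v, w, L):
    """Evaluate a[0] + sum over n=1..k of
    a[n]*cos(n*pi*v/L) + a[n]*sin(n*pi*w/L), where k = len(a) - 1."""
    total = a[0]
    for n in range(1, len(a)):
        # Each term costs one cosine, one sine, and several multiplies.
        total += a[n] * math.cos(n * math.pi * v / L)
        total += a[n] * math.sin(n * math.pi * w / L)
    return total

# Hypothetical coefficients a_0, a_1, a_2 and inputs.
coeffs = [1.0, 0.5, 0.25]
value = trig_series(coeffs, v=0.0, w=0.0, L=1.0)
print(value)
```

In software the loop runs one term after another. In custom logic the terms are independent of each other, so they could in principle be computed side by side and summed at the end.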



Variables












#### Example 2: Parallelism

Monte Carlo Risk Analysis to determine probability of financial success:
Given current finances, run 100 scenarios
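A minimal software sketch of this kind of analysis follows. The financial model (starting balance, returns, volatility, yearly spending) is entirely hypothetical; only the idea of running 100 scenarios comes from the slide.

```python
import random

def run_scenario(seed, start_balance=100_000.0, years=30,
                 mean_return=0.05, volatility=0.12, yearly_spend=6_000.0):
    """Simulate one scenario; return True if the money lasts all `years`.

    All default parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)  # each scenario has its own random stream
    balance = start_balance
    for _ in range(years):
        yearly_return = rng.gauss(mean_return, volatility)
        balance = balance * (1.0 + yearly_return) - yearly_spend
        if balance <= 0:
            return False
    return True

NUM_SCENARIOS = 100  # from the slide

# No scenario depends on any other, so all of them could run at once.
outcomes = [run_scenario(seed) for seed in range(NUM_SCENARIOS)]
probability = sum(outcomes) / NUM_SCENARIOS
print(f"Estimated probability of success: {probability:.2f}")
```

Because the scenarios share no state, this loop is the kind of work that can be spread across many parallel units rather than executed one scenario at a time.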







## Coherent Accelerator Processor Interface

Accelerators on FPGAs have been around for a long time.

So what is new?

Coherency makes the accelerator a peer to the POWER8 cores



#### What was done before CAPI?



Prior to CAPI, an application called a device driver to utilize an FPGA Accelerator.

The device driver performed a memory mapping operation.





## **CAPI** Coherency



#### With CAPI, the FPGA shares memory with the cores
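The difference between the two paradigms can be modeled in a few lines. This is a simplified software analogy, not CAPI code: it assumes the driver-based path stages data into a separate buffer and copies results back, while the shared-memory path lets the accelerator work on the application's buffer directly. The function names are hypothetical.

```python
def accelerate_in_place(buf):
    """Stand-in for accelerator work: increment every byte (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256

def driver_style(app_data):
    """Assumed I/O model: copy to a staging buffer, process, copy back."""
    staging = bytearray(app_data)      # copy 1: application -> staging
    accelerate_in_place(staging)
    app_data[:] = staging              # copy 2: staging -> application
    return 2 * len(app_data)           # bytes copied

def shared_memory_style(app_data):
    """Assumed coherent model: accelerator works on the same memory."""
    view = memoryview(app_data)        # a reference, not a copy
    accelerate_in_place(view)
    return 0                           # bytes copied

a = bytearray([1, 2, 255])
b = bytearray([1, 2, 255])
copied_driver = driver_style(a)
copied_shared = shared_memory_style(b)
print(a == b, copied_driver, copied_shared)
```

Both paths produce the same result; what changes is how much data has to be moved and prepared before the accelerator can start.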





#### **CAPI vs. I/O Device Driver: Data Prep**









#### **CAPI** Differentiation



#### CAPI vs. I/O or Socket FPGA Solution

| IBM Innovation | Customer Impact |
|----------------|-----------------|
| FPGA is a peer to the processor (caching and translations by PSL) | Simple programming paradigm; higher performance |
| Architecture allows for any kind of FPGA or even an ASIC | Flexible solutions; connection to Flash, FC, EN, ... |
| Virtualization in the architecture | Applications can share the accelerator |







**CAPI Paradigm** 



#### **POWER8 Processor**



#### Let's take a closer look at how IBM Engineers made CAPI work

#### **Technology**

- 22 nm SOI, eDRAM, 15 ML, 650 mm²

#### Cores

- 12 cores (SMT8)
  - 8 dispatch, 10 issue, 16 execution pipes
  - 2x internal data flows/queues
  - Enhanced prefetching
  - 64 KB data cache, 32 KB instruction cache

#### **Accelerators**

- Crypto and memory expansion
- Transactional memory
- VMM assist
- Data move/VM mobility

#### POWER8 Scale-Out Dual Chip Module





#### Caches

- 512 KB SRAM L2 / core
- 96 MB eDRAM shared L3

#### Memory

- Up to 230 GB/s sustained bandwidth

#### **Bus Interfaces**

- Durable open memory attach interface
- Integrated PCIe Gen3
- SMP interconnect
- CAPI

#### **Energy Management**

- On-chip power management microcontroller



#### **How CAPI Works**







#### POWER8 with CAPI Cards



#### Front View





## Basic concepts of CAPI



#### CAPI vs. CAPI Solutions

- <u>CAPI</u> is a platform to enable acceleration
- <u>CAPI</u> provides an infrastructure to improve performance of an application through FPGA acceleration
  - Enables customer-defined acceleration within the processor complex
- <u>CAPI</u> allows implementation of a wide range of accelerators to optimally address many different customer challenges
  - Each implementation is a unique <u>CAPI Solution</u>
- A <u>CAPI Solution</u> is a specific implementation of an algorithm that uses an FPGA + application
- A <u>CAPI Solution</u> requires logic designers and programmers to implement the solution
- CAPI Solution Examples:
  - Flash Appliance (IBM Data Engine for NoSQL)
  - Monte Carlo Algorithm

Platform for Innovation

Specific Customer Solution





- We compared three paradigms:
  1. Software
  2. I/O-attached Accelerator
  3. CAPI-attached Accelerator
- We showed the components that make CAPI work:
  1. POWER8 Hardware: CAPP, PSL, Coherent Memory, PCIe
  2. OS Extensions
  3. Customer's Solution Algorithm and Application



### Why Accelerate on CAPI?



- Reasons to consider CAPI Acceleration
  - Higher Performance
    - If your customer has a complex application running on a core, consider CAPI for better performance
    - If your customer already does I/O attached FPGA acceleration, CAPI will simplify their software and provide better performance
  - Lower IT Costs
    - By moving workload to CAPI, your customer will need fewer cores
    - In some cases, such as the IBM Data Engine for NoSQL, CAPI can do the same work with far less infrastructure
  - Lower Power
    - Running acceleration on an FPGA can result in lower power consumption vs. running the application as software on a core

#### Note:

When considering CAPI for a particular solution, we compare it to:

1. The same solution running as software, OR
2. The same solution running on an I/O-attached FPGA



## CAPI ecosystem partners and consumers



Have a client who wants their IBM Application to be accelerated on CAPI? (ex: DB2, CPLEX, Streams)
Contact: Jonathan Dement (dementj@us.ibm.com)

CAPI-Apps for Clients

Have a client or partner who wants to create a CAPI-App and sell it to others? Point them to the CAPI resources in this doc (IBM and Nallatech websites) and email Bruce Wile (bwile@us.ibm.com) about the opportunity

IBM CAPI Solutions
IBM Data Engine for NoSQL


Partner Solutions

Why tell Bruce Wile about the opportunity? Depending on the size of the opportunity, we will engage the CAPI Customer Enablement Team.

Clients with their Own Proprietary Solutions

Have a client or partner who wants to create a proprietary CAPI Solution? Point them to the CAPI resources in this doc (IBM and Nallatech websites) and email Bruce Wile (bwile@us.ibm.com).



#### Two Paths into CAPI





IBM & Partners create business solutions for the CAPI Market.
Clients buy pre-packaged solutions from the CAPI Market.

Clients create their own proprietary business solution.



## **CAPI** Solutions





### Open Development Driving CAPI Solutions







#### **Potential Markets for CAPI Solutions**







## **CAPI** Availability



- See: <a href="http://www.ibm.com/support/customercare/sas/f/capi/home.html">http://www.ibm.com/support/customercare/sas/f/capi/home.html</a>
- CAPI Developer Kit
  - Procure through Nallatech
  - For customers considering creating their own CAPI Solution
    - CAPI Decision and Process Guide
  - Requires POWER8 Server
  - Available now
  - See www.nallatech.com/capi
- First CAPI Solution:
  - Procure through IBM
  - GA in early 2015





#### Nallatech's CAPI Developer Kit Contents

|                                   | Hardware Components                                | Software / IP Components                                       | Tools                                                | Documentation                     |
|-----------------------------------|----------------------------------------------------|----------------------------------------------------------------|------------------------------------------------------|-----------------------------------|
| Included in CAPI Developer<br>Kit | Nallatech 385 FPGA Accelerator                     | IBM CAPI Power Service Layer (PSL) (Encrypted FPGA IP)         | Altera Quartus FPGA Tools                            | White Paper and Decision<br>Guide |
|                                   | Nallatech JTAG Debug Kit                           | CAPI Host Support Library (libcxl)                             | PSL Simulation Engine                                | CAPI User's Guide                 |
|                                   |                                                    | 'Memcopy' Example                                              |                                                      | 385 FPGA Card User Guide          |
| Also Required                     | POWER8 System<br>(IBM Model 8247-21L or 8247-22L)  | CAPI Enabled O/S<br>(initially Ubuntu 14.10 LE from Canonical) | HDL Simulator<br>(e.g. Cadence, Mentor,<br>Synopsys) |                                   |



## CAPI Developer Kit – FPGA Card













IBM POWER8<sup>TM</sup> Server











http://www.ibm.com/support/customercare/sas/f/capi/home.html



## IBM Data Engine for NoSQL



## **Capturing the growth in Big Data**

- Growth of NoSQL solutions is explosive
  - In-memory requirements drive large infrastructures and cost
  - Deployment complexity limits growth
- IBM CAPI-Flash delivers a new size/performance price point
  - Equivalent end to end performance
  - Significantly lower deployment and operational costs
  - Reduce infrastructure complexity
- 100% Redis Compliant
  - Your Redis applications just work!
  - No lock-in
  - You choose the amount of real memory vs. flash memory and we handle the rest
  - Huge savings: 3x cheaper with nearly the same performance (for well-behaved access patterns)









## Innovative "In-Memory" NoSQL/KVS







#### Demonstrating the Value of CAPI Attachment










## Ways to get Started:

- Move your Redis or Memcached application over transparently
  - BigRedis and Memcached from Redis Labs are 100% compliant with current client APIs.
  - Your Redis applications just work!
- Need a different NoSQL?
  - Talk to IBM: we are working with other NoSQL providers, and you may be able to join a Proof of Concept when available
- Want to move your application/service directly to our KVS or Block APIs?
  - NDA release of APIs
  - Development Tools & Education available
    - Dev Kit available for POWER8 Systems
    - Use time on Systems available online
    - Engage IBM for a Proof of Concept project



#### Workloads to Innovate



- Start with what FPGAs are good at: Embarrassingly Parallel Problems
- Combine with CAPI strengths:
  - Ease of programming
  - Lack of device driver
  - Shared memory & caching (host to accelerator communication)
- What do you get:
  - Bitwise data manipulation (e.g. Deep Compression)
  - Pattern recognition
  - Encryption
  - Monte Carlo
    - Statistical modeling for complex predictions
  - Image Analytics & Biometrics
    - Facial recognition
    - Feature detection (e.g. cancer)
  - Network Packet Processing & Inspection
  - Bioinformatics (e.g. Sequence alignment)
  - Reverse time migration (Oil & Gas)
  - Ensemble Calculations of Numerical Weather Prediction
  - Machine Learning
  - And on and on



#### **IBM Accelerated GZIP Compression**



#### What it is:

An FPGA-based low-latency GZIP compressor and decompressor with single-thread throughput of ~2 GB/s and a compression ratio significantly better than low-CPU-overhead compressors like Snappy.

Chart: Human Whole Genome, 3 GB (hg19, GRCh37, 2/2009), compressed size as % of original
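For reference, the "% of original" metric can be computed for a software GZIP baseline with Python's standard `gzip` module. This is a sketch of the metric only, not of the FPGA compressor; the sample input is a hypothetical, highly repetitive byte string rather than genome data.

```python
import gzip

def compressed_percent(data, level=6):
    """Return compressed size as a percentage of the original size."""
    compressed = gzip.compress(data, compresslevel=level)
    return 100.0 * len(compressed) / len(data)

# Hypothetical input: a repetitive sequence, so it compresses very well.
sample = b"ACGT" * 25_000

percent = compressed_percent(sample)
roundtrip = gzip.decompress(gzip.compress(sample))

print(f"Compressed size: {percent:.2f}% of original")
print("Round trip OK:", roundtrip == sample)
```

A lower percentage means better compression. Real data such as a genome compresses far less than this artificial sample.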





#### **IBM Accelerated Text Processing**



**Annotations** 

#### What it is:

A compiler/runtime system for accelerating text analytics on a shared-memory CPU-FPGA system

#### Results

Big Speedup vs. Multithread SW



#### To appear @:

Hot Chips 2014

#### AQL

- rule language
- SQL-like syntax

For years, Microsoft Corporation CEO Bill Gates was against open source. But today he appears to have changed his mind. "We can be open source"



## FPGA Image & Video Processing



#### Information Extraction

**Object Recognition** 

Edge Detection, Feature Extraction, Segmentation

Template Matching

Extract *relevant information* from the input image to enable *object recognition*.

Information is located where pixels change color (edges, blobs):

- Intrinsic properties of objects
- Object boundaries

Design fully-pipelined FPGA architectures -> streaming application

Real-time, low-power, onboard image processing solution

- Sobel and Canny: extract contours/edges
- SURF: extract scale- and rotation-invariant features
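As an illustration of the edge-detection step, here is a minimal pure-Python sketch of the Sobel operator on a tiny grayscale image. It uses the |Gx| + |Gy| approximation of the gradient magnitude, a common simplification that avoids a square root. The test image is hypothetical, and this is a software sketch rather than the FPGA design.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Return |Gx| + |Gy| per pixel; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Each output pixel depends only on its 3x3 neighborhood,
            # which is what makes the operator easy to pipeline.
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out

# Hypothetical 4x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = sobel_magnitude(image)
for row in edges:
    print(row)
```

The response is nonzero only in the two columns next to the dark-to-bright boundary, which is the "information located where pixels change color" idea from the slide.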

Applications requiring edge detection & feature extraction span a wide range of domains:

- Computer/Machine Vision: Tracking, Object Recognition & Navigation
- General image processing: Compression
- Quality Control: Unsupervised Defect Identification
- Medical Imaging: Analysis + Diagnosis & Computer Guided Surgery






#### **Directions for CAPI Ideas**









© Copyright International Business Machines Corporation 2014

Printed in the United States of America September 2014

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at <a href="https://www.ibm.com/legal/copytrade.shtml">www.ibm.com/legal/copytrade.shtml</a>.

The following terms are trademarks or registered trademarks licensed by Power.org in the United States and/or other countries: Power ISA. Information on the list of U.S. trademarks licensed by Power.org may be found at <a href="https://www.power.org/about/brand-center/">www.power.org/about/brand-center/</a>. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.

**Note:** This document contains information on products in the design, sampling and/or initial production phases of development. This information is subject to change without notice. Verify with your IBM field applications engineer that you have the latest version of this document before finalizing a design.

You may use this documentation solely for developing technology products compatible with Power Architecture®. You may not modify or distribute this documentation. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Systems and Technology Group 2070 Route 52, Bldg. 330 Hopewell Junction, NY 12533-6351

The IBM home page can be found at ibm.com®.

Version 1.0 29 September 2014—IBM Confidential