For nearly a decade, the HPC (high-performance computing) market has been divided into two camps: CUDA and OpenCL. CUDA, of course, is Nvidia's proprietary standard. It was first out of the gate in 2007, and its competitor, OpenCL, wouldn't reach 1.0 status until just over two years later, in 2009. Unlike CUDA, OpenCL is supported by a number of companies, including Intel, Imagination Technologies, AMD, Qualcomm, and ARM.
Despite this potential advantage, Nvidia has held the lion's share of the HPC and supercomputing markets. According to the most recent Top500 list, AMD's GCN architecture is used in just three systems, compared with 66 systems built on Nvidia's Fermi and Kepler architectures and 28 systems that use Intel's Xeon Phi. There are also four hybrid systems that use both Nvidia GPUs and Xeon Phi.
AMD’s Boltzmann Initiative is meant to change the status quo by offering developers and researchers a much-needed software stack that should boost the company’s competitiveness in the HPC market. AMD’s competitive weakness in HPC and scientific computing has never been about hardware — GCN’s raw compute performance, at least in certain types of problems, was far better than Nvidia’s Fermi or Kepler cards. (Maxwell has not been positioned as an HPC solution.) Nvidia, however, poured huge amounts of money into developing its CUDA ecosystem, including a great deal of support for HPC developers and scientific research.
Here's Boltzmann at a high level. The goals are to improve performance in the workloads where AMD can compete effectively, offer better tools for evaluating performance, improve Linux support (including a new 64-bit driver for headless Linux), and to allow implementation of a new HSA (heterogeneous system architecture) extension, HSA+. This last item won't be folded into the larger HSA standard; it's an AMD-specific extension meant to allow for a greater range of HSA features when used with discrete GPUs. It will also allow supported GPUs to "see" GPU and CPU memory as a unified space.
The major announcement today, however, concerns a new HSA compiler and AMD’s heterogeneous-compute interface for portability, or HIP.
The new HSA compiler (HCC) can compile for both CPUs and GPUs and leverages the existing ecosystems built around Clang and LLVM as well as HSA itself. The goal is to allow developers to create CPU and GPU code in a single language and a single source file. OpenCL, even in version 2.0, requires separate source code for GPU kernels; HCC eliminates this bottleneck. The aim is to provide an ecosystem that developers can target and use more easily — something Nvidia accomplished with CUDA — and to make it simpler to optimize code for parallel execution. The new compiler will also include support for GCN-specific features, like asynchronous compute and GCN's cache structure.
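To give a sense of what "single language, single source file" means in practice, here is a minimal sketch in the style of HCC's `hc` C++ API. This is an illustrative example, not code from AMD's announcement; the exact API surface (`hc::array_view`, `hc::parallel_for_each`, the `[[hc]]` attribute) reflects the HC programming model as later published and may differ in detail from the initial release.

```cpp
#include <hc.hpp>   // HCC's heterogeneous-compute header (assumed available)
#include <vector>

int main() {
    const int n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // array_views wrap host memory for GPU access.
    hc::array_view<const float, 1> av(n, a), bv(n, b);
    hc::array_view<float, 1> cv(n, c);

    // The lambda body is compiled for the GPU; everything around it
    // runs on the CPU -- all in one C++ source file, with no separate
    // kernel-language file as OpenCL requires.
    hc::parallel_for_each(cv.get_extent(), [=](hc::index<1> i) [[hc]] {
        cv[i] = av[i] + bv[i];
    });

    cv.synchronize();  // copy results back to the host vector
    return 0;
}
```

The contrast with OpenCL is the point: there, the kernel would live in a separate string or `.cl` file, compiled at runtime through a distinct API.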
These types of features can bring AMD’s capabilities more in line with Nvidia’s, but that’s not sufficient to meaningfully dent the market. That’s where HIP comes in.
Hipify Tools: Translating CUDA source to run on AMD GPUs
As AMD describes it, HIP accomplishes several goals. It allows developers accustomed to Nvidia's CUDA to develop using similar syntax. It includes a new toolset (Hipify Tools) that can convert CUDA code to HIP code. And once code is written in HIP (whether converted from CUDA or written that way initially), it can be compiled to target either Nvidia or AMD GPUs — using either Nvidia's CUDA compiler (NVCC) or AMD's HCC.
Just to be clear, Hipify Tools doesn't run CUDA applications on AMD chips. Instead, it performs a source-to-source translation that's meant to make it easy for developers to target either architecture. We asked AMD what the typical performance hit looked like for this approach, and the company told us that in general-use cases, the performance hit is effectively zero. If a developer has heavily optimized code for a specific Nvidia architecture, matching that level of optimization on GCN would take additional time, but even in those cases the translated code should work out of the box.
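To make the "source-to-source" idea concrete, here is a hedged sketch of the kind of mechanical renaming such a tool performs. The CUDA calls are real; the HIP names shown reflect the HIP runtime as later published (early HIP used a `hipLaunchKernel`/`HIP_KERNEL_NAME` macro in place of CUDA's triple-chevron launch), so treat the exact spellings as illustrative rather than a transcript of Hipify's output.

```cpp
// --- CUDA original (host-side calls for a saxpy kernel) ---
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
cudaFree(d_x);

// --- After translation to HIP (portable source) ---
// Runtime calls are renamed one-for-one; the kernel body itself
// is unchanged. The launch becomes an explicit macro/function call.
hipMalloc(&d_x, n * sizeof(float));
hipMemcpy(d_x, h_x, n * sizeof(float), hipMemcpyHostToDevice);
hipLaunchKernel(HIP_KERNEL_NAME(saxpy), dim3(blocks), dim3(threads),
                0, 0,                 // shared memory bytes, stream
                n, 2.0f, d_x, d_y);   // kernel arguments
hipFree(d_x);
```

Because the translation is a thin renaming layer over near-identical runtime APIs, it's plausible that the overhead is effectively zero in the general case, as AMD claims: the same device code and the same memory operations are emitted, just under portable names.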
In short, developers who are curious about FirePro and GCN performance and want to take their code for a test drive should find it much easier to do so. Out-of-the-box compatibility is useful for testing use-cases, even if it takes some additional optimization to bring performance up. Anandtech put together a useful image that showcases what HIP and HCC can do.
Contrary to what some publications have reported, AMD is not executing CUDA on GCN, CUDA applications are not being analyzed or reverse-engineered, and AMD is not compiling these applications into OpenCL. The point of HIP is a vendor-neutral approach that can target either NVCC or HCC.
A huge step forward
AMD still has much to do to establish a place for itself in the HPC market, and it's not clear how quickly Hipify Tools will adapt to newer versions of CUDA. AMD has told us that any interested developer will be able to sign up for the program beginning in Q1 2016, but that the tools will require a FirePro card to run. That's unfortunate, because it limits the audience to developers who either already own a FirePro or are willing to fork over serious scratch to buy one. AMD might have been better served by opening the software to consumer hardware.
Then again, this is just the beta, and it’s possible that AMD will expand compatibility in the longer term.
Either way, this project has the potential to reinvent AMD's approach to the HPC space. While that market is small in absolute terms, it's far more lucrative per sale than the consumer market. It also gives AMD a seat at the table and the option to fight alongside Intel and Nvidia for access to the supercomputing market over the long term. Overall, if AMD keeps attention on the product, it could help developers take much better advantage of the company's hardware in a wide array of software environments.