About MIOpen’s insane HIP backend

Few months ago a kind gentleman asked me if I can get MIOpen’s HIP backend working on Nvidia’s GPU. And thus I dug deeper into MIOpen’s code. Here is what I have learned so far.


The HIP banked is insane. And I didn’t succeed in making MIOpen’s HIP backend working with CUDA.


MIOpen  is AMD’s deep learning library designed to feel and work like Nvidia’s cuDNN. In fact it is almost a drop-in replacement or cuDNN. You initialize MIOpen with miopenCreate,  prepare convolution by miopenGetConvolutionDescriptor or miopenFindConvolutionForwardAlgorithm, run convolutions calling miopenConvolutionForward. Then call miopenDestory to clean up when you are done with everything.

Better than cuDNN. MIOpen supports both OpenCL and HIP. So developers can accelerate both applications written in both APIs. Nice.

… But AMD must have some way to sharing kernels between the OpenCL backend and  the HIP backend. Otherwise they’ll have to maintain 2 separate kernels. That is just not piratical.

The OpenCL backend

The OpenCL backend in MIOpen is straight forward. miopenCreate sets up the environment,  miopenGetConvolutionDescriptor builds the convolution kernel and miopenDestory cleans up the stuff.

But looking at MIOpen’s directory. What on Earth is src/kernels/conv3x3wrw.s? After a bunch of googling I found that it is LLVM assembly. These seems to be hand tuned LLVM assembly kernels.

The HIP backend

Looking MIOpen’s directory tree. You’ll find that there are .cl files but no .hip /.cu anywhere. So we know MIOpen must be sharing the kernel between OpenCL and HIP. But how? I initially think it shares code by letting HIP loading binary or IR from a compiled OpenCL kernel. Hell I’m wrong.

2017-11-20 17-47-23 的螢幕擷圖.png
No CUDA and HIP files anywhere

While I’m trying to force MIOpen to compile over NVIDIA’s nvcc compiler. I came across a weird variable. What the heck is HIP_OC_COMPILER? It is set to /opt/rocm/hip/bin/hipcc by default, but what OC is standing for? The only reasonable thing I can think of is OpenCl. And it is! MIOpen is using hipcc to compile OpenCL code.

Here is the proof. In src/hipoc/hipoc_program.cpp you find this piece of code.

std::string filename =
is_kernel_str ? "tinygemm.cl" : program_name;
dir->Execute(HIP_OC_COMPILER, params + " " + filename +
" -o " + hsaco_file.string());

It literary tries to compile OpenCL kernels with HIP!!

In fact looking at the entire function that builds kernels:

void BuildModule(const std::string& program_name, std::string params, bool is_kernel_str)
    std::string filename =
        is_kernel_str ? "tinygemm.cl" : program_name; // jn : don't know what this is
    hsaco_file = dir->path / (filename + ".o");

    std::string src = is_kernel_str ? program_name : GetKernelSrc(program_name);
    if(!is_kernel_str && miopen::EndsWith(program_name, ".so"))
        WriteFile(src, hsaco_file);
    else if(!is_kernel_str && miopen::EndsWith(program_name, ".s"))
        AmdgcnAssemble(src, params);
        WriteFile(src, hsaco_file);
        WriteFile(src, dir->path / filename);

        params += " -Werror" + KernelWarningsString();
        params += " -Wno-everything";
        dir->Execute(HIP_OC_COMPILER, params + " " + filename + " -o " + hsaco_file.string());

Interestingly the HIP backend can also load .so files as it’s source of kernels. But is_kernel_str ? "tinygemm.cl" : program_name; seems to prohibit it.

Fun fact.In src/hip/handlehip.cpp there is a function GetDeviceName

std::string Handle::GetDeviceName()
    hipDeviceProp_t props{};
    hipGetDeviceProperties(&props, this->impl->device);
    std::string n("gfx" + std::to_string(props.gcnArch));
    return GetDeviceNameFromMap(n);

Imagine running this on a NVIDIA card and get something like gfx Pascal back. That makes me laugh so hard.

Correct me if someone knows what it will really return on a NV card. I never acuatlly tried it.

Lesson learned

  1. hipcc can compile OpenCL code
  2. MIOpen uses OpenCL kernels even if it is running in HIP mode
  3. MIOpen contains GCN assembly code
  4. Maybe MIOpen can load and use binary kernels (in a shared object form)
  5. AMD is insane with their compiler. But it works!

A lot of work will be needed to port MIOpen’s HIP backend to CUDA. Either by using pocl‘s CUDA backend to compile OpenCL into CUDA code or by writing some macros to make nvcc thinks that is is compiling CUDA code. And I’ll have to come up with a way to deal with the assembly kernels.


One thought on “About MIOpen’s insane HIP backend

Add yours

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.

Up ↑

%d bloggers like this: