I’m currently porting my ray tracing OpenCL kernel to an Altera FPGA. Yes, I’m porting an OpenCL kernel. It’s really frustrating working with Altera’s OpenCL SDK.
Altera’s OpenCL compiler sucks.
It took me less than 2 hours to implement all the host code and write a decent ray tracing kernel with a BVH in OpenCL for my renderer. It runs fine on a GPU. Yet it took me more than 12 hours of trying, and failing, to get the kernel compiled into an FPGA bitstream.
Here is part of my original code.
When I try to compile it with Altera’s OpenCL compiler (aoc), it comes up with this:
void checkArgMatches(llvm::Value*, unsigned int&, llvm::FunctionType*): Assertion `Elt->getType() == FTy->getParamType(ArgNo)' failed.
After some googling: apparently aoc cannot build OpenCL kernels that call non-kernel functions. Really? No user-defined functions allowed? Really?
Fine, I’ll just inline all the functions by hand. Then aoc comes up with this crap:
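To make "inline all the functions by hand" concrete, here is a sketch in plain C (so it compiles and runs on a host) of what that transformation looks like. Everything in it is hypothetical and not the original renderer code: the Vec3 and AABB types, the ray/box slab test hit_aabb, and the two work-item functions. In a real OpenCL kernel, work_id would come from get_global_id(0) and the boxes pointer would be __global. MIN2 and MAX2 are macros standing in for OpenCL's built-in fmin/fmax, so the hand-inlined version contains no calls to user-defined functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the renderer's own types. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 min, max; } AABB;

/* Macros, not functions: expanded by the preprocessor, standing in for
 * OpenCL's built-in fmin/fmax. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Helper function: slab test of a ray (origin o, per-axis inverse
 * direction inv) against an axis-aligned box, limited to [0, t_max]. */
static bool hit_aabb(AABB b, Vec3 o, Vec3 inv, float t_max)
{
    float t0x = (b.min.x - o.x) * inv.x, t1x = (b.max.x - o.x) * inv.x;
    float t0y = (b.min.y - o.y) * inv.y, t1y = (b.max.y - o.y) * inv.y;
    float t0z = (b.min.z - o.z) * inv.z, t1z = (b.max.z - o.z) * inv.z;
    float t_near = MAX2(MAX2(MIN2(t0x, t1x), MIN2(t0y, t1y)), MIN2(t0z, t1z));
    float t_far  = MIN2(MIN2(MAX2(t0x, t1x), MAX2(t0y, t1y)), MAX2(t0z, t1z));
    return t_far >= MAX2(t_near, 0.0f) && t_near <= t_max;
}

/* One work-item of a kernel that calls the helper
 * (work_id plays the role of get_global_id(0)). */
static int raycast_with_helper(const AABB *boxes, Vec3 o, Vec3 inv,
                               float t_max, int work_id)
{
    assert(work_id >= 0);
    return hit_aabb(boxes[work_id], o, inv, t_max) ? 1 : 0;
}

/* The same work-item with the helper body pasted in by hand. */
static int raycast_inlined(const AABB *boxes, Vec3 o, Vec3 inv,
                           float t_max, int work_id)
{
    assert(work_id >= 0);
    const AABB b = boxes[work_id];
    float t0x = (b.min.x - o.x) * inv.x, t1x = (b.max.x - o.x) * inv.x;
    float t0y = (b.min.y - o.y) * inv.y, t1y = (b.max.y - o.y) * inv.y;
    float t0z = (b.min.z - o.z) * inv.z, t1z = (b.max.z - o.z) * inv.z;
    float t_near = MAX2(MAX2(MIN2(t0x, t1x), MIN2(t0y, t1y)), MIN2(t0z, t1z));
    float t_far  = MIN2(MIN2(MAX2(t0x, t1x), MAX2(t0y, t1y)), MAX2(t0z, t1z));
    return (t_far >= MAX2(t_near, 0.0f) && t_near <= t_max) ? 1 : 0;
}
```

Both versions compute the same answer; the only difference is whether the box test lives in a separate function. Macros are one common way to keep the shared code in one place while still giving the compiler a single flat kernel body.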
0. Program arguments: D:/altera/16.0/hld/windows64/bin/aocl-opt --acle [...HUGE AMOUNT OF GARBAGE TEXT...] -board e:/altera_pro/16.0/hld/board/.....
1. Running pass 'Function Pass Manager' on module 'raycastStream.1.bc'.
2. Running pass 'Scalarize: convert vector operations to scalar operations' on function
0x000000013FB87736 (0x0000000002B8F6A8 0x0000000000000000 0x0000000000B2EE40 0x0000000000000003)
[... array of numbers ...]
0x000000013FCFBF86 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), ??4_Init_locks@std@@QEAAAEAV01@AEBV01@@Z() + 0x7E6 bytes(s)
0x00000000776C59CD (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0xD bytes(s)
0x00000000778FA2E1 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)
Error: Optimizer FAILED.
Yup. The compiler crashes. It goddamn crashes. Errrrrrrr.
After hours of randomly guessing what made aoc crash, I finally got it to compile my kernel. Then it reports:
Error: Cannot fit kernel to hardware
Fine, maybe my kernel is too large for a Cyclone V SoC. I’ll make it slimmer. Let me have a look at the hardware generation report. I’m out of RAM cells… Why are these two lines
BVHNode node = bvh[workId];
const AABB aabb = node.aabb;
taking up 88 RAM cells? Why isn’t aoc (which is based on LLVM) optimizing this away? Even GCC does that. Gosh.
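The two lines above copy an entire BVHNode into a private variable only to read its bounding box. Below is a plain-C sketch of that pattern next to two slimmer alternatives: copying just the box, or reading it through a pointer. The BVHNode layout (child indices, primitive range) is a hypothetical guess, since the real struct is not shown, and this is not necessarily the exact change made to the kernel. In OpenCL the pointer would be declared const __global. Whether aoc allocates fewer RAM blocks for the slimmer forms is an assumption to verify in its report; in C all three functions return the same value.

```c
#include <assert.h>

/* Hypothetical layouts; the real BVHNode is not shown in the post. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 min, max; } AABB;
typedef struct {
    AABB aabb;
    int left, right;            /* child node indices (hypothetical) */
    int first_prim, prim_count; /* leaf primitive range (hypothetical) */
} BVHNode;

/* Pattern from the post: copy the whole node, then take its box. */
static float node_width_copy(const BVHNode *bvh, int work_id)
{
    assert(work_id >= 0);
    BVHNode node = bvh[work_id];  /* semantically copies every field */
    const AABB aabb = node.aabb;
    return aabb.max.x - aabb.min.x;
}

/* Copy only the bounding box out of the array. */
static float node_width_box_only(const BVHNode *bvh, int work_id)
{
    assert(work_id >= 0);
    const AABB aabb = bvh[work_id].aabb;
    return aabb.max.x - aabb.min.x;
}

/* No private copy at all: read the fields through a pointer
 * (const __global AABB * in an OpenCL kernel). */
static float node_width_pointer(const BVHNode *bvh, int work_id)
{
    assert(work_id >= 0);
    const AABB *aabb = &bvh[work_id].aabb;
    return aabb->max.x - aabb->min.x;
}
```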
Once I fix that by hand, aoc finally compiles my code. Then it shows this:
compiler error, not able to generate hardware
and this in the log:
Error (10232): Verilog HDL error at raytracer.v(12241): index 351 cannot fall outside the declared range [223:0] for vector "local_bb3_c2_ene3" File: C:/Users/Mars/Downloads/FPGACL/raytracer/system/synthesis/submodules/raytracer.v Line: 12241
To address all this, I want to send Altera a message.