[Issue]: maybe_undef
on cooperative_groups::thread_block_tile_base::shfl
fails causing miscompile
#146
Labels
maybe_undef
on cooperative_groups::thread_block_tile_base::shfl
fails causing miscompile
#146
Problem Description
The argument to some GPU functions is frequently only initialized in some threads. Clang created the
maybe_undef
annotation for precisely this use case. Callingcooperative_groups::thread_block_tile_base::shfl
with a maybe-uninitialized argument results in a miscompile (at least vs the user intent. Not sure if the library or the compiler is at fault) where the compiler assumes that all threads have to take a branch where the variable is initialized.Operating System
22.04.5 LTS (Jammy Jellyfish)
CPU
AMD Ryzen Threadripper PRO 7975WX 32-Cores
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.2.4
ROCm Component
clr
Steps to Reproduce
Execute with
#!/bin/bash /opt/rocm/lib/llvm/bin/clang++ \ -DHIP_ENABLE_WARP_SYNC_BUILTINS \ -D__HIP_ROCclr__=1 \ -g3 \ -ggdb3 \ --offload-arch=gfx90a \ -fPIE \ -O1 \ -std=gnu++17 \ -o miscompile_test.hip.o \ -x hip \ -c miscompile_test.hip /opt/rocm/lib/llvm/bin/clang++ \ -O0 \ -g3 \ -ggdb3 \ --offload-arch=gfx90a \ --hip-link \ --rtlib=compiler-rt \ -unwindlib=libgcc \ miscompile_test.hip.o \ -o miscompile_test \ /opt/rocm/lib/libamdhip64.so.6.2.60204 ./miscompile_test
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
With normal C++ semantics, if
warp_position
is uninitialized, the only way it gets defined before it gets shuffled is if it enters the if block, so the compiler concludes that it has to have entered theif
block and all threads do so.The
_shfl
operations have__attribute__((maybe_undef))
:clr/hipamd/include/hip/amd_detail/amd_warp_functions.h
Lines 130 to 136 in 3c863da
The groups-level version don't though:
clr/hipamd/include/hip/amd_detail/amd_hip_cooperative_groups.h
Lines 451 to 474 in 3c863da
I'm not really clear on whether that's something that would be propagated to the calling function.
As an aside, the documentation also says that the value argument can only be a 32-bit int or float, which is a bit weird, especially since it's templated and has a
static_assert
that just checks if it's integral or float, and delegates to the warp functions that support way more types than that. That documentation was just added. Seems maybe wrong? Changingwarp_position
toint32_t
doesn't fix anything though.I tried throwing
MAYBE_UNDEF
(the HIP Macro around the clang attribute) onthread_block_tile_base::shfl
(confirmed that this is the one getting called via printf and the debugger), but that doesn't fix the issue.Calling
__shfl
directly like__shfl(warp_position, 0, 64)
also fixes the issue. I'm not sure whether this is a bug in ROCm or [hip] clang.The text was updated successfully, but these errors were encountered: