The code in clr/hipamd/hiprtc/hiprtcInternal.cpp that calls amd::Comgr::LoadLib() is not checking its return value. As a result, when the load fails, the subsequent call to create_data_set crashes due to a nullptr dereference:
#0 0x0000000000000000 in ?? ()
#1 0x000073a32a0ffd59 in amd::Comgr::create_data_set (data_set=0x73a28c045198) at /therock/src/sources/clr/rocclr/device/comgrctx.hpp:205
#2 hiprtc::RTCProgram::RTCProgram (this=0x73a28c045120, name=...) at /therock/src/core/clr/hipamd/src/hiprtc/hiprtcInternal.cpp:62
#3 0x000073a32a103b7c in hiprtc::RTCCompileProgram::RTCCompileProgram (this=0x73a28c045120, name_=...)
at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/basic_string.h:1070
Instead of just relying on a bool return value from LoadLib, which is easy to miss (especially with call_once), maybe add an additional argument, abort_on_failure, and put a crashWithMessage inside the LoadLib function. These kinds of delay-load failures are basically unrecoverable, and it is better to crash with an error message than to segfault on a null dereference.
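A minimal sketch of that suggestion, to make the proposal concrete. Everything here is a hypothetical stand-in rather than the actual rocclr code: the `LazyLib` class, the `available` flag that simulates whether the library can be found, `FakeCreateDataSet`, and the exact shape of `CrashWithMessage` are all assumptions made for illustration. Only the names `LoadLib`, `create_data_set`, `abort_on_failure`, and the use of `call_once` come from the report above.

```cpp
#include <cstdio>
#include <cstdlib>
#include <mutex>

// Hypothetical entry-point type; the real comgr signature differs.
using CreateDataSetFn = int (*)(int* data_set);

// Stand-in for a symbol that would normally be resolved from the library.
inline int FakeCreateDataSet(int* data_set) {
  *data_set = 1;  // pretend a handle was created
  return 0;       // 0 means success in this sketch
}

// Assumed helper: print a diagnostic and terminate instead of segfaulting.
[[noreturn]] inline void CrashWithMessage(const char* message) {
  std::fprintf(stderr, "fatal: %s\n", message);
  std::abort();
}

class LazyLib {
 public:
  // `available` simulates whether the shared library can be located.
  explicit LazyLib(bool available) : available_(available) {}

  // Loads at most once. With abort_on_failure set, a failed load stops the
  // process with a message instead of leaving null entry points behind.
  bool LoadLib(bool abort_on_failure) {
    std::call_once(once_, [this] {
      if (available_) create_data_set_ = &FakeCreateDataSet;
      loaded_ = available_;
    });
    if (!loaded_ && abort_on_failure) {
      CrashWithMessage("could not load the library; check the install layout");
    }
    return loaded_;
  }

  // Callers that pass abort_on_failure=false must check LoadLib's result
  // first; calling through a null pointer here is the kind of crash shown
  // in the backtrace.
  int create_data_set(int* data_set) { return create_data_set_(data_set); }

  bool has_create_data_set() const { return create_data_set_ != nullptr; }

 private:
  bool available_;
  bool loaded_ = false;
  std::once_flag once_;
  CreateDataSetFn create_data_set_ = nullptr;
};
```

With this shape, a call site that cannot recover would call `LoadLib(true)` and never reach `create_data_set` with a null pointer, while a call site that can fall back would call `LoadLib(false)` and branch on the result. Because the check sits outside the `call_once` body, every caller sees the failure, not only the first one.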
I found this issue while working with a non-standard directory layout. I attached a debugger and traced the root cause of the segfault to the unchecked amd::Comgr::LoadLib() call in clr/hipamd/hiprtc/hiprtcInternal.cpp.