-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calling coro::sync_wait with coro::ring_buffer::consume returns default constructed objects for complex return values #242
Comments
I've got a PR open to upgrade https://github.com/TartanLlama/expected/releases |
TEST_CASE("ring_buffer issue-242 default constructed complex objects on consume", "[ring_buffer]")
{
struct message
{
uint32_t id;
std::string text;
};
struct example
{
std::optional<message> msg;
};
coro::ring_buffer<example, 1> buffer;
const auto foo = [&buffer]() -> coro::task<void>
{
example data{};
data.msg = {message{ .id = 1, .text = "Hello World!" }};
std::cout << "Inside the coroutine\n";
std::cout << "ID: " << data.msg.value().id << "\n";
std::cout << "Text: " << data.msg.value().text << "\n";
co_await buffer.produce(std::move(data));
co_return;
};
coro::sync_wait(foo());
auto result = coro::sync_wait(buffer.consume());
REQUIRE(result);
auto data = std::move(*result);
REQUIRE(data.msg.has_value());
REQUIRE(data.msg.value().id == 1);
REQUIRE(data.msg.value().text == "Hello World!");
std::cout << "Outside the coroutine\n";
std::cout << "ID: " << data.msg.value().id << "\n";
std::cout << "Text: " << data.msg.value().text << "\n";
} This also seems to be reproducing the issue without protobuf, it fails on |
In auto return_value(return_type&& value) noexcept
{
m_return_value = std::addressof(value);
return final_suspend();
} It takes a pointer to the return object instead of moving/taking ownership. Later when |
No problem. Just tag me when you push it. Thanks for figuring this out. |
Its going to take a little more time, the fix isn't as easy as I had anticipated, I know the root cause but fixing it is having some cascading issues that were already fixed on |
No problem. There's no pressure to fix this. I just found a bug and though it was worth reporting. It's not currently affecting anything I'm working on. |
…lt constructed objects for complex return values There appears to be a bug in tl::expected with the usage of coro::sync_wait when the return_type is not trivially moveable. It causes the destructor of the users return_type to be called before the move constructor when calling the coroutine's promise.result() function. It is worth noting in the coro::task class this exact same method of returning the promise's result is used and works fine, but it doesn't use tl::expected, which likely points to a bug in that library. To work around this bug having a temporary value in the promise.result() call appears to make the compiler generate two move constructor calls, which is less than idea, but at least the users data isn't destroyed before it is returned. Closes #242
@ripose-jp I've added a PR that I think resolves the issue, the bug does appear to be in This is the important part of the PR if you want to take a look. |
…lt constructed objects for complex return values There appears to be a bug in tl::expected with the usage of coro::sync_wait when the return_type is not trivially moveable. It causes the destructor of the users return_type to be called before the move constructor when calling the coroutine's promise.result() function. It is worth noting in the coro::task class this exact same method of returning the promise's result is used and works fine, but it doesn't use tl::expected, which likely points to a bug in that library. To work around this bug having a temporary value in the promise.result() call appears to make the compiler generate two move constructor calls, which is less than idea, but at least the users data isn't destroyed before it is returned. Closes #242
…lt constructed objects for complex return values There appears to be a bug in tl::expected with the usage of coro::sync_wait when the return_type is not trivially moveable. It causes the destructor of the users return_type to be called before the move constructor when calling the coroutine's promise.result() function. It is worth noting in the coro::task class this exact same method of returning the promise's result is used and works fine, but it doesn't use tl::expected, which likely points to a bug in that library. To work around this bug having a temporary value in the promise.result() call appears to make the compiler generate two move constructor calls, which is less than idea, but at least the users data isn't destroyed before it is returned. Closes #242
…lt constructed objects for complex return values There appears to be a bug in tl::expected with the usage of coro::sync_wait when the return_type is not trivially moveable. It causes the destructor of the users return_type to be called before the move constructor when calling the coroutine's promise.result() function. It is worth noting in the coro::task class this exact same method of returning the promise's result is used and works fine, but it doesn't use tl::expected, which likely points to a bug in that library. To work around this bug having a temporary value in the promise.result() call appears to make the compiler generate two move constructor calls, which is less than idea, but at least the users data isn't destroyed before it is returned. Closes #242
After thinking about the current solution over night I'm not sure I want to merge this, but it does seem relatively clear its a bug in |
I'm personally not worried about your PR since it works with my example. This behavior is only only useful in unit testing for me since any production code dealing with If you're worried about it, replacing consume operations with Fixing |
Hmm, maybe its time to just roll a There is the option of upgrading to c++23 for |
That said, I don't know what the difficulty of maintaining a |
Yeah I tried replacing the sync_task with just coro::task and it's more difficult than it seems. coro::task might need a way to support the initial suspend and final suspend awaiters generically to make it work properly, it's the one piece that is different from sync_task. On a side note Tlthis also gave me the thought to do something similar in coro::task_container in the cleanup task by using a final awaiter. Fwiw I'm not planning on upgrading to c++23 anytime soon, would be 3+ years minimum for the next major standard drop, so it isn't a realistic solution today, just was mentioning it. I think I'll take a stab at coro::expected and see how difficult it is for the singular ring buffer usage. I appreciate the feedback and thoughts. |
I feel silly for not thinking of this earlier, but I just tested my example with C++23's |
Hmm that is interesting. Good idea, I'll play around with it as well. That does point to some kind of bug in libcoro then even though up until now it seems like it was tl::expected. Also confused why the same code in coro::task seems fine. |
So I tried it with |
Ok I finally got smart and ran the test through valgrind, and it is a use after free bug.
This probably works on simple types that fit in a register since its not calling the destuctor on the promise? The solution with the double move gets the return value off the promise so it isn't destructed on the |
I ran it with asan and it agrees with Valgrind about the use-after-free. I'd say you should merge your double move solution since correctness is going to be more important than performance. The only other way I can think of to get out of this is allocating return values in the heap since this bug doesn't affect trivially copyable types, but that comes with its own set of drawbacks. |
…lt constructed objects for complex return values There appears to be a bug in tl::expected with the usage of coro::sync_wait when the return_type is not trivially moveable. It causes the destructor of the users return_type to be called before the move constructor when calling the coroutine's promise.result() function. It is worth noting in the coro::task class this exact same method of returning the promise's result is used and works fine, but it doesn't use tl::expected, which likely points to a bug in that library. To work around this bug having a temporary value in the promise.result() call appears to make the compiler generate two move constructor calls, which is less than idea, but at least the users data isn't destroyed before it is returned. Closes #242
Agreed, I'm going to adjust some of the comments since it isn't a tl:: expected bug and then I'll get it merged. |
…lt constructed objects for complex return values Running the test that replicates the bug via valgrind or asan it shows that the compiler is calling sync_wait()'s promise's destructor before moving the complex object return value. This causes the object to be destructed and then the final move out is on deleted (use after free) memory causing the object to be in a bad empty state. To resolve this for now a double move has been introduced to move the complex object off the promise object and onto the sync_wait() function callstack. This seems to keep the object alive and not be destructed when sync_wait() finally returns. Other solutions could be to heap allocate the promise() object or even the return_type but this is probably more expensive than a double move, but will vary on use case. Closes #242
…lt constructed objects for complex return values (#244) Running the test that replicates the bug via valgrind or asan it shows that the compiler is calling sync_wait()'s promise's destructor before moving the complex object return value. This causes the object to be destructed and then the final move out is on deleted (use after free) memory causing the object to be in a bad empty state. To resolve this for now a double move has been introduced to move the complex object off the promise object and onto the sync_wait() function callstack. This seems to keep the object alive and not be destructed when sync_wait() finally returns. Other solutions could be to heap allocate the promise() object or even the return_type but this is probably more expensive than a double move, but will vary on use case. Closes #242
* CMake: Keep git hooks as optional (jbaldwin#234) * Keep git hooks as optional since they are not required for using the library, only for development. * Cmake LIBCORO_RUN_GITCONFIG=ON|OFF option * Introduce concepts::io_executor (jbaldwin#238) * Introduce concepts::io_scheduler The dns resolver hard links against coro::io_scheduler via coro::task_container<coro::io_scheduler>. Since the concepts::executor was introduced a while ago to make it possible to pass in different executors (or even user defined executors) the resolver has apparently been forgotten about. Lets introduce an io_executor (needs poll()) and make the resolver use an agnostic concepts::executor. The other reason for doing this is that the windows build is failing when LIBCORO_FEATURE_PLATFORM=OFF and building as a SHARED library since the coro::io_scheduler is sneaking in via the resolver class but then ultimately isn't compiled since its excluded from the build. Closes jbaldwin#237 * Remove LIBCORO_FEATURE_PLAFTORM It really is identical with LIBCORO_FEATURE_NETWORKING at this point, it just wasn't so obvious. * Add shared library option (jbaldwin#236) * It generates static only or shared libraries according to LIBCORO_BUILD_SHARED_LIBS option via the `cmake` `BUILD_SHARED_LIBS` to propagate to all libraries built by the project. * When generating shared library in Windows, it will produce bin/libcoro.dll and lib/libcoro.lib (exposed symbols to load in the dll) * The CMake definition `CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS` does not export static variables * To export semaphore's variables a new header file is created, the `coro/export.hpp`. It's generated by `cmake` using the method `generate_export_header`, and should manage `__declspec`. * The ctest does not include the DLL folder by default, so it is configured by using `set_tests_properties` * Fix use after free in coro::task_container (jbaldwin#240) In coro::task_container, this is potential for a use after free bug to occur between the final co_return of make_cleanup_task and another thread executing gc_internal. This can happen because there is a delay between the std::scoped_lock being destroyed upon the co_return of make_cleanup_task and its final suspend. This leads to a race condition where another thread can acquire the lock and initiate a garbage collection, where that cleanup task is destroyed before it's final suspend. This leads to a use after free. This commit fixes the issue by reworking how tasks get garbage collected. Instead of unconditionally destroying tasks scheduled for deletion, the gc_internal calls is_ready on the coro::task first. This function will only be true in the cases where the task has entered final suspend, has been default constructed, or has been destroyed. This means a task can get scheduled for deletion, but not be deleted when gc_internal is called. This allows time for the task to enter final suspend, which avoids the use after free bug. To make this implementation easier, several aspects of the implementation were changed. m_task_indexes was removed in favor of a queue which keeps track of free indices into m_tasks. This makes things easier since it removes the hard constraint of keeping used indices behind m_free_pos and free indices at or after. m_tasks_to_delete was then changed to a list in order to allow for constant time deletion of elements in place. This is helpful because it is possible for tasks to be queue for deletion, but not yet safe to delete. This means unconditionally clearing the m_tasks_to_delete data structure is no longer viable. * Upgrade tl::expected to 1.1 (jbaldwin#243) * Creating a CI workflow for macos, (jbaldwin#249) - Created ci-macos.yml - tweaked CMake to deal with clang17 location * Fix for higher version of C++ usage, Clang support, and some build policy change. (jbaldwin#246) * Enable test/example build only if project is top level. LIBCORO_FEATURE_NETWORKING and LIBCORO_FEATURE_TLS macro should set on Linux environment only. * Make compilable above the C++20 mode. * Missing include: <exception> * Update CMakeLists.txt * Calling coro::sync_wait with coro::ring_buffer::consume returns default constructed objects for complex return values (jbaldwin#244) Running the test that replicates the bug via valgrind or asan it shows that the compiler is calling sync_wait()'s promise's destructor before moving the complex object return value. This causes the object to be destructed and then the final move out is on deleted (use after free) memory causing the object to be in a bad empty state. To resolve this for now a double move has been introduced to move the complex object off the promise object and onto the sync_wait() function callstack. This seems to keep the object alive and not be destructed when sync_wait() finally returns. Other solutions could be to heap allocate the promise() object or even the return_type but this is probably more expensive than a double move, but will vary on use case. Closes jbaldwin#242 * Release v0.11 (jbaldwin#250) Closes jbaldwin#241 * Revert cmake PROJECT_IS_TOP_LEVEL (jbaldwin#252) This only appears to work with cmake >= 3.25 which isn't ready for most of the CI runs, so it will be reverted for now and re-worked moving forward. * Release v0.11.1 hotfix (jbaldwin#253) * Added missing header '<atomic>' when compiling with clang and c++23 (jbaldwin#254) * CI explicitly run CMAKE_CXX_STANDARD=20|23 (jbaldwin#257) CI Added explicit 20/23 standards for: * ci-fedora * ci-macos * ci-ubuntu * ci-windows Did not add 23 builds for * ci-opensuse * ci_emscripten Closes jbaldwin#256 * Initial try to include Conan in the CI Signed-off-by: Uilian Ries <[email protected]> --------- Signed-off-by: Uilian Ries <[email protected]> Co-authored-by: Josh Baldwin <[email protected]> Co-authored-by: ripose-jp <[email protected]> Co-authored-by: Bruno Nicoletti <[email protected]> Co-authored-by: LEE KYOUNGHEON <[email protected]>
When calling
coro::sync_wait()
with the result ofcoro::ring_buffer::consume()
it is possible forcoro::sync_wait()
to return a default constructed object instead of the actual object. This only seems to happen with complex objects as simple structs and basic types don't exhibit this behavior.From what I can tell, this happens somewhere in
tl_expected.hpp
after returning fromring_buffer::consume_operation::await_return()
.I've only ever been able to reproduce this bug using a protobuf message that contains a oneof field, so if this bug report seems highly specific, I apologize.
Here's a minimal example along with some build scripts to simplify the process of testing:
main.cpp
example.proto
CMakeLists.txt
The text was updated successfully, but these errors were encountered: