Disconnection of network for workflow execution causes status stuck as 'running' due to main thread kill while actual execution completes at backend thread #12798
Comments
I believe the root causes of this problem are as follows:
|
Sorry, I forgot to mention whether the API case was streaming or not. I only tested streaming mode. I have not tested the non-streaming case, but I believe it works correctly, since non-streaming mode doesn't write anything to the streaming channel. |
I've run into this issue too. The connection may be closed while the response is streaming, and Flask then raises a GeneratorExit in the main thread. Because the main thread is also responsible for DB operations, once it exits the workflow record (as well as the workflow_node_execution record, message record, etc.) will no longer be updated. This only happens in streaming mode. I think it would be better to decouple the DB operations and the response from the main thread. |
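The failure mode described in this comment can be reproduced without Flask. The sketch below is not Dify's code: `worker`, `stream`, and the `status` dict are hypothetical stand-ins for the backend thread, the streaming response generator, and the workflow record. Calling `close()` on the generator imitates what a WSGI server does when the client disconnects, which raises GeneratorExit at the paused `yield`.

```python
import queue
import threading


def worker(q: queue.Queue) -> None:
    # Backend thread: runs the "workflow" and publishes its events.
    for i in range(3):
        q.put(f"chunk-{i}")
    q.put(None)  # sentinel: the workflow has finished


def stream(q: queue.Queue, status: dict):
    # Main thread: relays events to the client, then updates the record.
    while True:
        event = q.get()
        if event is None:
            break
        yield event
    # Only reached if the client consumes the stream to the end.
    status["value"] = "succeeded"


def demo() -> tuple:
    status = {"value": "running"}  # stand-in for the workflow record
    q = queue.Queue()
    t = threading.Thread(target=worker, args=(q,))
    t.start()

    gen = stream(q, status)
    first = next(gen)  # the client receives one chunk...
    gen.close()        # ...then disconnects: GeneratorExit at the yield

    t.join()           # the backend thread still runs to completion
    return first, status["value"], t.is_alive()
```

Here `demo()` leaves the status at `"running"` even though the worker has finished, which mirrors the behavior reported in this issue.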
Sounds good. Here is my understanding: main thread
backend thread
A better way would be to move no. 3 to an additional new thread. |
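A rough sketch of the decoupling idea, under stated assumptions: the numbered steps are not preserved in this thread, so this assumes the step being moved is the final record update, and it performs that update in the worker rather than in a separate third thread. `run_workflow`, `stream`, and the `status` dict are hypothetical names, not Dify's implementation.

```python
import queue
import threading


def run_workflow(q: queue.Queue, status: dict) -> None:
    # Worker thread: runs the "workflow" and persists the final status
    # itself, so the update no longer depends on the streaming generator.
    for i in range(3):
        q.put(f"chunk-{i}")
    status["value"] = "succeeded"
    q.put(None)  # sentinel: nothing more to stream


def stream(q: queue.Queue):
    # Main thread: only relays events; it owns no record updates.
    while (event := q.get()) is not None:
        yield event


def demo() -> str:
    status = {"value": "running"}  # stand-in for the workflow record
    q = queue.Queue()
    t = threading.Thread(target=run_workflow, args=(q, status))
    t.start()

    gen = stream(q)
    next(gen)    # the client receives one chunk...
    gen.close()  # ...then disconnects mid-stream

    t.join()
    return status["value"]
```

With this split, `demo()` ends with the status set by the worker, regardless of when the client drops the connection. A real implementation would also need its own DB session and error handling in the worker.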
agreed |
Hi, @kazuhisa-wada. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale. Issue Summary:
Next Steps:
Thank you for your understanding and contribution! |
Definitely yes. I also want to emphasize that this issue is fairly critical for customer experience and should be dealt with for v1.0.0, although it may be a difficult problem. |
@takatost, the user @kazuhisa-wada has confirmed that this issue is still relevant and critical for customer experience, especially for v1.0.0. Could you please assist them with this matter? |
I encountered the same issue and sincerely hope that Dify can resolve it as soon as possible; otherwise, it will significantly impact our product’s user experience. My situation might be somewhat unique, yet it makes reproducing the issue even easier—basically, it occurs over 50% of the time. The business logic is as follows: As a result, the problem mentioned in “thread #12798” keeps occurring, causing the workflow status to remain stuck on “running.” In reality, however, the workflow has completed (log analysis shows that the final step has produced results), and the tokens for the large model have already been consumed, which is extremely frustrating. BTW, I'm using SaaS Dify 0.15.3. |
Just to confirm: was this closed because it has been consolidated into #14362? |
As far as I have experienced, the issue remains in Cloud version 0.15.3. |
In my understanding, staying "running" forever is not a good outcome in any case. When the API caller ends the connection (abnormally or on purpose), the workflow status should reflect the true state of the workflow. That is, if it runs to the end (which it does, and the tokens have been consumed), the status should be "succeeded". |
Update: I tried Ali cloud functions yesterday; unfortunately the issue remains the same. It seems that this issue affects all BaaS-type backend callers, including WeChat cloud functions, Ali cloud functions, Amazon cloud functions, etc. Our business is stuck on this bug now. Please fix it, or share any suggestions. Thanks a lot! |
Update 2: a callback (when the workflow is done) would be the perfect solution. Of course, that means a lot of work for the Dify engineers... |
Self Checks
Dify version
v.0.15.1
Cloud or Self Hosted
Cloud, Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
Disconnecting the network(*) between the client and Dify during the execution of a workflow via the API or web UI (both debug run and WebApp run) kills the main thread and leaves the status stuck as 'running', while the execution spawned by the main thread actually completes. Once a workflow falls into this state, there is no way to stop the workflow execution; e.g. the stop API doesn't work.
Disconnecting the network here means the following:
This issue then causes the following bad behaviors:
This issue seems serious, as the actions that cause it are very easy to perform, and it actually happens frequently in our use.
I believe this is the result of imperfect handling of network disconnection, which can currently kill the main thread.
✔️ Expected Behavior
❌ Actual Behavior
Described above.