Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Implement scheduled handling for end status transaction #7133

Merged
merged 33 commits into from
Feb 18, 2025

Conversation

YongGoose
Copy link
Contributor

@YongGoose YongGoose commented Jan 31, 2025

  • I have registered the PR changes.

Ⅰ. Describe what this PR did

  • Add compensation handling for TimeoutRollbacked and Rollbacked states

Ⅱ. Does this pull request fix one issue?

fixes #6875

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Comment on lines 194 to 195
GlobalStatus.TimeoutRollbacked,
GlobalStatus.Rollbacked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

由于rollback的概率是非常小的,并且rollbacked和TimeoutRollbacked的状态残留的概率更低,故这两种状态是特意放置在server重启时进行处理。
Since the probability of rollback is very low, and the chances of the rollbacked and TimeoutRollbacked states persisting are even lower, these two states are specifically handled during server restart.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

由于rollback的概率是非常小的,并且rollbacked和TimeoutRollbacked的状态残留的概率更低,故这两种状态是特意放置在server重启时进行处理。 Since the probability of rollback is very low, and the chances of the rollbacked and TimeoutRollbacked states persisting are even lower, these two states are specifically handled during server restart.

Thank you for your comment :)

I didn’t fully understand.

Currently, in DefaultCoordinator, there is compensation handling for the committed state, but there is no compensation handling for the TimeoutRollbacked and Rollbacked states.

Based on the comment above, I added these two cases to the array for management.
If any changes are needed, could you please provide additional explanations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the probability of a transaction rollback occurring is very low, the likelihood of the rollbacked intermediate state persisting is even lower. After the rollbacking state, it transitions to rollbacked and is then deleted. These two actions are separated by just one I/O operation, meaning that once the status is successfully updated to rollbacked, the transaction will be immediately deleted, unless a network or storage-side exception occurs during deletion, preventing the transaction from being deleted. Otherwise, there will be no rollbacked transactions left.
The commit of a transaction is a high-probability event, so the likelihood of the committed state persisting is much higher than that of rollbacked. To prevent storage-side pressure and meet performance requirements, committed transactions need to be processed during the server's runtime, while the low-probability rollbacked transactions are handled during the server startup.
Since both of these probabilities are very low, the final result is that they are both deleted. If you think it is necessary to handle rollbacked transactions during runtime, I suggest that you create a separate scheduled task to batch process transactions in the rollbacked and committed states, rather than blocking transactions that urgently need to be rolled back and retried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think it is necessary to handle rollbacked transactions during runtime, I suggest that you create a separate scheduled task to batch process transactions in the rollbacked and committed states, rather than blocking transactions that urgently need to be rolled back and retried.

Thank you for your suggestion!

The idea you proposed is very interesting, and I’m planning to work on it. :)

Additionally, I’ve posted a message on the Seata project’s issue list and mailing list regarding this topic.
I’ll leave the links below—could you kindly take a look when you have a chance?

Thanks again! 🚀

Copy link
Contributor Author

@YongGoose YongGoose Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@funky-eyes

Done in fc28796 !
PTAL ⚡️

@YongGoose
Copy link
Contributor Author

@funky-eyes

In the issue description, it was mentioned that the state remains unchanged unless the server is restarted.
This means that the data will persist unless the server is restarted.

So, aside from handling it specifically during a server restart, I believe additional measures should be taken.

WDYT?

@funky-eyes funky-eyes added first-time contributor first-time contributor module/server server module and removed first-time contributor first-time contributor labels Feb 5, 2025
@funky-eyes
Copy link
Contributor

Do you use the DingTalk app?

@YongGoose
Copy link
Contributor Author

Do you use the DingTalk app?

Unfortunately, no..

However, if using DingTalk allows for better communication, I’m willing to install it!

@funky-eyes
Copy link
Contributor

Do you use the DingTalk app?

Unfortunately, no..

However, if using DingTalk allows for better communication, I’m willing to install it!

Of course, email is also a great communication channel, but I would like to invite you to join the Seata contributors' DingTalk group if you're willing to install it. If you install it, please send your DingTalk ID to my email, and I will invite you to join the Seata contributors' group.

Copy link

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 34.54545% with 36 lines in your changes missing coverage. Please review.

Project coverage is 52.17%. Comparing base (2c31bdd) to head (bd9f7dd).
Report is 1 commits behind head on 2.x.

Files with missing lines Patch % Lines
...e/seata/server/coordinator/DefaultCoordinator.java 34.21% 23 Missing and 2 partials ⚠️
...org/apache/seata/server/session/SessionHelper.java 0.00% 7 Missing ⚠️
...org/apache/seata/server/session/GlobalSession.java 33.33% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##                2.x    #7133      +/-   ##
============================================
- Coverage     52.20%   52.17%   -0.03%     
- Complexity     6808     6811       +3     
============================================
  Files          1154     1154              
  Lines         41058    41112      +54     
  Branches       4813     4818       +5     
============================================
+ Hits          21434    21451      +17     
- Misses        17585    17621      +36     
- Partials       2039     2040       +1     
Files with missing lines Coverage Δ
...ava/org/apache/seata/common/ConfigurationKeys.java 0.00% <ø> (ø)
...c/main/java/org/apache/seata/common/Constants.java 100.00% <ø> (ø)
...in/java/org/apache/seata/common/DefaultValues.java 0.00% <ø> (ø)
...oconfigure/properties/server/ServerProperties.java 100.00% <100.00%> (ø)
...org/apache/seata/server/session/GlobalSession.java 68.32% <33.33%> (-0.77%) ⬇️
...org/apache/seata/server/session/SessionHelper.java 46.10% <0.00%> (-2.02%) ⬇️
...e/seata/server/coordinator/DefaultCoordinator.java 40.05% <34.21%> (-0.86%) ⬇️

... and 5 files with indirect coverage changes

@YongGoose YongGoose requested a review from funky-eyes February 12, 2025 10:09
@YongGoose YongGoose changed the title Add compensation handling for TimeoutRollbacked and Rollbacked states feature: Implement Scheduled Handling for Rollbacked and TimeoutRollbacked Transaction Feb 12, 2025
* @return time to dead session. if not greater than 0, then deadSession
*/
public long timeToDeadSession() {
if (this.status == GlobalStatus.Rollbacked || this.status == GlobalStatus.TimeoutRollbacked) {
return beginTime - System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The beginTime does not change after the creation of the global session, which will cause any global session to be added to the needDoRollbackedSessions list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it this way to set a slightly shorter interval for sessions in the end state.

If possible, could you please provide a more detailed explanation? 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The beginTime is set when the transaction is created. When the Seata server's business thread changes the transaction status from "Rollbacking" to "Rollbacked," your timeToDeadSession method directly subtracts the current timestamp from beginTime, which will inevitably result in a negative value. As a result, the business thread will delete the transaction, and your scheduled task will also delete the transaction, leading to a concurrency issue and thread safety problem. I believe that beginTime should be increased by the corresponding transaction timeout, plus an additional buffer time to ensure thread safety.

Copy link
Contributor Author

@YongGoose YongGoose Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, would it be appropriate to create a RETRY_DEAD_THRESHOLD specifically for the end state?

Of course, this timeToDeadSession method is more relevant for ongoing statuses like "rollbacking" or "committing." For transactions in the "end" status, I believe the interval can be shortened.

As suggested in the discussion, since we can reduce the interval in the end state, I believe decreasing the size of RETRY_DEAD_THRESHOLD could be a reasonable approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The beginTime is set when the transaction is created. When the Seata server's business thread changes the transaction status from "Rollbacking" to "Rollbacked," your timeToDeadSession method directly subtracts the current timestamp from beginTime, which will inevitably result in a negative value. As a result, the business thread will delete the transaction, and your scheduled task will also delete the transaction, leading to a concurrency issue and thread safety problem. I believe that beginTime should be increased by the corresponding transaction timeout, plus an additional buffer time to ensure thread safety.

Since our comments were written at the same time, I didn’t see yours when I was writing mine. 😅

Thank you for the additional explanation!
In that case, I will adjust the execution interval for sessions in the end state to be shorter using an if condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RETRY_DEAD_THRESHOLD

I think this can be done this way

Copy link
Contributor Author

@YongGoose YongGoose Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RETRY_DEAD_THRESHOLD

I think this can be done this way

Then no changes required?

Then, would it be appropriate to create a RETRY_DEAD_THRESHOLD specifically for the end state?

or you mean this way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a corresponding RETRY_DEAD_THRESHOLD for the end state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 997775c !

PTAL ⚡️


private final GlobalStatus[] retryCommittingStatuses = new GlobalStatus[] {GlobalStatus.CommitRetrying, GlobalStatus.Committed};

private final GlobalStatus[] rollbackingStatuses = new GlobalStatus[] {GlobalStatus.Rollbacking};
private final GlobalStatus[] committingStatuses = new GlobalStatus[] {GlobalStatus.Committing};
private final GlobalStatus[] rollbackedStatuses = new GlobalStatus[] {GlobalStatus.Rollbacked, GlobalStatus.TimeoutRollbacked};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not process transactions in other end states together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will include the CommitFailed, RollbackFailed, TimeoutRollbackFailed, Finished, CommitRetryTimeout, and RollbackRetryTimeout states :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a9ec30f !

@YongGoose YongGoose requested a review from funky-eyes February 13, 2025 06:07
/**
* The constant ROLLBACKED_RETRY_PERIOD.
*/
String ROLLBACKED_RETRY_PERIOD = RECOVERY_PREFIX + "rollbackedRetryPeriod";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 932c51f !

Comment on lines 676 to 683

// commit retry timeout event
SessionHelper.endRollbackFailed(rollbackedSession, true, true);

//The function of this 'return' is 'continue'.
return;
}
core.doGlobalRollback(rollbackedSession, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should no longer perform rollback processing. For different states of transactions in the end phase, call the SessionHelper.endRollbacked and SessionHelper.endCommitted methods respectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a7a9e52 !

Copy link
Contributor Author

@YongGoose YongGoose Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@funky-eyes

When you have time, I would appreciate it if you could review this comment and the issue below.

@YongGoose YongGoose changed the title feature: Implement Scheduled Handling for Rollbacked and TimeoutRollbacked Transaction feature: Implement Scheduled Handling for end status Transaction Feb 15, 2025
@YongGoose YongGoose requested a review from funky-eyes February 16, 2025 07:09
private void handleEndStateSession(GlobalSession globalSession) throws TransactionException {
GlobalStatus status = globalSession.getStatus();

if (status == GlobalStatus.CommitFailed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit state is not limited to just one, and the rollback state is the same. I suggest putting the handling of various end states directly in the SessionHelper, and then calling it from handleEndStateSession. The only input parameter needed is a globalsession

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, instead of directly managing the transaction end states with conditional statements in handleEndStateSession, the responsibility is being delegated to SessionHelper.

Is my understanding correct?

Below is the implementation based on my understanding.

private void handleEndStateSession(GlobalSession globalSession) throws TransactionException {
    SessionHelper.processEndState(globalSession);
}

The state values are just examples.

public class SessionHelper {

    public static void processEndState(GlobalSession globalSession) throws TransactionException {
        GlobalStatus status = globalSession.getStatus();

        switch (status) {
            case Committed:
            case CommitSuccess:
                endCommitted(globalSession, false);
                break;

            case CommitFailed:
            case CommitPending:
                endCommitted(globalSession, true);
                break;

            case Rollbacked:
            case RollbackSuccess:
                endRollbacked(globalSession, false);
                break;

            case RollbackFailed:
            case RollbackPending:
                endRollbacked(globalSession, true);
                break;

            default:
                log.warn("Unknown transaction state: {}", status);
        }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it should be this way, encapsulating the processing logic in the SessionHelper to hide the internal implementation from the outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5837ae6 !

@YongGoose
Copy link
Contributor Author

@funky-eyes

You can see in the comment that the word rollbacked has been changed to status. 113a7fd

Please take a look and provide your feedback.

Copy link
Contributor

@funky-eyes funky-eyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the committed state should be handled within the end state task, and transactions in a failed state should not be retried. Failed transactions need to be addressed by the operations team, rather than being handled by scheduled tasks.

@YongGoose
Copy link
Contributor Author

YongGoose commented Feb 18, 2025

I believe the committed state should be handled within the end state task, and transactions in a failed state should not be retried. Failed transactions need to be addressed by the operations team, rather than being handled by scheduled tasks.

That sounds like a good point.

However, while going through your comment, a few questions came to mind:

  1. Could you explain the situations where retrying the endStatus session would be necessary?
  2. If that approach is taken, does it still fail to resolve the issue mentioned?
  1. Lastly, was this PR ultimately decided to simply retry the end status session, or was there another approach considered?

I completely agree that failed transactions should not be retried but should instead be handled by the operations team. However, I’m curious about how the issue is addressed, so I wanted to ask.

@YongGoose YongGoose requested a review from funky-eyes February 18, 2025 01:54
@funky-eyes
Copy link
Contributor

funky-eyes commented Feb 18, 2025

For the reasons I mentioned earlier:

  1. IO exceptions, such as disk or network issues
  2. Process crashes (whether it's the seata-server or the database, etc.)

In the above points or some examples I did not mention, transactions are not deleted in the end state, leaving residual data.

Copy link
Contributor Author

@YongGoose YongGoose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@funky-eyes

Please take a look :)
If everything looks good, please proceed with committing the suggestion.

@YongGoose
Copy link
Contributor Author

@funky-eyes

Is there anything else left to do?

Copy link
Contributor

@funky-eyes funky-eyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@funky-eyes funky-eyes merged commit 46c9b63 into apache:2.x Feb 18, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module/server server module type: feature Category issues or prs related to feature request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TimeoutRollbacked and Rollbacked residues
3 participants