Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove uncached reader in eviction controller #1044

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Arvindthiru
Copy link
Contributor

@Arvindthiru Arvindthiru commented Feb 14, 2025

Description of your changes

Fixes #

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Special notes for your reviewer

Removed the uncached reader from eviction controller, was originally added to account for cache miss when attempting to get PDB in the controller, E2Es were failing

Steps taken to remedy issue,

  • Moved the creation of PDB to BeforeAll block to give more time for cache to be populated
  • Added FlakeAttempts decorator to tests which run in parallel and requires the creation of PDB to account for cache miss in the controller

These steps aim to optimistically ensure that the E2Es are retried even on a cache and hopefully the cache is populated on the retry to successfully get PDB and block eviction

@michaelawyu
Copy link
Contributor

Hi Arvind! To summarize some of the approaches we have gone through (specific to the test part):

a) use the FlakeAttempts=N decorator on the container node; optionally we could use a random suffix on the resources we create to simplify the whole flow

Please be aware of how Ginkgo handles the `FlakeAttempts` decorator on ordered nodes:

```
In brief, Ginkgo generally guarantees that BeforeAll and AfterAll node closures only run once - but FlakeAttempts can modify this behavior. If a failure occurs within a subject node in an Ordered container (i.e. in an It) then Ginkgo will rerun that It but not the BeforeAll or AfterAll. However, if a failure occurs in a BeforeAll, Ginkgo will immediately run the AfterAll (to clean up) and then rerun the BeforeAll.
```

b) use the wait (k8s.io/apimachinery/pkg/util/wait) package (there are similar alternatives) to achieve the nested Eventually block

As mentioned in the doc, Ginkgo will handle the Eventually func calls correctly so with right timeout/interval settings this option should work just fine. It's also possible to control the timeout dynamically if we prefer not setting the timeout/interval explicitly (or let the inner block decide when to give up -> this would be functionally very similar to the `FlakeAttempts` decorator).

1 similar comment
@michaelawyu
Copy link
Contributor

Hi Arvind! To summarize some of the approaches we have gone through (specific to the test part):

a) use the FlakeAttempts=N decorator on the container node; optionally we could use a random suffix on the resources we create to simplify the whole flow

Please be aware of how Ginkgo handles the `FlakeAttempts` decorator on ordered nodes:

```
In brief, Ginkgo generally guarantees that BeforeAll and AfterAll node closures only run once - but FlakeAttempts can modify this behavior. If a failure occurs within a subject node in an Ordered container (i.e. in an It) then Ginkgo will rerun that It but not the BeforeAll or AfterAll. However, if a failure occurs in a BeforeAll, Ginkgo will immediately run the AfterAll (to clean up) and then rerun the BeforeAll.
```

b) use the wait (k8s.io/apimachinery/pkg/util/wait) package (there are similar alternatives) to achieve the nested Eventually block

As mentioned in the doc, Ginkgo will handle the Eventually func calls correctly so with right timeout/interval settings this option should work just fine. It's also possible to control the timeout dynamically if we prefer not setting the timeout/interval explicitly (or let the inner block decide when to give up -> this would be functionally very similar to the `FlakeAttempts` decorator).

@Arvindthiru Arvindthiru force-pushed the removeUncachedReaderEviction branch from 48386c2 to e948d7e Compare February 18, 2025 23:18
@Arvindthiru Arvindthiru force-pushed the removeUncachedReaderEviction branch from e948d7e to 8cd9ee9 Compare February 18, 2025 23:19
@Arvindthiru Arvindthiru marked this pull request as ready for review February 19, 2025 00:18
@Arvindthiru Arvindthiru changed the title test: remove uncached reader in eviction controller fix: remove uncached reader in eviction controller Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants