Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration fixes #79

Closed
wants to merge 144 commits into from
Closed

Migration fixes #79

wants to merge 144 commits into from

Conversation

Lun4m
Copy link
Collaborator

@Lun4m Lun4m commented Mar 17, 2025

This PR contains a lot of stuff (probably too much!):

  1. I reworked a bit the code organization in the migration directory:

    • Now dump and import packages define their own set of tables/data they act on
    • Indices are dropped and created in a separate package, which I found easier to manage in case something goes wrong during the import step
    • Switched to zerolog
  2. Added labels for kdvh and kvalobs data, so it's easier to delete specific timeseries in case we need to migrate again.

  3. Added Frost quality codes (which are not exhaustive, but what can I do?)

  4. Added handling of restricted data for migrations

  5. Renamed SQL files so that we can automate database setup without having to update the file names every time

  6. Significantly improved the just file, so it's a bit less of a pain to use

  7. Figured out we have to set GOMEMLIMIT (or should I say GOMEM*E*LIMIT, I love golang) to avoid OOM when importing data

I don't expect you to check everything, but if you quickly go through the changes and spot something weird, please let me know!

Edit: I need to add an ansible script to disable/enable the replication before/after the migration

@Lun4m Lun4m force-pushed the migration_fixes branch from b0e845c to ff845f2 Compare March 17, 2025 15:21
@Lun4m Lun4m marked this pull request as ready for review March 17, 2025 16:11
Copy link
Member

@intarga intarga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: remaining work on migrations is just testing restricted data?

We should try to be better about structuring PR's. Migrations are already hard to review because the code is not heavily trafficked, and when semantic changes are buried under refactoring + tooling changes, it becomes even harder to separate things and identify what needs attention. This ideally should have been a series of 3+ PRs, though I appreciate that may have been hard to do as you were fixing issues as they came up.

@Lun4m
Copy link
Collaborator Author

Lun4m commented Mar 18, 2025

To be clear: remaining work on migrations is just testing restricted data?

Yes, which I actually am doing right now.
There are two other problems, but I would not qualify them as pressing issues:

  • Importing data requires connections to both KDVH and Kvalobs, so we definitely need to dump all the auxiliary metadata
  • There are other tables in KDVH that I haven't touched (I've left a TODO somewhere), but I feel like we want to dump everything we can

We should try to be better about structuring PR's. Migrations are already hard to review because the code is not heavily trafficked, and when semantic changes are buried under refactoring + tooling changes, it becomes even harder to separate things and identify what needs attention. This ideally should have been a series of 3+ PRs, though I appreciate that may have been hard to do as you were fixing issues as they came up.

I agree, a 5000+ changes PR is not easily reviewable. I can try to break this down into smaller PRs if you are okay with it!

@intarga
Copy link
Member

intarga commented Mar 18, 2025

I can try to break this down into smaller PRs if you are okay with it!

Not sure if it would be worth the effort, I meant the comment more as something to keep in mind in the future, structuring work with PRs a bit in mind, and filing them more frequently

Lun4m added 7 commits March 18, 2025 14:08
…only in legacy.data
@intarga intarga mentioned this pull request Mar 20, 2025
@intarga intarga linked an issue Mar 20, 2025 that may be closed by this pull request
@Lun4m Lun4m changed the base branch from trunk to obsinn_label_unique_constraint March 20, 2025 14:15
@Lun4m Lun4m changed the base branch from obsinn_label_unique_constraint to trunk March 20, 2025 14:15
@Lun4m
Copy link
Collaborator Author

Lun4m commented Mar 20, 2025

Closing because somehow Github is refusing to merge even though the branch is already rebased on trunk 😕

@Lun4m Lun4m closed this Mar 20, 2025
@Lun4m Lun4m mentioned this pull request Mar 20, 2025
@Lun4m Lun4m deleted the migration_fixes branch March 20, 2025 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trim down kvalobs QC info
3 participants