Migration fixes #79

Lun4m · 2025-03-17T13:40:11Z

This PR contains a lot of stuff (probably too much!):

- I reworked the code organization in the migration directory a bit:
  - The dump and import packages now each define their own set of tables/data they act on
  - Indices are dropped and created in a separate package, which I found easier to manage in case something goes wrong during the import step
  - Switched to zerolog
- Added labels for KDVH and Kvalobs data, so it's easier to delete specific timeseries in case we need to migrate again
- Added Frost quality codes (which are not exhaustive, but what can I do?)
- Added handling of restricted data for migrations
- Renamed SQL files so that we can automate database setup without having to update the file names every time
- Significantly improved the justfile, so it's a bit less of a pain to use
- Figured out we have to set GOMEMLIMIT (or should I say GOMEM*E*LIMIT, I love golang) to avoid OOM errors when importing data

I don't expect you to check everything, but if you quickly go through the changes and spot something weird, please let me know!

Edit: I need to add an Ansible script to disable/enable replication before/after the migration

intarga

To be clear: remaining work on migrations is just testing restricted data?

We should try to be better about structuring PRs. Migrations are already hard to review because the code is not heavily trafficked, and when semantic changes are buried under refactoring and tooling changes, it becomes even harder to separate things and identify what needs attention. This ideally should have been a series of 3+ PRs, though I appreciate that may have been hard to do as you were fixing issues as they came up.

ansible/roles/pg/templates/repmgr.j2

db/004-legacy.sql

ingestion/src/kvkafka/xml_types.rs

migrations/tests/files/T_MDATA/elements.txt

Lun4m · 2025-03-18T12:38:06Z

> To be clear: remaining work on migrations is just testing restricted data?

Yes, which I actually am doing right now.
There are two other problems, but I would not describe them as pressing:

- Importing data requires connections to both KDVH and Kvalobs, so we definitely need to dump all the auxiliary metadata
- There are other tables in KDVH that I haven't touched (I've left a TODO somewhere), but I feel like we want to dump everything we can

> We should try to be better about structuring PRs. Migrations are already hard to review because the code is not heavily trafficked, and when semantic changes are buried under refactoring and tooling changes, it becomes even harder to separate things and identify what needs attention. This ideally should have been a series of 3+ PRs, though I appreciate that may have been hard to do as you were fixing issues as they came up.

I agree, a PR with 5000+ changes is not easy to review. I can try to break this down into smaller PRs if you are okay with it!

intarga · 2025-03-18T12:46:39Z

> I can try to break this down into smaller PRs if you are okay with it!

Not sure it would be worth the effort. I meant the comment more as something to keep in mind in the future: structuring work with PRs in mind, and filing them more frequently.

Lun4m · 2025-03-20T14:21:25Z

Closing because somehow GitHub is refusing to merge even though the branch is already rebased on trunk 😕

Lun4m added 30 commits March 12, 2025 13:54

Don't require env variables when printing help messages

c595de8

Fix dumpByYear

1538386

Fix missing error message

105b35d

Fix elem table queries

3734eae

Remove comment from main

bb25a9f

Skip dump if station dir already exists

a07a0b8

Add rudimentary report script for dumps

2aae7c6

Update fetchYearRange

392ecbe

Fix another typo

8287fa5

Log error for fetchYearRange

69a738b

Add option to print report for single table

8a5f151

Fix another bug in dumpByYear

34de5c3

Fix wrong query for t_homogen_month

2e69932

Don't use dumpByYear for T_SECOND, T_MINUTE, T_10MINUTE

9e1b66d

Comment out unavailable tables

2bcb150

Directly insert new timeseries instead of checking if it already exists in LARD

40c8f79

Add elem table for T_METARDATA

17ffb8e

Comment out dumpByYear

6a647a0

Update date_diff

e94129b

Don't log error if no rows are returned

6829b35

Add log info for timespan

ecfaf08

Progress bar over stations instead of elements

a566f92

Catch sigint

d167167

Update logging

d5b9207

Fix bar size

563c820

Make timing consistent

08525a0

Need to pass parameters to query

e35838a

Revert to progress bar over elements

7ebaa43

Comment out query check

0739277

Revert "Revert to progress bar over elements"

2c003d9

This reverts commit 92e1ac0.

Lun4m force-pushed the migration_fixes branch from b0e845c to ff845f2 March 17, 2025 15:21

Lun4m added 5 commits March 17, 2025 16:25

Fix function doc string

bb1a268

Update docs

2b35489

Update list package

dabc3a6

Remove comment

532d27f

Those should not use the logger

a8fdc58

Lun4m marked this pull request as ready for review March 17, 2025 16:11

Lun4m requested review from intarga and jo-asplin-met-no March 17, 2025 16:11

intarga reviewed Mar 18, 2025

Remove empty files

192bb72

Lun4m added 7 commits March 18, 2025 14:08

Uncomment use_replication_slots

e8eed31

Add ability to drop/create indices for open and restricted database separately

cdba8b2

Add missing new lines

2ed17a2

Select only the columns we need

7fa6d26

Make runtime parameters stick during index operations

7748a29

Remove references to public.data and insert all numeric observations only in legacy.data

b1cb756

Remove progress bar length hack

ca12a04

intarga approved these changes Mar 20, 2025

intarga mentioned this pull request Mar 20, 2025

Trim down kvalobs QC info #53

Closed

intarga linked an issue Mar 20, 2025 that may be closed by this pull request

Trim down kvalobs QC info #53

Closed

Remove KDVH fromtime/totime caching since those values have dubious nature

106cef7

Lun4m changed the base branch from trunk to obsinn_label_unique_constraint March 20, 2025 14:15

Lun4m changed the base branch from obsinn_label_unique_constraint to trunk March 20, 2025 14:15

Lun4m closed this Mar 20, 2025

Lun4m mentioned this pull request Mar 20, 2025

Migration fixes #81

Merged

Lun4m deleted the migration_fixes branch March 20, 2025 14:44
