Prerelease documentation #77
base: trunk
Conversation
We no longer have a need for it and aren't effectively maintaining it
`api` seems a bit too generic a name, especially since `ingestion` technically contains an API. The name `egress` better reflects what the API is for.
Lard is built around a Postgres database with two services that interact with it, one focused on ingestion, and one providing an API to access the data.
Can the SVGs have a solid white background? They are unreadable in dark mode 😢 And maybe this first one could be horizontal (it takes a lot of screen space)? Otherwise we could condense them into a single image (left: single node vs right: horizontal scaling).
Here, one node takes responsibility for ingestion, using [Postgres replication](https://www.postgresql.org/docs/current/high-availability.html) to sync the others. Meanwhile, the others focus on serving read-only requests from the API service, allowing read throughput to scale linearly with the number of replicas. Replicas are also able to take over from the primary in case of outages, minimising downtime.
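A rough sketch of what this split looks like from application code — the hostnames, credentials, and the `example` table below are placeholders for illustration, not anything from Lard's configuration: writes always target the primary, while read-only queries can go to any replica.

```rust
use tokio_postgres::{Client, NoTls};

// Open a connection and drive its background task.
async fn connect(conn_str: &str) -> Result<Client, tokio_postgres::Error> {
    let (client, connection) = tokio_postgres::connect(conn_str, NoTls).await?;
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });
    Ok(client)
}

#[tokio::main]
async fn main() -> Result<(), tokio_postgres::Error> {
    // The ingestion service only ever writes to the primary...
    let primary = connect("host=lard-primary user=lard dbname=lard").await?;
    // ...while the API service can spread read-only queries over the replicas.
    let replica = connect("host=lard-replica-1 user=lard dbname=lard").await?;

    // Placeholder table and column, just to show the direction of traffic.
    primary
        .execute("INSERT INTO example (value) VALUES ($1)", &[&42i32])
        .await?;
    let rows = replica.query("SELECT value FROM example", &[]).await?;
    println!("read {} rows from a replica", rows.len());
    Ok(())
}
```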
In addition to read throughput, previous experience with database systems at Met has taught us that as our dataset grows (think past 1 billion observations) write throughput begins to slow to a problematic degree. This happens because the [indexes](https://www.postgresql.org/docs/current/indexes.html) (structures needed speed up queries on large tables) become resource intensive to maintain as they grow larger. Particularly the BTree indices we use to represent time need to remain a balanced, but as we always add data on one side of the tree (the present is one extreme of the time range our dataset covers), we are constantly unbalancing it, and the expense of balancing a tree scales with its size.
Suggested change:
In addition to read throughput, previous experience with database systems at MET has taught us that as our dataset grows (think past 1 billion observations) write throughput begins to slow down to a problematic degree. This happens because the [indexes](https://www.postgresql.org/docs/current/indexes.html) (structures needed to speed up queries on large tables) become resource intensive to maintain as they grow larger. Particularly the BTree indices we use to represent time need to remain balanced, but as we always add data on one side of the tree (the present is one extreme of the time range our dataset covers), we are constantly unbalancing it, and the expense of balancing a tree scales with its size.
Slow down? And should it be `Met` or `MET`?
# Database
The database is defined by a set of schemas defined in [/db](/db) each prefixed by a number defining the order in which they should be applied.
Suggested change:
The database is defined by a set of schemas found in [/db](/db), each prefixed by a number representing the order in which they should be applied.
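As an illustration of that convention, applying the files in order might look like the minimal sketch below — the connection string, database name, and the assumption that every schema file is plain `.sql` are mine, not taken from the repo:

```rust
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (client, connection) =
        tokio_postgres::connect("host=localhost user=postgres dbname=lard", NoTls).await?;
    // The connection object does the actual I/O; drive it in the background.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    // Collect the schema files and sort them; the numeric prefix makes
    // lexicographic order match the intended application order.
    let mut paths: Vec<_> = std::fs::read_dir("db")?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|path| path.extension().map_or(false, |ext| ext == "sql"))
        .collect();
    paths.sort();

    for path in paths {
        let sql = std::fs::read_to_string(&path)?;
        // Each file may contain multiple statements, so run it as a batch.
        client.batch_execute(&sql).await?;
        println!("applied {}", path.display());
    }
    Ok(())
}
```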
## Kafka and legacy
Right now as we aren't ready with confident QC, we are ingesting from KvKafka into the `legacy.data` table that has a different structure. As a result, that ingestor has it's own equivalent to `Datum` and `insert_data` that match `legacy.data`.
Suggested change:
Right now as we aren't ready with confident QC, we are ingesting from KvKafka into the `legacy.data` table that has a different structure. As a result, that ingestor has its own equivalent to `Datum` and `insert_data` that match `legacy.data`.
```rust
async fn handle_kldata(
    State(pools): State<DbPools>,
    State(param_conversions): State<ParamConversions>,
    State(permit_table): State<Arc<RwLock<(ParamPermitTable, StationPermitTable)>>>,
    State(rove_connector): State<Arc<rove_connector::Connector>>,
    State(qc_pipelines): State<Arc<HashMap<(i32, RelativeDuration), rove::Pipeline>>>,
    body: String,
) -> Json<KldataResp> {
    metrics::counter!(KLDATA_MESSAGES_RECEIVED).increment(1);

    // Run the whole pipeline in an async block so any error can be reported in
    // the response below.
    let result: Result<usize, Error> = async {
        let mut open_conn = pools.open.get().await?;
        let mut restricted_conn = pools.restricted.get().await?;

        // Parse the obsinn kldata message, keeping its message id for the response.
        let (message_id, obsinn_chunk) = parse_kldata(&body, param_conversions.clone())?;

        // Split the observations into open and restricted sets according to the permit tables.
        let (mut open_data, mut restricted_data) = filter_and_label_kldata(
            obsinn_chunk,
            &mut open_conn,
            &mut restricted_conn,
            param_conversions,
            permit_table,
        )
        .await?;

        // Run the QC pipelines and insert each set into its respective database.
        qc_and_insert_data(
            &mut open_data,
            &rove_connector,
            &qc_pipelines,
            &mut open_conn,
        )
        .await?;
        qc_and_insert_data(
            &mut restricted_data,
            &rove_connector,
            &qc_pipelines,
            &mut restricted_conn,
        )
        .await?;

        Ok(message_id)
    }
    .await;

    match result {
        Ok(message_id) => Json(KldataResp {
            message: "".into(),
            message_id,
            res: 0,
            retry: false,
        }),
        Err(e) => {
            metrics::counter!(KLDATA_FAILURES).increment(1);
            error!("failed to ingest kldata message: {}, body: {}", e, body);
            // TODO: log errors?
            Json(KldataResp {
                message: e.to_string(),
                message_id: 0, // TODO: some clever way to get the message id still if possible?
                res: 1,
                // Parse errors are not worth retrying; other failures might be transient.
                retry: !matches!(e, Error::Parse(_)),
            })
        }
    }
}
```
Should this just be a link to the lines in the file? Otherwise we have to fix it here too when the underlying code changes. But I also kinda like seeing the code directly in the README 🤔
```rust
    // with them
    let client = reqwest::Client::new();

    // Use helper fucntion `ingest_data` to send a request with the obsinn message generated
    }
}
```
Suggested change:
```rust
    // Use helper function `ingest_data` to send a request with the obsinn message generated
```
Here we expect the test_result to complete first, as the server tasks should run indefinitely unless they crash or until we tell them to shut down, so we panic and fail the test if this does not happen.
Suggested change:
Here we expect the `test_result` to complete first, as the server tasks should run indefinitely unless they crash or until we tell them to shut down, so we panic and fail the test if this does not happen.
In the handler for the test closure, we make sure to clean up the database before moving on, so the next test can start on a blank slate. We make sure to catch any panics so we will perform this cleanup even if the closure fails.
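The catch-and-clean-up step could be sketched like this, using the `futures` crate's `catch_unwind`; the table names in the `TRUNCATE` are hypothetical stand-ins, not the schema the wrapper actually resets:

```rust
use futures::FutureExt;

// Run the test closure, clean the database whether it passed or panicked,
// then re-raise any panic so the failure is still reported.
async fn run_and_cleanup<T>(test: T, db: &tokio_postgres::Client)
where
    T: std::future::Future<Output = ()>,
{
    let outcome = std::panic::AssertUnwindSafe(test).catch_unwind().await;

    // Hypothetical table list, just to show the blank-slate reset.
    db.batch_execute("TRUNCATE public.timeseries, public.data CASCADE")
        .await
        .expect("failed to clean up test database");

    if let Err(panic) = outcome {
        std::panic::resume_unwind(panic);
    }
}
```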
It's worth noting that e2e_test_wrapper does not set up a postgres instance, just connects to it, so a postgres instance must be available to run these tests. To help with this, we've set up a justfile to use instead of running `cargo test` directly, which sets up a postgres instance in a docker container.
Suggested change:
It's worth noting that `e2e_test_wrapper` does not set up a postgres instance, just connects to it, so a postgres instance must be available to run these tests. To help with this, we've set up a justfile to use instead of running `cargo test` directly, which sets up a postgres instance in a docker container.
Maybe we can also remove the …
This PR includes:
- a `README` for the project
- renaming `api` to `egress`, as `api` is a rather redundant and non-descriptive name, especially since `ingestion` technically contains an API. I feel the name `egress` better reflects its purpose.
- removal of `fake_data_generator`, as it is disused and not really maintained