733 review config newest #182

donadviser · 2025-03-05T18:17:24Z

Pull Request Title

Review config handling: Split config.json and implement merge utility.

Summary

This PR splits the existing config.json into two files: config_user.json and config_constants.json. A new utility script merges these files back into config.json. In addition, copy_script_and_config.py and helper_functions.py are updated to handle the merging, and logger setup is introduced in mbs_results/__init__.py for use in the code pipeline.
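
As a rough illustration of the split-and-merge idea in the summary, the sketch below reads two JSON files and writes a combined one. The function name `merge_config_files`, its parameters, and the rule that user values win on duplicate keys are assumptions made for illustration; they are not the code in this PR.

```python
import json
import os
import tempfile


def merge_config_files(user_path, constants_path, output_path):
    """Merge two JSON config files and write the result (hypothetical helper)."""
    with open(constants_path, encoding="utf-8") as f:
        constants = json.load(f)
    with open(user_path, encoding="utf-8") as f:
        user = json.load(f)

    # Assumption: on duplicate keys the user value overrides the constant.
    merged = {**constants, **user}

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=4)
    return merged


# Small demonstration in a temporary folder, so no real config is touched.
with tempfile.TemporaryDirectory() as folder:
    user_path = os.path.join(folder, "config_user.json")
    constants_path = os.path.join(folder, "config_constants.json")
    output_path = os.path.join(folder, "config.json")

    with open(user_path, "w", encoding="utf-8") as f:
        json.dump({"threshold_filepath": ""}, f)
    with open(constants_path, "w", encoding="utf-8") as f:
        json.dump({"platform": "network"}, f)

    demo_config = merge_config_files(user_path, constants_path, output_path)
```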

Type of Change

Checklists

This pull request meets the following requirements:

Creator Checklist

Installable with all dependencies recorded
Runs without error
Follows PEP8 and project-specific conventions
Appropriate use of comments, for example, no descriptive comments
Functions documented using Numpy style docstrings
Assumptions and decisions log considered and updated if appropriate
Unit tests have been updated to cover essential functionality for a reasonable range of inputs and conditions
Other forms of testing such as end-to-end and user-interface testing have been considered and updated as required

If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.

Reviewer Checklist

Test suite passes (locally as a minimum)
Peer reviewed with review recorded

Additional Information

Please provide any additional information or context that would help the reviewer understand the changes in this pull request.

Related Issues

Link any related issues or pull requests here.

NathanKelly-ONS

Hey Derrick, good work so far! I definitely can see how this is going to be useful and simplify the configs quite a lot. I have a few suggestions and a couple of questions. I think some aspects will be useful to get people's input at the show-and-tell, so looking forward to that.

The only thing I haven't been able to test yet is the logger functionality. I'm not sure why I can't get it working, but I'll keep testing - it could be a me problem.

One more general thing is I think it makes sense to put the two configs into a separate folder. We'll need to check it still works as expected and doesn't break the pipeline since the filepaths will be slightly different, but it will allow us to have a README for that specific folder which will be useful - otherwise the config README that I'm working on won't be particularly visible.
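
For context on the logger mentioned in this review, a package-level logger is often configured along the following lines. This is a generic sketch using the standard logging module; the function name `setup_logger`, the format string and the default level are assumptions, not the contents of mbs_results/__init__.py.

```python
import logging


def setup_logger(name="mbs_results", level=logging.INFO):
    """Return a logger with a single stream handler (illustrative only)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = setup_logger()
logger.info("logger configured")
```

With a setup like this, a module inside the package can call `logging.getLogger(__name__)` and its records propagate to the handler attached to the package logger.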

mbs_results/__init__.py

mbs_results/config_constants.json

mbs_results/utilities/merge_two_config_files.py

NathanKelly-ONS · 2025-03-07T15:24:11Z

mbs_results/utilities/merge_two_config_files.py

+import json
+
+
+def merge_two_config_files(


Two points about this function more generally:

I think it's worthwhile calling this function as part of the load_config function and testing whether it works - the new configs currently aren't used anywhere in the script as far as I can tell. I can't currently tell whether replacing the current config with the combined config will work in place without any additional changes - but I think that would be ideal, so it's a good idea to test this.

Also, relevant to the above, this function is essentially loading the configs and merging them. I think it makes sense to separate this out - so perhaps update the load_config function to read in both configs, then once the configs have been read in, pass the actual variables into merge_two_config_files, then strip out the with open(...) parts of merge_two_config_files, so the merge function is literally just merging and the load function is literally just loading.
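
The separation suggested here could look roughly like the sketch below: one function that only reads the files and one that only merges dictionaries. The signatures, parameter names and the precedence rule (user values win) are assumptions for illustration, not the functions as they exist in the repository.

```python
import json


def load_config(user_path, constants_path):
    """Only load: read both JSON files and return them as dictionaries."""
    with open(user_path, encoding="utf-8") as f:
        user_config = json.load(f)
    with open(constants_path, encoding="utf-8") as f:
        constants_config = json.load(f)
    return user_config, constants_config


def merge_two_config_files(user_config, constants_config):
    """Only merge: combine two already-loaded dictionaries."""
    # Assumption: user settings take precedence over constants.
    return {**constants_config, **user_config}
```

Keeping file access out of the merge step means the merge can be exercised with plain dictionaries, without creating any files.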

donadviser · 2025-03-10

Yes, I have updated the function merge_two_config_files to perform the merging and return the config object. I have also updated create_testing_config in helper_functions.py to call it.

I will retain the option of creating config.json by merging config_user and config_dev. This keeps the current structure for running and testing the pipeline.

NathanKelly-ONS · 2025-03-10T17:38:52Z

(Related to my previous comment) I think the load_config function is essentially redundant with this function, because it's only designed to read in a single config, which we won't have any more once we move towards having a user_config and dev_config.

I also don't believe that we're permanently writing out the combined config anywhere (by design, as we discussed yesterday) so there's no way for the load_config function to actually pick up the correct config when you're running the pipeline.

Whilst I think this function would do the job, I don't think it's best practice to have a function doing two things like this (loading AND merging) - it's generally better for each function to do a single thing.

So I think the best thing to do is to update load_config so it takes two filepaths, and returns dev_config and user_config, and then you can pass those objects into merge_two_configs. That way, load_config is literally just loading the configs, and merge_two_configs is literally just merging two configs.

Also, this will mean that the new configs are actually being used in main too, so we can make sure the pipeline still runs as expected and it's passing the integration tests.

Also, sorry I missed this in my first review: could you add a unit test for this too please?
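
A unit test for the merge step might be sketched as follows, in pytest style with plain asserts. The `merge_two_configs` function here is only a stand-in with assumed behaviour (user values win on duplicate keys); the real test would import the function from the package instead.

```python
def merge_two_configs(user_config, dev_config):
    """Stand-in for the function under test (assumed behaviour: user wins)."""
    return {**dev_config, **user_config}


def test_merge_keeps_keys_from_both():
    merged = merge_two_configs({"threshold_filepath": ""}, {"platform": "s3"})
    assert merged == {"threshold_filepath": "", "platform": "s3"}


def test_user_value_wins_on_duplicate_key():
    merged = merge_two_configs({"platform": "network"}, {"platform": "s3"})
    assert merged["platform"] == "network"


def test_merge_does_not_mutate_inputs():
    user = {"threshold_filepath": ""}
    dev = {"platform": "s3"}
    merge_two_configs(user, dev)
    assert user == {"threshold_filepath": ""}
    assert dev == {"platform": "s3"}
```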

donadviser · 2025-03-11

I have updated this part now. It's working fine, as demoed in the show and tell earlier today. I have also added test_merge_two_config_files.py to test the merging of the two config files.

mbs_results/utilities/merge_two_config_files.py

tests/helper_functions.py

NathanKelly-ONS · 2025-03-07T15:38:05Z

mbs_results/config_user.json

+    "threshold_filepath":"",
+
+    "back_data_type":"type",
+    "imputation_marker_col":"imputation_flags_adjustedresponse",


One more thing - do we think this makes sense to have in the user config? Or do we think it belongs with the other column names in the constants_config? I don't have a strong opinion either way, I'm just thinking that it lines up nicely with the various other column names in the constants config.

If there's a particular reason why a user might need to change this column more so than any other columns though, then I'm happy for it to stay here.

donadviser · 2025-03-10

You're correct. back_data_type and imputation_marker_col should be in constants_config. As part of this review, constants_config has been renamed to config_dev for consistency with config_user. I have moved both keys to config_dev.
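
When keys move between the two configs like this, a quick check that no key lives in both files can help. The sketch below is illustrative only: the helper name `find_duplicate_keys` is hypothetical, and the dictionaries contain just the few keys and values visible in the diff excerpts of this review, not the full configs.

```python
def find_duplicate_keys(config_user, config_dev):
    """Return the sorted keys that appear in both dictionaries."""
    return sorted(set(config_user) & set(config_dev))


# Partial configs, reflecting the layout after the two column keys were moved.
config_user = {"threshold_filepath": ""}
config_dev = {
    "platform": "network",
    "back_data_type": "type",
    "imputation_marker_col": "imputation_flags_adjustedresponse",
}

print(find_duplicate_keys(config_user, config_dev))  # []
```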

NathanKelly-ONS

Hey Derrick, this looks a lot better, thanks for making those changes!

There are a couple of things I think still need to be changed. I don't believe the new configs are currently being used in the pipeline at all - it's still using config.json, so it's difficult to tell whether the pipeline and the integration tests will run on the new configs. Could you make sure the new configs are being used in main? Probably the easiest way is to amend the load_config function as per my inline comment on merge_two_config_files.py, as that function is already being called in main; otherwise you'll need to make a change directly to main.py.

Aside from that, I think it'd be worthwhile creating a unit test for merge_two_config_files, just as good practice and to make sure we're maximising test coverage.

I'll be happy to merge after that, once I've seen it works as expected and passes the integration tests when incorporated into main :)

NathanKelly-ONS · 2025-03-11T17:55:23Z

mbs_results/configs/config_dev.json

@@ -1,32 +1,9 @@
 {
    "platform" : "network",


I think we discussed with Jordan earlier (not sure why I can't tag him) that having platform set to network by default is likely going to cause issues - we don't want the user to have to change (or, ideally, even see) the dev config. But users will be running the pipeline on DAP, so this should be set to s3. I'm not sure if we want to set this to s3 by default or potentially explore other options (otherwise, the devs will need to change this from s3 to network every time they want to test run the pipeline locally).

Might be worthwhile asking the team to see what they think?

donadviser · 2025-03-12

I see what you mean here. I have updated the platform to s3 in config_dev.json. This will now be the default value for users. The dev team can switch between s3 and network.

In addition, "platform": "network" has been added to the test_config dictionary in test_main.py. This ensures that the test is carried out with the network option.
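
The override described above can be pictured with a small sketch. It assumes the loaded config is a plain dictionary and that a test supplies a dictionary of overrides; the helper name `apply_overrides` is hypothetical and is not taken from test_main.py.

```python
def apply_overrides(config, overrides):
    """Return a copy of config with override values applied (hypothetical helper)."""
    # Building a new dictionary leaves the shipped defaults untouched.
    return {**config, **overrides}


default_config = {"platform": "s3"}   # default shipped to users
test_config = {"platform": "network"}  # test-only override

config = apply_overrides(default_config, test_config)
print(config["platform"])  # network
```

This keeps s3 as the value users get by default while letting a test run against the network option without editing the dev config file.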

NathanKelly-ONS · 2025-03-12T13:05:33Z

mbs_results/utilities/inputs.py

+    if config_user_dict is not None:
+        config.update(config_user_dict)
+        logger.info("config dictionary updated with config user dictionary")
+    config["mbs_results_path"] = config["folder_path"] + config["mbs_file_name"]


What's this "mbs_results_path" parameter? I don't think this is used anywhere in the pipeline so I don't think we need to create it here - unless there's a reason why you're creating it?

donadviser · 2025-03-13

The mbs_results_path key was there in the original load_config() in inputs.py, so I left it. However, as you mentioned, I have also searched and couldn't find anywhere it is used. It does look redundant and can be removed. Should I remove it?

donadviser added 2 commits March 5, 2025 13:10

add logger setup in __init__.py for consistent logging across the module

c01948e

split config.json into two, create merge_two_config_files.py to merge them to config.json

dd46266

donadviser requested review from robertswh, Jday7879 and giuliag92 as code owners March 5, 2025 18:17

fix formatting issues identified by pre-commit

1944854

NathanKelly-ONS reviewed Mar 7, 2025

donadviser added 2 commits March 10, 2025 13:24

update files and code for review config

e3bee57

apply pre-commit fixes

a01a268

NathanKelly-ONS reviewed Mar 10, 2025

donadviser and others added 4 commits March 11, 2025 09:47

add pytest for the created logger in mbs_results/__init__.py

affb2f2

update slitting config files and creating pytest for new functions

dfef800

apply pre-commit fixes

46d8b42

update test_config with key bucket

21b497e

NathanKelly-ONS reviewed Mar 11, 2025

AntonZogk mentioned this pull request Mar 11, 2025

Se wrapper fixes #178

Merged

15 tasks

update test_config with key bucket

6d4036e

NathanKelly-ONS reviewed Mar 12, 2025

donadviser and others added 8 commits March 13, 2025 11:14

add mbs_results.configs in setup.cfg

9ee0608

Merge branch '733-review-config-newest' of https://github.com/ONSdigital/monthly-business-survey-results into 733-review-config-newest

0457760

Create __init__.py

2f534a4

Update MANIFEST.in

904aef3

Update MANIFEST.in

6e1de8c

remove redundant mbs_results_path key from config dictionary

2352b4e

Merge branch 'main' of https://github.com/ONSdigital/monthly-business-survey-results into 733-review-config-newest

53e38e8

Merge branch '733-review-config-newest' of https://github.com/ONSdigital/monthly-business-survey-results into 733-review-config-newest

dfa560a
