resource watcher, log watcher, libcloud, object_storage #31

vlerkin · 2024-09-10T19:18:29Z

As a separate thread, resource watcher was added. When a new pod with specific labels (job labels) are aded to the cluster and are in a running state, another watcher starts to follow it's logs and write it to a file stored on a managing pod with scrapyd. Then all files are uploaded to an object storage of a preferred provider, which is implemented via Apache Libcloud library. Persistent Volume and PVC are commented out in the Kubernetes yaml file and were not tested in the cloud, but it can be used to add them to the managing pod to store log files in a persistent volume.

wvengen

Wow, very very nice to see a not-gigantic, mostly understandable piece of code already! I'll need to read it a few times to fully grasp what is happening, and what we'd want for the project, but I've already added some comments that may help shape the direction.

kubernetes.yaml

scrapyd_k8s/log_handler.py

scrapyd_k8s/object_storage/libcloud_driver.py

kubernetes.yaml

requirements.txt

scrapyd_k8s.sample-k8s.conf

scrapyd_k8s/api.py

scrapyd_k8s/log_handler.py

scrapyd_k8s/api.py

wvengen

I've reviewed it a bit more. Still happy that what is actually happening, is not that complex to understand. That is a sign of clear coding, well done!

I think the next steps would be:

Move more job logging related code to its own module (incl. objectstorage)
Polish api.py related to this change
Consistently use joblogs terminology (or whatever we finally settle on)
Find a clean solution for storage configuration
Error handling and/or logging
The other smaller security- and performance-related things
Make sure CI fully passes

Regarding integration tests, that is up to you where and if you'd like to include it in this PR.

wvengen · 2024-09-13T07:56:34Z

And regarding naming: I would propose to call this feature joblogs (or maybe job logging).

I think that ideally we'd move all files related to this feature to a subdirectory joblogs, which would include object storage (if we would start using object storage in another component as well, then we would move it to its own subdirector). In that way, all things related to this feature can be found there, and as a developer that is working on the code, you can easily ignore it if you don't want to dive into this.

wvengen

Thanks for the changes. I think it's getting into a good shape, well done! Some questions pending, but nothing that big, I think.

One general thought is that most of the code uses classes, with exception from the http handlers, and utility functions. This is a larger piece of functionality, that, from the outside, I would think would fit the use of a class well.

And thinking more generally about procedural vs. (semi) object-oriented, I think the latter has benefits in code re-use. For example, consider the possibility that scrapyd-k8s would somehow evolve to talk to different clusters at the same time. Then having these global variables with state, would prohibit the current code to be used. When using classes, this state (and setting up connections to k8s etc.) naturally happens within the class instance context, and it would be much easier to get this working. Now this is just a thought exercise, but would nudge me towards preferring the use of a class for job log handling. And I also realize, that when a Docker-implementation would appear, this would naturally be configured by setting a joblog implementation, which would be a class as well.

So if you manage, would you be able to use a class for the Kubernetes joblogs implementation?

kubernetes.yaml

scrapyd_k8s.sample-k8s.conf

scrapyd_k8s/config.py

scrapyd_k8s/object_storage/libcloud_driver.py

scrapyd_k8s/log_handler.py

scrapyd_k8s/object_storage/libcloud_driver.py

vlerkin · 2024-10-18T08:54:35Z

Thank you for your feedback!

So if you manage, would you be able to use a class for the Kubernetes joblogs implementation?

I do agree with your reasoning and I will work on that, also a great moment for me to learn a bit more about design patterns that may be beneficial for this case.

vlerkin · 2024-10-28T11:50:13Z

I've addressed the comments and also changed the structure to class implementation. The feature contains many methods and lines of code, so I decided to add docstrings to give a short overview on what is going on there. I think it takes less energy to simply read a description and then decide if you want to enable the feature than to read through the entire code.

wvengen

Nice, changes look good!
Great you added docstrings, very helpful. I'm not fully up-to-speed with Python docstring styles, perhaps it's good to check this with others in the team?

scrapyd_k8s/object_storage/libcloud_driver.py

scrapyd_k8s/__main__.py

…ting instance of the Config class

wvengen

Super! I think this PR is almost finished.
Ideally, this feature would also be tested on CI. But that is still quite some work, perhaps, requires starting e.g. a minio server, and such. But that would really be helpful. Maybe we can create a new issue for this?

wvengen · 2024-10-29T09:00:15Z

scrapyd_k8s.sample-k8s.conf

+
+;[joblogs]
+;# Choose storage provider
+;storage_provider = s3
+;container_name   = <container_name>
+
+;# Choose number of unique logs, but at least 2
+;num_lines_to_check = 2
+
+;[joblogs.storage.s3]
+;key    = ${S3_KEY}
+;secret = ${S3_SECRET}
+;region = ${S3_REGION}


Can use we # to comment this out instead? ; looks like it's fine, but it's less common to use for commenting out.
Also, let's include a real-world (commented-out) example here, so use a container_name like scrapyd-k8s-example-bucket, and explain in a comment with the storage, that one needs to set the S3_KEY and S3_SECRET environment variables. Please include region, so it's clear to readers that one can mix and match between environment and string, also in the storage.s3 section

wvengen · 2024-10-29T09:02:21Z

scrapyd_k8s/api.py

@@ -148,5 +146,7 @@ def run():
    if config_username is not None and config_password is not None:
        enable_authentication(app, config_username, config_password)

+    launcher.enable_joblogs(config)


What about putting this in a conditional like if config.joblogs() is not None, I think that makes it more readable, now I think each time I read it: hey, is joblogs always enabled or not?
This would also avoid a warning about joblogs not being enabled when not configured. I think of this feature as an add-on, not as something that everyone will use.

scrapyd_k8s/config.py

scrapyd_k8s/joblogs/log_handler_k8s.py

wvengen · 2024-10-29T09:04:54Z

scrapyd_k8s/joblogs/log_handler_k8s.py

+            logger.warning(f"File not found: {file_path}")
+            return []
+
+    def concatenate_and_delete_files(self, main_file_path, temp_file_path, block_size=6144):


same block size here, this really could use a class variable

wvengen · 2024-10-29T09:05:26Z

scrapyd_k8s/launcher/docker.py

@@ -66,6 +65,10 @@ def cancel(self, project_id, job_id, signal):
            c.kill(signal='SIG' + signal)
        return prevstate

+    def enable_joblogs(self, config):
+        # Do nothing, since joblogs is not supported on Docker
+        pass


the very least we can do is log an error message here, right?

… not work for docker; refactored code to address minor changes

leewesleyv

Overal looks OK, interesting approach for handling the logging challenges!

There are quite a few functions that suffer from a being very lengthy. If you have the chance maybe see if you can refactor this a bit. One thing that I would definitely change is the use of the config class. Introduce some properties there that make it a lot easier to check for certain things in the config. This is something that happens in multiple places throughout the code, and would be nice to have just in 1 location.

It seems like there are also no QL check in this project. Even though that is the case, it would still be beneficial to use these tools during your own development to see how you can improve your changes to the codebase. This of course keeping in mind with other decisions that were made beforehand.

leewesleyv · 2024-10-30T13:10:12Z

scrapyd_k8s/api.py

@@ -148,5 +148,11 @@ def run():
    if config_username is not None and config_password is not None:
        enable_authentication(app, config_username, config_password)

+    if config.joblogs() is not None:


Lets introduce a property in the config class that returns a bool and use that to check here instead of calling a function that is supposed to return something that we check. I would suggest splitting it up like

@property def joblogs_is_configured(self): return self._config.has_section('joblogs') def joblogs(self): return self._config['joblogs'] if self.joblogs_is_configured else None

Then we can improve this line here by checking the property

if config.joblogs_is_configured: ...

Not sure if joblogs_is_configured is the right property name, but just as a suggestion. What do you think?

Hey Wesley, thanks for your review. I don't have anything against property or decorator pattern, but since we don't use it, it really belongs to another PR about refactoring of the whole project, not my logging feature.

scrapyd_k8s/__main__.py

scrapyd_k8s/config.py

scrapyd_k8s/joblogs/__init__.py

leewesleyv · 2024-10-30T13:21:32Z

scrapyd_k8s/joblogs/__init__.py

+    None
+    """
+    joblogs_config = config.joblogs()
+    if joblogs_config and joblogs_config.get('storage_provider') is not None:


I'd also move the logic for checking the storage provider to the config class so if we need it somewhere else we would only need to change the logic within the config class in the future.

Logging feature is an additional "package" you can use, it's not default, so cluttering core code components with it not a good way, in my opinion.

leewesleyv · 2024-10-30T13:36:14Z

scrapyd_k8s/joblogs/log_handler_k8s.py

+                block_size = self.DEFAULT_BLOCK_SIZE
+                data = b''
+                remaining_lines = num_lines
+                while remaining_lines > 0 and file_size > 0:


Might be good to have some description of the algorithm somewhere in the docstring or as inline comments. Perhaps we could also split this up a few different functions, perhaps entirely as a separate utility?

There are several comments across the algorithm, otherwise a special description that explains how it works belongs to wiki or something like that. If you have specific lines where you expect a specific comment, let me know, because to me the algorithm is clear.

leewesleyv · 2024-10-30T13:38:58Z

scrapyd_k8s/joblogs/log_handler_k8s.py

+                            log_lines_counter = 0
+
+                if len(last_n_lines) > log_lines_counter:
+                    self.concatenate_and_delete_files(log_file_path, temp_file_path)


What should happen if an exception is raised here?

It will be caught because the whole thing is wrapped inside try/except. Is this the answer you were looking for?

leewesleyv · 2024-10-30T13:42:26Z

scrapyd_k8s/joblogs/log_handler_k8s.py

+        -------
+        None
+        """
+        try:


Wrapping this entire logic inside a try except and catching a bare exception is not very clean. Perhaps wrap the function where we call it and handle what needs to happen when an exception is raised. But first lets see if we can make the except more specific.

Good point, I changed it so only specific exceptions are caught

leewesleyv · 2024-10-30T13:48:02Z

scrapyd_k8s/joblogs/log_handler_k8s.py

+        if self.pod_tmp_mapping.get(job_name) is not None:
+            return self.pod_tmp_mapping[job_name]
+        else:
+            temp_dir = tempfile.gettempdir()
+            app_temp_dir = os.path.join(temp_dir, 'job_logs')
+            os.makedirs(app_temp_dir, exist_ok=True)
+            fd, path = tempfile.mkstemp(prefix=f"{job_name}_logs_", suffix=".txt", dir=app_temp_dir)
+            os.close(fd)
+            self.pod_tmp_mapping[job_name] = path
+            return path


Suggested change

if self.pod_tmp_mapping.get(job_name) is not None:

return self.pod_tmp_mapping[job_name]

else:

temp_dir = tempfile.gettempdir()

app_temp_dir = os.path.join(temp_dir, 'job_logs')

os.makedirs(app_temp_dir, exist_ok=True)

fd, path = tempfile.mkstemp(prefix=f"{job_name}_logs_", suffix=".txt", dir=app_temp_dir)

os.close(fd)

self.pod_tmp_mapping[job_name] = path

return path

if self.pod_tmp_mapping.get(job_name) is not None:

return self.pod_tmp_mapping[job_name]

temp_dir = tempfile.gettempdir()

app_temp_dir = os.path.join(temp_dir, 'job_logs')

os.makedirs(app_temp_dir, exist_ok=True)

fd, path = tempfile.mkstemp(prefix=f"{job_name}_logs_", suffix=".txt", dir=app_temp_dir)

os.close(fd)

self.pod_tmp_mapping[job_name] = path

return path

leewesleyv · 2024-10-30T14:04:23Z

scrapyd_k8s/object_storage/libcloud_driver.py

+            return False
+        except Exception as e:
+            logger.exception(f"An unexpected error occurred while checking for object '{object_name}': {e}")
+            return False


We can remove 4 of returns by moving the return False to the end of the logic

Good point, thank you, I was adding exceptions sequentially while reading docs and experimenting with different scenarios, so I ended up in this ugly piece of code:D Fixed!

wvengen · 2024-10-31T07:12:00Z

Thanks for your comments, @leewesleyv!
The use of properties in the config class is a nice idea, which we don't currently do. I think that could perhaps be tackled in a different issue, because I would expect the code to use that consistently on other places too.

I think moving config file handling to the config class would make sense (though I there are also arguments to keep joblogs-related code in the joblogs parts). Then if the joblogs-related code in the config class would grow too much (which I don't think will happen with this PR, but I might be wrong), then it could warrant its own joblog-config class. So I'd say: go ahead with @leewesleyv's suggestions, if you think it's a good idea.

…otes in all logging statements; refactor to single False statement in object exists

vlerkin · 2024-10-31T08:39:06Z

I think all the comments were addressed, is there anything else I can do or we are ready to merge?:)
@wvengen

wvengen · 2024-10-31T09:57:35Z

I think it would be ready to merge - and it's time to try this in production 👍
@leewesleyv if you see any remaining points, feel free to submit a PR with suggestions!

vlerkin added 2 commits September 10, 2024 21:14

resource watcher, log watcher, libcloud, object_storage

652b70c

add fake lines as credentials for libcloud

eac5930

wvengen reviewed Sep 11, 2024

View reviewed changes