- ## About
- This file provides you with the detailed description of parameters listed in the config file, and explaining why they are used
- and when you are expected to provide or change them.
+ # scrapyd-k8s configuration

- ## Configuration file
+ scrapyd-k8s is configured with the file `scrapyd_k8s.conf`. The file format is meant to
+ stick to [scrapyd's configuration](https://scrapyd.readthedocs.io/en/latest/config.html) where possible.
+
+ ## `[scrapyd]` section

* `http_port` - defaults to `6800` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#http-port))
* `bind_address` - defaults to `127.0.0.1` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#bind-address))
@@ -14,25 +15,57 @@ and when you are expected to provide or change them.

The Docker and Kubernetes launchers have their own additional options.

- ## [scrapyd] section, reconnection_attempts, backoff_time, backoff_coefficient
+ ## project sections
+
+ Each project you want to be able to run, gets its own section, prefixed with `project.`. For example,
+ consider an `example` spider, this would be defined in a `[project.example]` section.
+
+ * `repository` - container repository for the project, e.g. `ghcr.io/q-m/scrapyd-k8s-spider-example`
+
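To illustrate the layout described above, here is a minimal sketch that parses a sample configuration with Python's standard `configparser` and collects the `repository` of every `project.`-prefixed section. The helper name `project_repositories` and the `SAMPLE` text are hypothetical, and this is not necessarily how scrapyd-k8s itself reads the file.

```python
import configparser

# Hypothetical sample config, built from the options mentioned in this document.
SAMPLE = """
[scrapyd]
http_port = 6800
bind_address = 127.0.0.1

[project.example]
repository = ghcr.io/q-m/scrapyd-k8s-spider-example
"""


def project_repositories(text):
    """Map each project name to its container repository (illustrative helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prefix = "project."
    return {
        section[len(prefix):]: parser[section].get("repository")
        for section in parser.sections()
        if section.startswith(prefix)
    }


repos = project_repositories(SAMPLE)
print(repos)
```

Each additional project would get its own `[project.<name>]` section with its own `repository` value.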
+ ## Docker
+
+ This section describes Docker-specific options.
+ See [`scrapyd_k8s.sample-docker.conf`](scrapyd_k8s.sample-docker.conf) for an example.
+
+ * `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.Docker`
+ * `[scrapyd]` `repository` - choose between `scrapyd_k8s.repository.Local` and `scrapyd_k8s.repository.Remote`
+
+ TODO: explain `Local` and `Remote` repository, and how to use them
+
+ ## Kubernetes
+
+ This section describes Kubernetes-specific options.
+ See [`scrapyd_k8s.sample-k8s.conf`](scrapyd_k8s.sample-k8s.conf) for an example.

- ### Context
- The Kubernetes event watcher is used in the code as part of the joblogs feature and is also utilized for limiting the
- number of jobs running in parallel on the cluster. Both features are not enabled by default and can be activated if you
+ * `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.K8s`
+ * `[scrapyd]` `repository` - set this to `scrapyd_k8s.repository.Remote`
+
+ For Kubernetes, it is important to set resource limits.
+
+ TODO: explain how to set limits, with default, project and spider specificity.
+
+
+ ### Kubernetes API interaction
+
+ The Kubernetes event watcher is used in the code as part of the joblogs feature and is also utilized for limiting the
+ number of jobs running in parallel on the cluster. Both features are not enabled by default and can be activated if you
choose to use them.

- The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However, the
- nature of this long-lived connection is unstable; it can be interrupted by network issues, proxies configured to terminate
- long-lived connections, and other factors. For this reason, a mechanism was implemented to re-establish the long-lived
- connection to the Kubernetes API. To achieve this, three parameters were introduced: `reconnection_attempts`,
+ The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However, the
+ nature of this long-lived connection is unstable; it can be interrupted by network issues, proxies configured to terminate
+ long-lived connections, and other factors. For this reason, a mechanism was implemented to re-establish the long-lived
+ connection to the Kubernetes API. To achieve this, three parameters were introduced: `reconnection_attempts`,
`backoff_time` and `backoff_coefficient`.

- ### What are these parameters about?
- - `reconnection_attempts` - defines how many consecutive attempts will be made to reconnect if the connection fails;
- - `backoff_time` and `backoff_coefficient` - are used to gradually slow down each subsequent attempt to establish a
- connection with the Kubernetes API, preventing the API from becoming overloaded with requests. The `backoff_time` increases
- exponentially and is calculated as `backoff_time *= self.backoff_coefficient`.
+ #### What are these parameters about?
+
+ * `reconnection_attempts` - defines how many consecutive attempts will be made to reconnect if the connection fails;
+ * `backoff_time`, `backoff_coefficient` - are used to gradually slow down each subsequent attempt to establish a
+ connection with the Kubernetes API, preventing the API from becoming overloaded with requests.
+ The `backoff_time` increases exponentially and is calculated as `backoff_time *= self.backoff_coefficient`.
+
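The exponential growth described above can be sketched as follows. This is only an illustration of the `backoff_time *= backoff_coefficient` rule, not the project's actual watcher code: the function name `backoff_delays` and the example values (5 attempts, 5 seconds, coefficient 2) are assumptions, and it assumes the first wait equals the initial `backoff_time`.

```python
def backoff_delays(reconnection_attempts, backoff_time, backoff_coefficient):
    """List the wait (in seconds) before each consecutive reconnection attempt.

    Hypothetical helper: it assumes the first wait is the initial backoff_time
    and that the wait is multiplied by backoff_coefficient after every attempt.
    """
    delays = []
    for _ in range(reconnection_attempts):
        delays.append(backoff_time)
        backoff_time *= backoff_coefficient
    return delays


# Example values are assumptions, not the project's defaults.
print(backoff_delays(5, 5, 2))  # [5, 10, 20, 40, 80]
```

With these example values the total wait before giving up would be 155 seconds, which shows how quickly a larger coefficient or more attempts stretches the reconnection window.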
+ #### When do I need to change it in the config file?
+
+ Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network
+ requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.

- ### When do I need to change it in the config file?
- Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network
- requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.