Commit 6fdc520: Improve documentation, fix running with Docker

1 parent d42e5cd

3 files changed: +118 -8 lines

README.md  (+115 -5)
@@ -32,26 +32,46 @@ You will need to provide a configuration file, use one of the sample configurati
 files as a template ([`scrapyd_k8s.sample-k8s.conf`](./scrapyd_k8s.sample-k8s.conf)
 or [`scrapyd_k8s.sample-docker.conf`](./scrapyd_k8s.sample-docker.conf)).
 
+The next section explains how to get this running on Docker, Kubernetes or locally.
+Then read on for an example of how to use the API.
+
 ### Docker
 
 ```
+cp scrapyd_k8s.sample-docker.conf scrapyd_k8s.conf
+docker build -t ghcr.io/q-m/scrapyd-k8s:latest .
 docker run \
+  --rm \
   -v ./scrapyd_k8s.conf:/opt/app/scrapyd_k8s.conf:ro \
   -v /var/run/docker.sock:/var/run/docker.sock \
  -v $HOME/.docker/config.json:/root/.docker/config.json:ro \
   -u 0 \
+  -p 127.0.0.1:6800:6800 \
   ghcr.io/q-m/scrapyd-k8s:latest
 ```
 
-This is not really recommended for production, as it exposes the Docker socket and
-runs as root. It may be useful to try things out.
+You'll be able to talk to localhost on port `6800`.
+
+Make sure to pull the spider image so it is known locally.
+In case of the default example spider:
+
+```sh
+docker pull ghcr.io/q-m/scrapyd-k8s-spider-example
+```
+
+Note that running like this in Docker is not really recommended for production,
+as it exposes the Docker socket and runs as root. It may be useful to try
+things out.
+
 
 ### Kubernetes
 
 1. Create the spider namespace: `kubectl create namespace scrapyd`
 2. Adapt the spider configuration in [`kubernetes.yaml`](./kubernetes.yaml) (`scrapyd_k8s.conf` in configmap)
 3. Create the resources: `kubectl create -f kubernetes.yaml`
 
+You'll be able to talk to the `scrapyd-k8s` service on port `6800`.
+
 ### Local
 
 For development, or just a quick start, you can also run this application locally.
@@ -62,15 +82,105 @@ Requirements:
 - Either [Docker](https://www.docker.com/) or [Kubernetes](https://kubernetes.io/) setup and accessible
   (scheduling will require Kubernetes 1.24+)
 
-Copy a sample configuration to `scrapyd_k8s.conf` and specify your project details.
+This will work with either Docker or Kubernetes (provided it is set up).
+For example, for Docker:
+
+```sh
+cp scrapyd_k8s.sample-docker.conf scrapyd_k8s.conf
+python3 app.py
+```
+
+You'll be able to talk to localhost on port `6800`.
 
-For Docker, you probably need to pull the image
+For Docker, make sure to pull the spider image so it is known locally.
+In case of the default example spider:
 
 ```sh
 docker pull ghcr.io/q-m/scrapyd-k8s-spider-example
 ```
 
-TODO finish this section
+
+## Accessing the API
+
+With `scrapyd-k8s` running and set up, you can access it. Here we assume that
+it listens on `localhost:6800` (for Kubernetes, you would use
+the service name `scrapyd-k8s:6800` instead).
+
+```sh
+curl http://localhost:6800/daemonstatus.json
+```
+
+> ```json
+> {"spiders":0,"status":"ok"}
+> ```
+
+```sh
+curl http://localhost:6800/listprojects.json
+```
+
+> ```json
+> {"projects":["example"],"status":"ok"}
+> ```
+
+```sh
+curl 'http://localhost:6800/listversions.json?project=example'
+```
+
+> ```json
+> {"status":"ok","versions":["latest"]}
+> ```
+
+```sh
+curl 'http://localhost:6800/listspiders.json?project=example&_version=latest'
+```
+
+> ```json
+> {"spiders":["quotes"],"status":"ok"}
+> ```
+
+```sh
+curl http://localhost:6800/schedule.json -F project=example -F _version=latest -F spider=quotes
+```
+
+> ```json
+> {"jobid":"e9b81fccbec211eeb3b109f30f136c01","status":"ok"}
+> ```
+
+```sh
+curl http://localhost:6800/listjobs.json
+```
+```json
+{
+  "finished":[],
+  "pending":[],
+  "running":[{"id":"e9b81fccbec211eeb3b109f30f136c01","project":"example","spider":"quotes","state":"pending"}],
+  "status":"ok"
+}
+```
+
+To see what the spider has done, look at the container logs:
+
+```sh
+docker ps -a
+```
+
+> ```
+> CONTAINER ID   IMAGE                                           COMMAND                 CREATED   STATUS               NAMES
+> 8c514a7ac917   ghcr.io/q-m/scrapyd-k8s-spider-example:latest   "scrapy crawl quotes"   42s ago   Exited (0) 30s ago   scrapyd_example_cb50c27cbec311eeb3b109f30f136c01
+> ```
+
+```sh
+docker logs 8c514a7ac917
+```
+
+> ```
+> [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: example)
+> ...
+> [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
+> {'text': 'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.', 'author': 'Albert Einstein', 'tags': 'change'}
+> ...
+> [scrapy.core.engine] INFO: Spider closed (finished)
+> ```
 
 
 ## Spider as Docker image
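
For reference, the API walkthrough added above can also be driven from Python rather than curl. A minimal sketch using only the standard library, assuming `scrapyd-k8s` is reachable on `localhost:6800` as in the examples:

```python
# Call the same endpoints as the curl examples above (standard library
# only; assumes scrapyd-k8s is listening on localhost:6800).
import json
from urllib.request import urlopen

BASE = 'http://localhost:6800'

for endpoint in ('daemonstatus.json', 'listprojects.json', 'listjobs.json'):
    with urlopen(f'{BASE}/{endpoint}') as resp:
        print(endpoint, json.load(resp))
# e.g. daemonstatus.json {'spiders': 0, 'status': 'ok'}
```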

scrapyd_k8s/launcher/docker.py  (+1 -1)
@@ -30,7 +30,7 @@ def schedule(self, repository, project, version, spider, job_id, env_config, env
             'SCRAPYD_SPIDER': spider,
             'SCRAPYD_JOB': job_id,
         } # TODO env_source handling
-        c = self._docker.containers.create(
+        c = self._docker.containers.run(
             image=repository + ':' + version,
             command=['scrapy', 'crawl', spider, *_args, *_settings],
             environment=env,
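
This one-line change is the "fix running with Docker" from the commit message: in the Docker SDK for Python, `containers.create()` only prepares a container and never starts it, whereas `containers.run()` creates and starts it in one call. A minimal sketch of the difference, assuming a local Docker daemon and using `alpine` as a stand-in image (not the scrapyd-k8s code):

```python
# Minimal sketch: create() vs run() in the Docker SDK for Python.
# Assumes a local Docker daemon; 'alpine' is a stand-in example image.
import docker

client = docker.from_env()

# create() only prepares the container; nothing executes until start()
# is called, which is why scheduled jobs previously never ran.
c = client.containers.create('alpine', command=['echo', 'hello'])
print(c.status)        # 'created'
c.start()

# run(detach=True) creates *and* starts the container in one call,
# returning a Container object that can be tracked just like create()'s.
c = client.containers.run('alpine', command=['echo', 'hello'], detach=True)
c.wait()               # block until the command exits
print(c.logs())        # b'hello\n'
```

Without `detach=True`, `run()` would instead block until the container exits and return its logs rather than a `Container`; the remainder of the call is truncated in this diff.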

scrapyd_k8s/repository/local.py  (+2 -2)
@@ -6,14 +6,14 @@ class Local:
     def __init__(self, config):
         pass
 
-    def listtags(repo):
+    def listtags(self, repo):
         """Returns available tags from local docker images"""
         r = subprocess.check_output(['docker', 'image', 'ls', repo, '--format', '{{ .Tag }}']).decode('utf-8')
         tags = r.split('\n')
         # TODO error handling
         return [t for t in tags if t and t != '<none>']
 
-    def listspiders(repo, project, version):
+    def listspiders(self, repo, project, version):
         """Returns available spiders from a local docker image"""
         r = subprocess.check_output(['docker', 'image', 'inspect', repo + ':' + version, '--format', '{{ index .Config.Labels "org.scrapy.spiders" }}']).decode('utf-8')
         spiders = r.split(',')
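
The fix above adds the missing `self` parameter to both methods. Python passes the instance as the implicit first argument of a method call, so without `self` the call fails with a `TypeError`. A toy reproduction (not the scrapyd-k8s code):

```python
# Toy reproduction of the bug fixed above: a method defined without
# `self` still receives the instance as its first positional argument.
class Broken:
    def listtags(repo):               # the instance binds to `repo`
        return repo

class Fixed:
    def listtags(self, repo):
        return repo

print(Fixed().listtags('example'))    # example
Broken().listtags('example')
# TypeError: listtags() takes 1 positional argument but 2 were given
```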
