Contributing

Setup

Install PostgreSQL and RabbitMQ
Create a Python 3.11 virtual environment

Install development dependencies:

```
pip install pip-tools
pip-sync requirements_dev.txt
```

Set up the git pre-commit hook:
```
pre-commit install
```
Create the database (your user should have access without requiring a password):
```
createdb kingfisher_process
```
Run database migrations:
```
./manage.py migrate
```

Development

The default values in the settings.py file should be appropriate as-is. You can override them by setting environment variables.
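
As a rough illustration of this override pattern (not a copy of the project's settings.py), a setting can fall back to a default unless an environment variable is set. The helper name `env_or_default` and the default connection string are assumptions made for this sketch; `DATABASE_URL` is the variable used with `inspectdb` in the Legacy database section.

```python
import os


def env_or_default(name, default, environ=None):
    """Return the environment variable's value if it is set, else the default."""
    # Hypothetical helper for illustration; the real settings.py may differ.
    environ = os.environ if environ is None else environ
    return environ.get(name, default)


# Assumed default: a local database that needs no password.
DEFAULT_DATABASE_URL = "postgresql:///kingfisher_process"

database_url = env_or_default("DATABASE_URL", DEFAULT_DATABASE_URL)
```

With this approach, running a command with `DATABASE_URL` set in the environment changes the value for that command only, and the default applies otherwise.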

Run the server (API):
```
./manage.py runserver
```
Run tests:
```
./manage.py test
```

API documentation

Database concurrency

Kingfisher Process runs its workers concurrently. To guarantee that work isn't duplicated or missed, it is important to understand PostgreSQL's Transaction Isolation and Explicit Locking. As appropriate:

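Use optimistic locking to avoid overwriting data, for example:

```
from django.db.models.functions import Now
updated = Collection.objects.filter(pk=collection.pk, completed_at=None).update(completed_at=Now())
```

Use optimistic locking to avoid repeating work, for example:

```
# Atomically set the flag; update() returns the number of rows changed.
updated = Collection.objects.filter(pk=collection.pk, compilation_started=False).update(compilation_started=True)
if not updated:
    return  # another worker already started compilation
```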

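Specify which fields to save on a Collection instance, using the update_fields argument to save(), so that a stale in-memory value doesn't overwrite other columns
Lock rows using SELECT … FOR UPDATE on the collection table, using select_for_update() inside a transaction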
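
The optimistic-locking examples above work because the database applies the filter and the update in a single statement, and reports how many rows changed. A minimal sketch of the same idea in plain SQL follows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the simplified table and the `claim_compilation` function are assumptions for illustration, not the project's code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table is heavily simplified.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection ("
    "id INTEGER PRIMARY KEY, "
    "compilation_started INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO collection (id) VALUES (1)")
conn.commit()


def claim_compilation(connection, collection_id):
    """Return True only for the caller that flips the flag from 0 to 1."""
    cursor = connection.execute(
        "UPDATE collection SET compilation_started = 1 "
        "WHERE id = ? AND compilation_started = 0",
        (collection_id,),
    )
    connection.commit()
    # rowcount is the number of rows changed: 1 for the winner, 0 otherwise.
    return cursor.rowcount == 1


first = claim_compilation(conn, 1)   # this caller claims the work
second = claim_compilation(conn, 1)  # a later caller finds it already claimed
print(first, second)  # prints "True False"
```

Because the check and the write happen in one statement, there is no window in which two workers can both read the flag as unset and then both proceed.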

Message broker patterns

Enterprise Integration Patterns describes many patterns used in this project and in RabbitMQ. We use:

Process Manager: The collection’s configuration determines how messages are routed through a series of steps. See also Routing Slip.
Idempotent Receiver: Each worker should be able to safely receive the same message multiple times.
Claim Check: Instead of putting OCDS data in messages, we write it to disk and put a claim check in messages.
Splitter: For example, one message to load a large file (e.g. record package) might lead to many messages to process each part of the file (e.g. record).
Aggregator: For example, the step to merge releases from release packages needs to wait for loading to be completed.
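
To make two of these patterns concrete, here is a small sketch of Claim Check combined with Idempotent Receiver. It uses only the standard library and no message broker. The function and class names, the message fields (`message_id`, `path`) and the sample payload are all hypothetical, and a real worker would likely keep its "already processed" state in the database rather than in memory.

```python
import json
import tempfile
import uuid
from pathlib import Path


def publish_with_claim_check(data, directory):
    """Write the payload to disk and return a small message that points to it."""
    path = Path(directory) / f"{uuid.uuid4()}.json"
    path.write_text(json.dumps(data))
    # The message carries only an identifier and the claim check (a file path).
    return {"message_id": str(uuid.uuid4()), "path": str(path)}


class IdempotentReceiver:
    """Process each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()  # in-memory for this sketch only
        self.processed = []

    def receive(self, message):
        if message["message_id"] in self.seen:
            return False  # duplicate delivery: safely ignored
        self.seen.add(message["message_id"])
        # Redeem the claim check by reading the payload back from disk.
        payload = json.loads(Path(message["path"]).read_text())
        self.processed.append(payload)
        return True


with tempfile.TemporaryDirectory() as directory:
    # Illustrative payload, not real OCDS data.
    message = publish_with_claim_check({"ocid": "ocds-213czf-1"}, directory)
    receiver = IdempotentReceiver()
    # Deliver the same message twice; only the first delivery does any work.
    results = [receiver.receive(message), receiver.receive(message)]
```

The message stays small regardless of the size of the data, and redelivery by the broker does not cause the payload to be processed twice.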

History

Legacy database

Kingfisher Process was rewritten to use Django and RabbitMQ, instead of Flask and SQLAlchemy.

You can compare models.py to the output of:

```
env DATABASE_URL=postgresql://user@host/dbname ./manage.py inspectdb
```