Contributing¶
Setup¶
Install PostgreSQL and RabbitMQ
Create a Python 3.11 virtual environment
Install development dependencies:
pip install pip-tools
pip-sync requirements_dev.txt
Set up the git pre-commit hook:
pre-commit install
Create the database (your user should have access without requiring a password):
createdb kingfisher_process
Run database migrations:
./manage.py migrate
Development¶
The default values in the settings.py file should be appropriate as-is. You can override them by setting environment variables.
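As a rough illustration of the override mechanism, a setting can be read from the environment with a fallback to a default. This is a minimal sketch, not the project's actual settings.py: the `DEFAULTS` dictionary, the `read_setting` helper and the default URL are assumptions made for the example. Only the `DATABASE_URL` variable name appears elsewhere in this document.

```python
import os

# Hypothetical default, for illustration only; the real defaults live in settings.py.
DEFAULTS = {
    "DATABASE_URL": "postgresql:///kingfisher_process",
}


def read_setting(name, environ=None):
    """Return the environment's value for ``name`` if it is set, else the default."""
    environ = os.environ if environ is None else environ
    return environ.get(name, DEFAULTS[name])


# With no override, the default applies; with an override, the environment wins.
default_url = read_setting("DATABASE_URL", {})
custom_url = read_setting("DATABASE_URL", {"DATABASE_URL": "postgresql://user@host/dbname"})
```

Passing `environ` explicitly keeps the helper easy to test; in normal use it falls back to `os.environ`.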
Run the server (API):
./manage.py runserver
Run tests:
./manage.py test
API documentation¶
If you edit views.py, regenerate the OpenAPI document by running the server and then running:
curl http://127.0.0.1:8000/api/schema/ -o docs/_static/openapi.yaml
Database concurrency¶
Kingfisher Process works concurrently. As such, it is important to understand Transaction Isolation and Explicit Locking, to guarantee that work isn’t duplicated or missed. As appropriate:
Use optimistic locking to avoid overwriting data, for example:
updated = Collection.objects.filter(pk=collection.pk, completed_at=None).update(completed_at=Now())
Use optimistic locking to avoid repeating work, for example:
updated = Collection.objects.filter(pk=collection.pk, compilation_started=False).update(compilation_started=True)
if not updated:
    return
Specify which fields to save on a Collection instance
Lock rows using SELECT … FOR UPDATE on the collection table
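The optimistic-locking items above rely on a conditional UPDATE: the WHERE clause checks the expected state, and the number of affected rows tells the caller whether it won. Django's `update()` returns the number of rows matched, which is what the `if not updated` check uses. The sketch below shows the same idea outside Django, using the standard library's `sqlite3` and a simplified, hypothetical `collection` table. The `claim_compilation` helper is invented for this example and is not part of the project.

```python
import sqlite3

# In-memory stand-in for the collection table (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection ("
    "id INTEGER PRIMARY KEY, "
    "compilation_started INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO collection (id) VALUES (1)")
conn.commit()


def claim_compilation(connection, collection_id):
    """Return True only for the caller whose UPDATE flipped the flag."""
    cursor = connection.execute(
        "UPDATE collection SET compilation_started = 1 "
        "WHERE id = ? AND compilation_started = 0",
        (collection_id,),
    )
    connection.commit()
    # rowcount is the number of rows changed: 1 for the winner, 0 for everyone else.
    return cursor.rowcount == 1


first = claim_compilation(conn, 1)   # flag was 0, so this caller claims the work
second = claim_compilation(conn, 1)  # flag is already 1, so no row matches
```

Because the check and the write happen in a single statement, two workers cannot both see the flag unset and both start the work. A separate read followed by a write would leave that window open.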
Message broker patterns¶
Enterprise Integration Patterns describes many patterns used in this project and in RabbitMQ. We use:
Process Manager: The collection’s configuration determines how messages are routed through a series of steps. See also Routing Slip.
Idempotent Receiver: Each worker should be able to safely receive the same message multiple times.
Claim Check: Instead of putting OCDS data in messages, we write it to disk and put a claim check in messages.
Splitter: For example, one message to load a large file (e.g. record package) might lead to many messages to process each part of the file (e.g. record).
Aggregator: For example, the step to merge releases from release packages needs to wait for loading to be completed.
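Two of the patterns above, Claim Check and Idempotent Receiver, can be sketched together in plain Python. This is an illustration under stated assumptions, not the project's worker code: `publish_with_claim_check`, `IdempotentReceiver`, the message fields and the sample data are all hypothetical, and no message broker is involved. The set of seen IDs is held in memory here; a real worker would need durable state.

```python
import json
import tempfile
import uuid
from pathlib import Path


def publish_with_claim_check(data, directory):
    """Write the payload to disk and return a small message that points to it."""
    message_id = str(uuid.uuid4())
    path = Path(directory) / f"{message_id}.json"
    path.write_text(json.dumps(data))
    # The message carries only an ID and a path (the claim check), not the data.
    return {"id": message_id, "path": str(path)}


class IdempotentReceiver:
    """Process each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()  # in-memory for this sketch only
        self.results = []

    def receive(self, message):
        if message["id"] in self.seen:
            return False  # duplicate delivery: do nothing
        # Redeem the claim check by reading the payload from disk.
        data = json.loads(Path(message["path"]).read_text())
        self.results.append(len(data["releases"]))
        self.seen.add(message["id"])
        return True


with tempfile.TemporaryDirectory() as directory:
    # Made-up sample data, standing in for an OCDS release package.
    message = publish_with_claim_check({"releases": [{"ocid": "example-1"}]}, directory)
    receiver = IdempotentReceiver()
    outcomes = [receiver.receive(message), receiver.receive(message)]
```

The second delivery of the same message is ignored, so the release count is recorded once. The message itself stays small regardless of how large the file on disk is.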
History¶
Legacy database¶
Kingfisher Process was rewritten to use Django and RabbitMQ, instead of Flask and SQLAlchemy.
You can compare models.py to the output of:
env DATABASE_URL=postgresql://user@host/dbname ./manage.py inspectdb
Note
Although OCP typically uses an en_US.UTF-8 collation, the database has an en_GB.UTF-8 collation, for no particular reason.