Contributing

See also

Django and RabbitMQ in the Software Development Handbook

Setup

  1. Install PostgreSQL and RabbitMQ

  2. Create a Python 3.11 virtual environment

  3. Install development dependencies:

    pip install pip-tools
    pip-sync requirements_dev.txt
    
  4. Set up the git pre-commit hook:

    pre-commit install
    
  5. Create the database (your user should have access without requiring a password):

    createdb kingfisher_process
    
  6. Run database migrations:

    ./manage.py migrate
    

Development

The default values in the settings.py file should be appropriate as-is. You can override them by setting environment variables.
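As a rough illustration of overriding a default with an environment variable, the sketch below reads DATABASE_URL (the variable used with inspectdb later on this page) and falls back to a default. The function name and the default value are assumptions made for illustration; check settings.py for the actual variable names and defaults.

```python
import os


def database_url(environ=os.environ):
    """Return the database URL, preferring the environment over the default."""
    # Hypothetical default: a local database named after the createdb step.
    default = "postgresql:///kingfisher_process"
    return environ.get("DATABASE_URL", default)
```

Passing the mapping as an argument keeps the sketch easy to try without changing your real environment, for example database_url({"DATABASE_URL": "postgresql://user@host/dbname"}).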

  • Run the server (API):

    ./manage.py runserver
    
  • Run tests:

    ./manage.py test
    

API documentation

See also

API

If you edit views.py, regenerate the OpenAPI document. Start the server, then run:

curl http://127.0.0.1:8000/api/schema/ -o docs/_static/openapi.yaml

Database concurrency

Kingfisher Process runs its workers concurrently. It is therefore important to understand Transaction Isolation and Explicit Locking, to guarantee that work isn’t duplicated or missed. As appropriate:

  • Use optimistic locking to avoid overwriting data, for example:

    updated = Collection.objects.filter(pk=collection.pk, completed_at=None).update(completed_at=Now())
    
  • Use optimistic locking to avoid repeating work, for example:

    updated = Collection.objects.filter(pk=collection.pk, compilation_started=False).update(compilation_started=True)
    if not updated:
        return
    
  • Specify which fields to save on a Collection instance, using the update_fields argument of save()

  • Lock rows using SELECT … FOR UPDATE on the collection table
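To see why the conditional update in the list above prevents repeated work, here is a minimal, self-contained sketch. It uses the standard library's sqlite3 module as a stand-in for PostgreSQL and the Django ORM, so that it runs anywhere. The table layout and the claim_compilation helper are hypothetical. The idea is the same as filter(...).update(...): the UPDATE only matches a row that is still unclaimed, and the number of affected rows tells the caller whether it won.

```python
import sqlite3


def claim_compilation(conn, collection_id):
    """Return True only for the caller that flips compilation_started from 0 to 1."""
    cursor = conn.execute(
        "UPDATE collection SET compilation_started = 1 "
        "WHERE id = ? AND compilation_started = 0",
        (collection_id,),
    )
    conn.commit()
    # One affected row means this caller claimed the work. Zero means another
    # caller got there first, or the row does not exist.
    return cursor.rowcount == 1


# Hypothetical, simplified table with a single unclaimed collection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection ("
    "id INTEGER PRIMARY KEY, "
    "compilation_started INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO collection (id) VALUES (1)")
conn.commit()

first = claim_compilation(conn, 1)   # claims the work
second = claim_compilation(conn, 1)  # finds it already claimed
```

Because the check and the write happen in a single statement, there is no window between reading the flag and setting it in which a second worker could also decide to start.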

Message broker patterns

Enterprise Integration Patterns describes many patterns used in this project and in RabbitMQ. We use:

  • Process Manager: The collection’s configuration determines how messages are routed through a series of steps. See also Routing Slip.

  • Idempotent Receiver: Each worker should be able to safely receive the same message multiple times.

  • Claim Check: Instead of putting OCDS data in messages, we write it to disk and put a claim check in messages.

  • Splitter: For example, one message to load a large file (e.g. record package) might lead to many messages to process each part of the file (e.g. record).

  • Aggregator: For example, the step to merge releases from release packages needs to wait for loading to be completed.
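The Claim Check and Idempotent Receiver patterns above can be sketched together in a few lines. This is an illustration only, not the project's implementation: the check_in function, the IdempotentReceiver class, the message fields and the in-memory set of seen message IDs are all hypothetical. A real worker would more likely record progress in the database.

```python
import json
import tempfile
import uuid
from pathlib import Path


def check_in(data, directory):
    """Write the payload to disk and return a small message that references it."""
    path = Path(directory) / f"{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(data))
    # The message carries only the claim check (a path), not the data itself.
    return {"message_id": uuid.uuid4().hex, "path": str(path)}


class IdempotentReceiver:
    """Process each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()
        self.results = []

    def receive(self, message):
        if message["message_id"] in self.seen:
            return False  # duplicate delivery: nothing to do
        # Redeem the claim check by reading the payload from disk.
        data = json.loads(Path(message["path"]).read_text())
        self.results.append(data)
        self.seen.add(message["message_id"])
        return True


with tempfile.TemporaryDirectory() as directory:
    message = check_in({"ocid": "ocds-213czf-1"}, directory)
    receiver = IdempotentReceiver()
    handled_first = receiver.receive(message)  # processes the payload
    handled_again = receiver.receive(message)  # ignores the redelivery
```

Keeping the payload out of the message keeps messages small, and recording the message ID only after the work succeeds means a redelivered message is safe to receive again.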

History

Legacy database

Kingfisher Process was rewritten to use Django and RabbitMQ, instead of Flask and SQLAlchemy.

You can compare models.py to the output of:

env DATABASE_URL=postgresql://user@host/dbname ./manage.py inspectdb

Note

Although OCP typically uses an en_US.UTF-8 collation, the database has an en_GB.UTF-8 collation, for no particular reason.