Data Model

Collections

Collections are distinct sets of OCDS data. They are the largest unit on which this tool operates.

A collection is uniquely identified by the combination of:

  • Name (source_id): A string. If the collection was created by Kingfisher Collect, this is the name attribute of the spider.

  • Date (data_version): The date and time at which the collection was created. If the collection was created by Kingfisher Collect, this is the start_time statistic of the crawl.

  • Sample (sample): A boolean. Whether the collection is only a sample of the data from the source.

  • Base collection (transform_from_collection_id): An integer. The ID of the collection that was transformed into this collection.

  • Transform type (transform_type): A string. The identifier of the transformer that was used to produce this collection.

Each collection is given an integer ID, which is used to refer to the collection in the command-line tool and the database.
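As a minimal sketch of the identity rule above, the five attributes can be modeled as a frozen dataclass, so that the combination acts as a lookup key. Only the field names come from this page. The CollectionKey class, the in-memory registry, the get_or_create_id helper, and the use of None for an untransformed collection are assumptions made for illustration, not the tool's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class CollectionKey:
    """Hypothetical model of the attributes that together identify a collection."""

    source_id: str
    data_version: datetime
    sample: bool
    # Assumption: None means the collection was not produced by a transformer.
    transform_from_collection_id: Optional[int] = None
    transform_type: Optional[str] = None


# Hypothetical in-memory stand-in for the database: unique key -> integer ID.
registry: Dict[CollectionKey, int] = {}


def get_or_create_id(key: CollectionKey) -> int:
    """Return the ID already assigned to this key, or assign the next integer."""
    if key not in registry:
        registry[key] = len(registry) + 1
    return registry[key]


base = CollectionKey("example_spider", datetime(2020, 1, 1, 12, 0, 0), sample=False)
base_id = get_or_create_id(base)

# A transformed collection shares name, date and sample with its base collection,
# but differs in the last two attributes, so it is a distinct collection.
compiled = CollectionKey(
    base.source_id, base.data_version, base.sample, base_id, "compile-releases"
)
compiled_id = get_or_create_id(compiled)
```

Looking up the same combination twice returns the same ID, while changing any one of the five attributes yields a different collection.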

Collections are created by Kingfisher Collect, the web API, or the new-collection command.

Schema check flags

Collections have flags that indicate what operations to perform on them. These are:

check_data

Run CoVE schema checks on the data in this collection

check_older_data_with_schema_version_1_1

Force OCDS 1.1 checks to be run on OCDS 1.0 data (instead of OCDS 1.0 checks)

To configure the default values for these flags, see Configuration.
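The effect of the second flag can be sketched as a small function that picks the schema version to check against. This is an illustration under stated assumptions, not the tool's code: the function name and the ocds_version parameter (the version declared by the data, as a string such as "1.0") are hypothetical, and how the tool detects that version is not covered here.

```python
def schema_version_for_checks(
    ocds_version: str, check_older_data_with_schema_version_1_1: bool
) -> str:
    """Return the major.minor OCDS schema version to check the data against."""
    # Reduce a version like "1.0.3" to its major.minor part, "1.0".
    major_minor = ".".join(ocds_version.split(".")[:2])
    # When the flag is set, OCDS 1.0 data is checked against 1.1 instead of 1.0.
    if check_older_data_with_schema_version_1_1 and major_minor == "1.0":
        return "1.1"
    return major_minor
```

With the flag unset, data is checked against its own version. With the flag set, only 1.0 data is affected.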

Transformed collections

Presently, the tool offers two transformers:

upgrade-1-0-to-1-1

Upgrade a collection’s data from OCDS 1.0 to OCDS 1.1

compile-releases

Merge a collection’s releases into compiled releases

To transform a collection, create a new collection that refers to the base collection, with either the new-transform-compile-releases or new-transform-upgrade-1-0-to-1-1 command, then run the transform-collection command.

Files

A collection contains one or more files. A file is uniquely identified by its collection and filename. Files can have:

errors

The file could not be retrieved. Presently, errors are either reported by Kingfisher Collect or caught by the local-load command.

warnings

The file contents had to be modified in order to be stored. Presently, the only warning is about the removal of control characters.
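To illustrate how a warning differs from an error, the sketch below cleans the text and reports what it changed, rather than rejecting the file. The exact set of characters that the tool removes is not stated on this page. The set used here (C0 control characters other than tab, line feed and carriage return), the function name and the warning message are all assumptions.

```python
import re
from typing import List, Tuple

# Assumed set: C0 control characters except tab (\x09), LF (\x0a) and CR (\x0d).
CONTROL_CHARACTERS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def remove_control_characters(text: str) -> Tuple[str, List[str]]:
    """Return the cleaned text and a list of warnings about what was removed."""
    cleaned, count = CONTROL_CHARACTERS.subn("", text)
    warnings = [f"Removed {count} control character(s)"] if count else []
    return cleaned, warnings
```

A file that needs no changes produces an empty list of warnings, so the caller can store the cleaned text either way and record warnings only when there are any.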

File types

The local-load command must be given the type of the file to load:

record

A single record

release

A single release

record_list

A JSON array of records, like [ { record-1 }, { record-2 } ]

release_list

A JSON array of releases

record_package

A single record package

release_package

A single release package

record_package_list

A JSON array of record packages, like [ { record-package-1 }, { record-package-2 } ]

release_package_list

A JSON array of release packages

record_package_json_lines

Line-delimited JSON, in which each line is a record package

release_package_json_lines

As above, but each line is a release package

record_package_list_in_results

A JSON object with a results key whose value is a JSON array of record packages, like { "results": [ { record-package-1 }, { record-package-2 } ] }

release_package_list_in_results

As above, but release packages

release_package_in_ocdsReleasePackage_in_list_in_results

A JSON object with a results key whose value is a JSON array. Each item in that array is a JSON object with an ocdsReleasePackage key whose value is a release package

release_in_Release_json_lines

Line-delimited JSON, in which each line is a JSON object with a Release key whose value is a release

Items

A file contains one or more items. An item is an OCDS resource: a release, record, release package or record package. An item is uniquely identified by its index (number) within the file. Indices are 0-based.

Files of the type record, release, record_package, or release_package have one item only. Files of other types have one or more items.
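The relationship between file types and items can be sketched as a function that splits a file's text into numbered items. This is a minimal illustration based only on the descriptions on this page, not the loader that Kingfisher Process uses. The iter_items function and the grouping constants are hypothetical names.

```python
import json
from typing import Any, Iterator, Tuple

SINGLE_ITEM_TYPES = {"record", "release", "record_package", "release_package"}
LIST_TYPES = {
    "record_list", "release_list", "record_package_list", "release_package_list",
}
JSON_LINES_TYPES = {"record_package_json_lines", "release_package_json_lines"}
LIST_IN_RESULTS_TYPES = {
    "record_package_list_in_results", "release_package_list_in_results",
}


def _parse_lines(text: str) -> list:
    """Parse line-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def iter_items(file_type: str, text: str) -> Iterator[Tuple[int, Any]]:
    """Yield (index, item) pairs for a file's contents, with 0-based indices."""
    if file_type in SINGLE_ITEM_TYPES:
        items = [json.loads(text)]  # the whole file is one item
    elif file_type in LIST_TYPES:
        items = json.loads(text)  # each array element is an item
    elif file_type in JSON_LINES_TYPES:
        items = _parse_lines(text)  # each line is an item
    elif file_type in LIST_IN_RESULTS_TYPES:
        items = json.loads(text)["results"]
    elif file_type == "release_package_in_ocdsReleasePackage_in_list_in_results":
        items = [entry["ocdsReleasePackage"] for entry in json.loads(text)["results"]]
    elif file_type == "release_in_Release_json_lines":
        items = [entry["Release"] for entry in _parse_lines(text)]
    else:
        raise ValueError(f"unknown file type: {file_type}")
    yield from enumerate(items)
```

For example, list(iter_items("release_list", text)) gives each release in the array together with the index that identifies it within the file.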

Kingfisher Process writes errors to the collection_file_item_errors table when it cannot load an item.