Data Model
Collections
Collections are distinct sets of OCDS data. They are the largest unit on which this tool operates.
A collection is uniquely identified by the combination of:
- Name (source_id): A string. If the collection was created by Kingfisher Collect, this is the name attribute of the spider.
- Date (data_version): The date and time at which the collection was created. If the collection was created by Kingfisher Collect, this is the start_time statistic of the crawl.
- Sample (sample): A boolean. Whether the collection is only a sample of the data from the source.
- Base collection (transform_from_collection_id): An integer. The ID of the collection that was transformed into this collection.
- Transform type (transform_type): A string. The identifier of the transformer that was used to produce this collection.
Each collection is given an integer ID; this is used to refer to the collection in the Command-line tool and the database.
Collections are created by Kingfisher Collect, the web API, or the new-collection command.
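As an illustration of how the five identifying fields combine into one key, here is a minimal Python sketch. The CollectionKey class, the registry dictionary and the get_or_create function are hypothetical names invented for this example; they are not part of Kingfisher Process, which stores collections in its database. The sketch also assumes that a transformed collection reuses the name and date of its base collection.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class CollectionKey:
    """Hypothetical stand-in for the fields that identify a collection."""
    source_id: str
    data_version: datetime
    sample: bool = False
    transform_from_collection_id: Optional[int] = None
    transform_type: Optional[str] = None


# Illustrative in-memory registry mapping each unique key to an integer ID.
registry: Dict[CollectionKey, int] = {}


def get_or_create(key: CollectionKey) -> int:
    """Return the existing ID for this key, or assign the next integer ID."""
    if key not in registry:
        registry[key] = len(registry) + 1
    return registry[key]


started = datetime(2020, 1, 1, 12, 0, 0)
base_id = get_or_create(CollectionKey("example_spider", started))
# Same combination of fields, so this is the same collection.
same_id = get_or_create(CollectionKey("example_spider", started))
# Differs only in the sample flag, so this is a distinct collection.
sample_id = get_or_create(CollectionKey("example_spider", started, sample=True))
# A transformed collection refers back to its base collection by ID
# (assumption: it keeps the base collection's name and date).
compiled_id = get_or_create(
    CollectionKey(
        "example_spider",
        started,
        transform_from_collection_id=base_id,
        transform_type="compile-releases",
    )
)
```

Because the dataclass is frozen, it is hashable and compares by value, so two keys with the same five fields resolve to the same integer ID.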
Schema check flags
Collections have flags that indicate what operations to perform on them. These are:
- check_data
Run CoVE schema checks on the data in this collection
- check_older_data_with_schema_version_1_1
Force OCDS 1.1 checks to be run on OCDS 1.0 data (instead of OCDS 1.0 checks)
To configure the default values for these flags, see Configuration.
Transformed collections
Presently, the tool offers two transformers:
- upgrade-1-0-to-1-1
upgrade a collection’s data from OCDS 1.0 to OCDS 1.1
- compile-releases
merge a collection’s releases into compiled releases
To transform a collection, create a new collection that refers to the base collection, with either the new-transform-compile-releases or new-transform-upgrade-1-0-to-1-1 command, then run the transform-collection command.
Files
A collection contains one or more files. A file is uniquely identified by its collection and filename. Files can have:
- errors
The file could not be retrieved. Presently, errors are either reported by Kingfisher Collect or caught by the local-load command.
- warnings
The file contents had to be modified in order to be stored. Presently, the only warning is about the removal of control characters.
File types
The local-load command must be given the type of the file to load:
- record
A single record
- release
A single release
- record_list
A JSON array of records, like [ { record-1 }, { record-2 } ]
- release_list
A JSON array of releases
- record_package
A single record package
- release_package
A single release package
- record_package_list
A JSON array of record packages, like [ { record-package-1 }, { record-package-2 } ]
- release_package_list
A JSON array of release packages
- record_package_json_lines
Line-delimited JSON, in which each line is a record package
- release_package_json_lines
As above, but release packages
- record_package_list_in_results
A JSON object with a results key whose value is a JSON array of record packages, like { "results": [ { record-package-1 }, { record-package-2 } ] }
- release_package_list_in_results
As above, but release packages
- release_package_in_ocdsReleasePackage_in_list_in_results
A JSON object with a results key whose value is a JSON array. Each item in that array is a JSON object with an ocdsReleasePackage key whose value is a release package
- release_in_Release_json_lines
Line-delimited JSON, in which each line is a JSON object with a Release key whose value is a release
Items
A file contains one or more items. An item is an OCDS resource: a release, record, release package or record package. An item is uniquely identified by its index (number) within the file. Indices are 0-based.
Files of the type record, release, record_package, or release_package have exactly one item. Files of other types have one or more items.
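The relationship between file types and 0-based item indices can be sketched as follows. This is only an illustration based on the descriptions above: iter_items and the four set names are hypothetical, this is not the code that the local-load command runs, and the real loader may parse files differently (for example, as a stream).

```python
import json
from typing import Any, Iterator, Tuple

SINGLE_ITEM_TYPES = {"record", "release", "record_package", "release_package"}
LIST_TYPES = {"record_list", "release_list", "record_package_list", "release_package_list"}
JSON_LINES_TYPES = {"record_package_json_lines", "release_package_json_lines"}
LIST_IN_RESULTS_TYPES = {"record_package_list_in_results", "release_package_list_in_results"}


def iter_items(file_type: str, text: str) -> Iterator[Tuple[int, Any]]:
    """Yield (index, item) pairs, with 0-based indices, for one file's contents."""
    if file_type in SINGLE_ITEM_TYPES:
        # The whole file is the only item.
        items = [json.loads(text)]
    elif file_type in LIST_TYPES:
        # The file is a JSON array; each element is an item.
        items = json.loads(text)
    elif file_type in JSON_LINES_TYPES:
        # Each non-blank line is an item.
        items = [json.loads(line) for line in text.split("\n") if line.strip()]
    elif file_type in LIST_IN_RESULTS_TYPES:
        # The items are the elements of the "results" array.
        items = json.loads(text)["results"]
    elif file_type == "release_package_in_ocdsReleasePackage_in_list_in_results":
        # Each element of "results" wraps a release package.
        items = [entry["ocdsReleasePackage"] for entry in json.loads(text)["results"]]
    elif file_type == "release_in_Release_json_lines":
        # Each non-blank line wraps a release under the "Release" key.
        items = [json.loads(line)["Release"] for line in text.split("\n") if line.strip()]
    else:
        raise ValueError(f"unknown file type: {file_type}")
    yield from enumerate(items)


text = '{"results": [{"uri": "a"}, {"uri": "b"}]}'
pairs = list(iter_items("record_package_list_in_results", text))
```

Here pairs holds two items, with indices 0 and 1. A file of type release would yield a single item with index 0.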
Kingfisher Process writes errors to the collection_file_item_errors table when it cannot load an item.