API reference
Utilities
- process.util.decorator(decode, callback, state, channel, method, properties, body)
Close the database connections opened by the callback, before returning.
If the callback raises an exception, shut down the client in the main thread, without acknowledgment. For some exceptions, assume that the same message was delivered twice, log an error, and nack the message.
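The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: `run_callback`, `DuplicateDelivery`, `FakeChannel`, `close_connections` and `shutdown` are hypothetical stand-ins, and which exceptions are treated as a repeated delivery is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


class DuplicateDelivery(Exception):
    """Hypothetical stand-in for the exceptions treated as a repeated delivery."""


class FakeChannel:
    """Hypothetical stand-in for a message channel; it only records nacks."""

    def __init__(self):
        self.nacked = []

    def basic_nack(self, delivery_tag):
        self.nacked.append(delivery_tag)


def run_callback(callback, channel, delivery_tag, body, close_connections, shutdown):
    """Run the callback, and always close connections before returning."""
    try:
        callback(body)
    except DuplicateDelivery:
        # Assume the same message was delivered twice: log an error and nack.
        logger.error("Message %s looks like a repeated delivery", delivery_tag)
        channel.basic_nack(delivery_tag)
    except Exception:
        # Any other failure: shut down without acknowledging the message.
        logger.exception("Unhandled exception in callback")
        shutdown()
    finally:
        close_connections()


def raise_duplicate(body):
    raise DuplicateDelivery()


def raise_other(body):
    raise ValueError("unexpected")


events = []
channel = FakeChannel()
run_callback(raise_duplicate, channel, 1, b"{}",
             lambda: events.append("close"), lambda: events.append("shutdown"))
run_callback(raise_other, channel, 2, b"{}",
             lambda: events.append("close"), lambda: events.append("shutdown"))
```

In this sketch, the `finally` clause is what guarantees that connections are closed on every path, including the two error paths.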
Models
- class process.models.Collection(*args, **kwargs)
A collection of data from a source.
There should be at most one collection of a given source (source_id) at a given time (data_version) of a given scope (sample or not). A unique constraint therefore covers these fields.
A collection can be a sample of a source. For example, an analyst can load a sample of a bulk download, run manual queries to check whether it serves their needs, and then load the full file. To avoid the overhead of deleting the sample, we instead make sample part of the unique constraint, along with source_id and data_version.
- clean_fields(exclude=None)
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
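To illustrate the unique constraint on source_id, data_version and sample described for Collection, here is a minimal sketch using an in-memory SQLite table. The table layout and the `insert` helper are assumptions made for illustration; the real model has more fields and is not defined this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table: only the three fields named in the constraint.
conn.execute(
    """
    CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        source_id TEXT NOT NULL,
        data_version TEXT NOT NULL,
        sample INTEGER NOT NULL,
        UNIQUE (source_id, data_version, sample)
    )
    """
)


def insert(source_id, data_version, sample):
    conn.execute(
        "INSERT INTO collection (source_id, data_version, sample) VALUES (?, ?, ?)",
        (source_id, data_version, int(sample)),
    )


# A sample and a full load of the same source and data version can coexist.
insert("example_source", "2024-01-01T00:00:00", True)
insert("example_source", "2024-01-01T00:00:00", False)

# A second full load with the same source and data version is rejected.
try:
    insert("example_source", "2024-01-01T00:00:00", False)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM collection").fetchone()[0]
```

Because sample is part of the constraint, the sample row does not need to be deleted before the full file is loaded.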
- class process.models.CollectionNote(*args, **kwargs)
A note an analyst made about the collection.
- class process.models.ProcessingStep(*args, **kwargs)
A step in the lifecycle of a collection file.
- class process.models.Data(*args, **kwargs)
The contents of a release, record or compiled release.
Loader
- process.processors.loader.file_or_directory(path)
Check whether the path exists. Raise an exception if not.
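A check of this kind might look like the sketch below. `require_existing_path` is a hypothetical helper, not the library function: the exception type raised and the choice to return the path are assumptions, since the reference states neither.

```python
import os
import tempfile


def require_existing_path(path):
    """Return the path unchanged if it exists; otherwise raise an exception."""
    if not os.path.exists(path):
        # Assumption: the reference does not say which exception is raised.
        raise FileNotFoundError(f"No such file or directory: {path}")
    return path


with tempfile.TemporaryDirectory() as directory:
    # An existing directory passes the check.
    existing = require_existing_path(directory)
    missing = os.path.join(directory, "missing.json")
    try:
        require_existing_path(missing)
        missing_rejected = False
    except FileNotFoundError:
        missing_rejected = True
```

`os.path.exists` accepts both files and directories, which matches a function that takes either.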
- process.processors.loader.create_collection_file(collection, filename=None, url=None)
Create a file for a collection, and the processing steps for this file.
- Parameters:
collection (Collection) – collection
filename (str) – path to file data
- Returns:
created collection file
- Raises:
InvalidFormError – if there is a validation error
- Return type:
- process.processors.loader.create_collections(source_id, data_version, *, sample=False, upgrade=False, compile=False, check=False, scrapyd_job='', note='', force=False)
Create the root collection, derived collections and notes.
- Parameters:
source_id (str) – collection source
data_version (str) – data version in ISO format
sample (boolean) – whether this is a sample only
upgrade (boolean) – whether to plan collection upgrade
compile (boolean) – whether to plan collection compile
check (boolean) – whether to plan schema-based checks
scrapyd_job (str) – Scrapyd job ID
note (str) – text description
force (boolean) – skip validation of the source_id against the Scrapyd project
- Returns:
the root collection, the upgraded collection and the compiled collection
- Return type:
tuple[Collection, Collection, Collection]
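The data_version parameter is described as a string in ISO format. As a small sketch, such a string can be produced and checked with the standard library. The reference does not state the exact variant expected (for example, date only, or a `T` versus a space separator), so the format shown here is an assumption, and the variable names are illustrative.

```python
from datetime import datetime

# Hypothetical moment at which the data was retrieved.
retrieved_at = datetime(2024, 1, 31, 12, 30, 0)

# ISO 8601 string, as produced by the standard library.
data_version = retrieved_at.isoformat()

# Parsing the string back is a cheap check that it is well formed.
parsed = datetime.fromisoformat(data_version)
```

Here `data_version` is `"2024-01-31T12:30:00"`, and parsing it returns the original datetime.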