API reference
Utilities
- process.util.decorator(decode, callback, state, channel, method, properties, body)
Close the database connections opened by the callback, before returning.
If the callback raises an exception, shut down the client in the main thread, without acknowledgment. For some exceptions, assume that the same message was delivered twice, log an error, and nack the message.
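The error handling described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `run_callback`, `DuplicateMessageError`, and the `nack`, `shutdown` and `close_connections` callables are hypothetical stand-ins, and the sketch calls `shutdown` directly rather than scheduling it on the main thread.

```python
import logging

logger = logging.getLogger(__name__)


class DuplicateMessageError(Exception):
    """Hypothetical error suggesting the same message was delivered twice."""


def run_callback(callback, body, *, nack, shutdown, close_connections):
    """Run callback(body), then always release database connections.

    nack, shutdown and close_connections are hypothetical zero-argument
    callables supplied by the caller.
    """
    try:
        callback(body)
    except DuplicateMessageError:
        # Assume a redelivery: log the error and nack the message.
        logger.exception("Message appears to have been delivered twice")
        nack()
    except Exception:
        # Any other failure: shut down without acknowledging the message.
        logger.exception("Unhandled exception, shutting down")
        shutdown()
    finally:
        # Runs on success and on failure, before returning.
        close_connections()
```

Putting the cleanup in `finally` is what guarantees that connections are closed on every path, including the two failure paths.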
Models
- class process.models.Collection(*args, **kwargs)
A collection of data from a source.
There should be at most one collection of a given source (source_id) at a given time (data_version) of a given scope (sample or not). A unique constraint therefore covers these fields.
A collection can be a sample of a source. For example, an analyst can load a sample of a bulk download, run manual queries to check whether it serves their needs, and then load the full file. To avoid the overhead of deleting the sample, we instead make sample part of the unique constraint, along with source_id and data_version.
- class Transform(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
- clean_fields(exclude=None)
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
- get_upgraded_collection()
Returns the existing upgraded collection, or None if there is none.
- Returns:
upgraded collection
- Return type:
- get_compiled_collection()
Returns the existing compiled collection, or None if there is none.
- Returns:
compiled collection
- Return type:
- class process.models.CollectionNote(*args, **kwargs)
A note an analyst made about the collection.
- class process.models.ProcessingStep(*args, **kwargs)
A step in the lifecycle of a collection file.
- class process.models.CollectionFileItem(*args, **kwargs)
An item within a file in the collection.
- class process.models.Data(*args, **kwargs)
The contents of a release, record or compiled release.
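The unique constraint described for Collection above (source_id, data_version and sample together) can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite table rather than the project's models; the table layout, the `add_collection` helper and the example values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        source_id TEXT NOT NULL,
        data_version TEXT NOT NULL,
        sample INTEGER NOT NULL,
        UNIQUE (source_id, data_version, sample)
    )
    """
)


def add_collection(source_id, data_version, sample):
    """Insert a row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO collection (source_id, data_version, sample) "
                "VALUES (?, ?, ?)",
                (source_id, data_version, int(sample)),
            )
    except sqlite3.IntegrityError:
        return False
    return True


# The full load and the sample load of the same source and version coexist,
# because sample is part of the constraint.
full_ok = add_collection("example_source", "2024-01-31 12:30:00", False)
sample_ok = add_collection("example_source", "2024-01-31 12:30:00", True)
# A second sample with the same source and version is rejected.
repeat_ok = add_collection("example_source", "2024-01-31 12:30:00", True)
```

Because sample is one of the constrained columns, the analyst's sample does not need to be deleted before the full file is loaded.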
Loader
- process.processors.loader.file_or_directory(string)
Checks whether the path is an existing file or directory. Raises an exception if it is not.
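A check of this kind is commonly used as an argparse `type` callable. The sketch below shows that pattern under stated assumptions: `existing_path` is a hypothetical name, and the choice of `argparse.ArgumentTypeError` and of returning the path unchanged are guesses, since the reference does not say which exception is raised or what is returned.

```python
import argparse
import os


def existing_path(string):
    """Return the path unchanged if it is an existing file or directory."""
    if not os.path.exists(string):
        # Assumption: the exception type is not stated in the reference.
        raise argparse.ArgumentTypeError(f"No such file or directory: {string!r}")
    return string


parser = argparse.ArgumentParser()
parser.add_argument("path", type=existing_path)

# The current working directory always exists, so this parses cleanly.
args = parser.parse_args([os.getcwd()])
```

When the callable raises `ArgumentTypeError` during parsing, argparse reports the message as a usage error instead of showing a traceback.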
- process.processors.loader.create_collection_file(collection, filename=None, url=None, errors=None)
Creates a file for a collection, and the processing steps for this file.
- Parameters:
collection (Collection) – collection
filename (str) – path to file data
errors (json) – errors to be stored
- Returns:
created collection file
- Return type:
- Raises:
InvalidFormError – if there is a validation error
- process.processors.loader.create_collections(source_id, data_version, sample=False, upgrade=False, compile=False, check=False, scrapyd_job='', note='', force=False)
Creates the main collection, note, upgraded collection, compiled collection, etc., based on the provided data.
- Parameters:
source_id (str) – collection source
data_version (str) – data version in ISO format
sample (boolean) – whether this is a sample only
upgrade (boolean) – whether to plan collection upgrade
compile (boolean) – whether to plan collection compile
check (boolean) – whether to plan schema-based checks
scrapyd_job (str) – Scrapyd job ID
note (str) – text description
force (boolean) – skip validation of the source_id against the Scrapyd project
- Returns:
the created main collection, upgraded collection and compiled collection
- Return type:
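Since data_version is passed as a string in ISO format, a caller may want to check the value before handing it over. The following is a minimal sketch using only the standard library; `parse_data_version` is a hypothetical helper, and the exact set of formats that create_collections accepts is not stated here, so the example value is an assumption.

```python
from datetime import datetime


def parse_data_version(value):
    """Parse an ISO-format date-time string; raise ValueError if malformed."""
    return datetime.fromisoformat(value)


parsed = parse_data_version("2024-01-31 12:30:00")

try:
    parse_data_version("31/01/2024")
    rejected = False
except ValueError:
    # Day-first dates with slashes are not ISO format.
    rejected = True
```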