Since most analysis is much easier to perform on compiled releases, we recommend working with compiled release collections to begin with.
Get a list of compiled release collections from a particular source¶
The following query returns a list of compiled release collections downloaded from the State Procurement Agency of Georgia’s OCDS API:
SELECT * FROM collection -- the `collection` table contains a list of all collections in the database WHERE source_id = 'georgia_releases' -- filter by collections from the 'georgia_releases' source. AND cached_compiled_releases_count > 0 -- filter by collections containing compiled releases ORDER BY id DESC; -- collection ids are sequential, order by newest first
To find collections from a different source, change the
source_id condition. The
source_id in Kingfisher Process is based on the name of the spider in Kingfisher Collect.
See the list of spiders in the Kingfisher Collect documentation for a list of possible sources.
Get the JSON data stored in a collection¶
Use the collection
id returned by the previous query to restrict your analysis to a single collection.
The following query returns the full JSON data for the first 3 compiled releases in collection 584:
SELECT data FROM data -- raw OCDS JSON data is stored as jsonb blobs in the `data` column of the `data` table JOIN compiled_release ON data.id = compiled_release.data_id -- join to the `compiled_release` table to filter data from a specific collection WHERE collection_id = 584 LIMIT 3;
To get data from a different collection, change the
To get data from a collection containing releases or records, join to the
record tables rather than the
Rendering JSON using Redash
If you are using OCP’s Redash instance, you can render the results of a query as pretty printed and collapsible JSON by clicking the ‘+ New Visualization’ button, setting the visualization type to ‘table’ and setting the data column to display as JSON.
Calculate the total value of completed tenders in a collection¶
In OCDS, the tender value is stored in the
tender.value Value object which consists of a numeric
.amount field and a string
.currency field. The tender status is stored in the
To access the properties of a JSON object use the PostgreSQL
-> operator. The
-> operator takes a JSONB object and a property’s name as input, and returns the property’s value as a JSONB value. The
->> operator returns the value as text.
The following query calculates the total value of completed tenders in collection 584:
SELECT sum((data -> 'tender' -> 'value' -> 'amount')::numeric) AS tender_value, data -> 'tender' -> 'value' ->> 'currency' AS currency FROM data JOIN compiled_release ON data.id = compiled_release.data_id WHERE collection_id = 584 AND data -> 'tender' ->> 'status' = 'complete' GROUP BY currency;
Filtering on status fields
contracts objects in OCDS all have a
Consider which statuses you want to include or exclude from your analysis; for example, you might want to exclude pending and cancelled contracts when calculating the total value of contracts for each buyer.
The OCDS codelist documentation describes the meaning of the statuses for each object.
Calculate the top 10 buyers by award value¶
Since a single contracting process can have many awards, e.g. when divided into lots, the
awards section in OCDS is an array. The award value is stored in the
The following query calculates the top 10 buyers by the value of awards for collection 584.
jsonb_array_elements function used in this query expands the
awards array to a set of JSONB blobs, one for each award.
CROSS JOIN in this query joins each row of the data table with each result of the
jsonb_array_elements function for that row.
SELECT data -> 'buyer' ->> 'name' AS buyer_name, sum((awards -> 'value' -> 'amount')::numeric) AS award_value, awards -> 'value' ->> 'currency' AS currency FROM data JOIN compiled_release ON data.id = compiled_release.data_id CROSS JOIN jsonb_array_elements(data -> 'awards') AS awards WHERE collection_id = 584 AND (awards -> 'value' -> 'amount')::numeric > 0 -- filter out awards with no value AND awards ->> 'status' = 'active' GROUP BY buyer_name, currency ORDER BY award_value DESC LIMIT 10;
Use the PostgreSQL documentation to learn more about operators and functions for working with JSON data.
For simplicity, the above query groups by the
buyer_name column. Using organization names as a dimension in your analysis can be unreliable, since spellings and abbreviations of the same organization name can differ.
OCDS recommends that publishers provide organization identifiers so that the legal entities involved in a contracting process can be reliably identified.
The identifier for an organization in OCDS is stored in the
.identifier field of the entry in the
parties section for the organization.
Querying other collections and fields¶
Coverage of the OCDS schema varies by publisher.
To identify the fields needed for your analysis and how to answer them, use the OCDS schema documentation to understand the meaning, structure and format of the fields in OCDS.
To check whether the fields needed for your analysis are available for a particular collection, you can use the field counts table from Kingfisher Summarize.
To learn more, refer to the querying data in Kingfisher Summarize documentation.