API Reference

Connection Class

class crunch.client.connections.Connection(base_url: Optional[str] = None, token: Optional[str] = None, verbose: bool = False)

An object to manage calls to the REST API of a crunch hosted site.

__init__(base_url: Optional[str] = None, token: Optional[str] = None, verbose: bool = False)

An object to manage calls to the REST API of a crunch hosted site.

Parameters
  • base_url (str, optional) – The URL for the endpoint for the project on the crunch hosted site. If not provided then it attempts to use the ‘CRUNCH_URL’ environment variable.

  • token (str, optional) – An access token for a user on the crunch hosted site. If not provided then it attempts to use the ‘CRUNCH_TOKEN’ environment variable.

Raises
  • CrunchAPIException – If the base_url is not provided and it is not available using the ‘CRUNCH_URL’ environment variable.

  • CrunchAPIException – If the token is not provided and it is not available using the ‘CRUNCH_TOKEN’ environment variable.
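
The fallback to environment variables described above can be sketched as follows. This is a minimal illustration and not the library's implementation: `resolve_setting` and the stand-in `CrunchAPIException` class are hypothetical names defined here so that the example runs on its own, and the URLs are placeholders. Only the environment variable names come from the reference above.

```python
import os
from typing import Optional


class CrunchAPIException(Exception):
    """Stand-in for crunch.client.connections.CrunchAPIException."""


def resolve_setting(value: Optional[str], env_var: str) -> str:
    """Return the explicit value, falling back to an environment variable."""
    resolved = value or os.environ.get(env_var)
    if not resolved:
        raise CrunchAPIException(
            f"No value provided and the '{env_var}' environment variable is not set."
        )
    return resolved


# Placeholder URL, set here only so that the fallback path can be shown.
os.environ["CRUNCH_URL"] = "https://crunch.example.org/api/"

# No argument given: the environment variable is used.
print(resolve_setting(None, "CRUNCH_URL"))
# Argument given: it takes precedence over the environment variable.
print(resolve_setting("https://other.example.org/api/", "CRUNCH_URL"))
```

Under these assumptions, a Connection created with no arguments would need both ‘CRUNCH_URL’ and ‘CRUNCH_TOKEN’ to be set before it is constructed.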

add_attributes(item: str, **kwargs)

Adds multiple attributes as key/value pairs on a dataset.

Each type is inferred from the type of the value.

Parameters
  • item (str) – The slug for the item.

  • **kwargs – Key/value pairs to add as attributes.
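
One way the type inference could work is sketched below. The reference does not say how add_attributes chooses a type, so the order of the checks and the fallback to a character attribute are assumptions, and `infer_attribute_method` is a hypothetical helper. The method names it returns are the ones documented in this class.

```python
import datetime
from typing import Any


def infer_attribute_method(value: Any) -> str:
    """Return the name of the Connection method suited to the value's type."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "add_boolean_attribute"
    if isinstance(value, int):
        return "add_integer_attribute"
    if isinstance(value, float):
        return "add_float_attribute"
    # datetime must be tested before date because datetime subclasses date.
    if isinstance(value, datetime.datetime):
        return "add_datetime_attribute"
    if isinstance(value, datetime.date):
        return "add_date_attribute"
    # Assumed fallback: treat strings and anything else as characters.
    return "add_char_attribute"


for example in (True, 3, 2.5, datetime.date(2022, 1, 31), "text"):
    print(type(example).__name__, "->", infer_attribute_method(example))
```

The order matters because a check for int placed first would also match True and False.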

add_boolean_attribute(item: str, key: str, value: bool) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a boolean.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (bool) – The boolean value for this attribute.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_char_attribute(item: str, key: str, value: str) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a string of characters.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (str) – The string of characters for this attribute.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_dataset(project: str, dataset: str, description: str = '', details: str = '') → requests.models.Response

Creates a new dataset and adds it to a project on a hosted django-crunch site.

Parameters
  • project (str) – The slug of the project that this dataset is to be added to.

  • dataset (str) – The name of the new dataset.

  • description (str, optional) – A brief description of this new dataset. Defaults to “”.

  • details (str, optional) – A long description of this dataset in Markdown format. Defaults to “”.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_date_attribute(item: str, key: str, value: Union[datetime.date, str], format: str = '') → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a date.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (Union[date,str]) – The value for this attribute as a date or a string.

  • format (str) – If the value is a string then this format string can be used with datetime.strptime to convert to a date object. If no format is given then the string is interpreted using dateutil.parser.

Returns

The response object from posting to the crunch API.

Return type

requests.Response
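
The conversion rules for the value and format parameters can be sketched as below. `to_date` is a hypothetical helper and not part of the crunch client. Reducing a datetime to its date is an assumption added here, and the dateutil import is deferred so that the other paths run without python-dateutil installed.

```python
import datetime
from typing import Union


def to_date(value: Union[datetime.date, str], format: str = "") -> datetime.date:
    """Convert a date or a string to a datetime.date."""
    # datetime is a subclass of date, so reduce it to its date part first.
    if isinstance(value, datetime.datetime):
        return value.date()
    if isinstance(value, datetime.date):
        return value
    if format:
        # An explicit format string is passed to datetime.strptime.
        return datetime.datetime.strptime(value, format).date()
    # No format given: let dateutil interpret the string.
    from dateutil import parser

    return parser.parse(value).date()


print(to_date("31/01/2022", format="%d/%m/%Y"))  # 2022-01-31
```

Passing a format is the safer choice for ambiguous strings such as 01/02/2022, where day-first and month-first readings give different dates.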

add_datetime_attribute(item: str, key: str, value: Union[datetime.datetime, str], format: str = '') → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a datetime.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (Union[datetime,str]) – The value for this attribute as a datetime or a string.

  • format (str) – If the value is a string then this format string can be used with datetime.strptime to convert to a datetime object. If no format is given then the string is interpreted using dateutil.parser.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_filesize_attribute(item: str, key: str, value: int) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a filesize.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (int) – The size of the file in bytes.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_float_attribute(item: str, key: str, value: float) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a float.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (float) – The float value for this attribute.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_integer_attribute(item: str, key: str, value: int) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is an integer.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (int) – The integer value for this attribute.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_item(parent: str, item: str, description: str = '', details: str = '') → requests.models.Response

Creates a new item on a hosted django-crunch site.

Parameters
  • parent (str) – The slug of the parent item that this item is to be added to.

  • item (str) – The name of the new item.

  • description (str, optional) – A brief description of this new item. Defaults to “”.

  • details (str, optional) – A long description of this item in Markdown format. Defaults to “”.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_key_value_attribute(url: str, item: str, key: str, value) → requests.models.Response

Adds an attribute as a key/value pair on an item.

This is mainly used by other methods on this class to add attributes with specific types.

Parameters
  • url (str) – The relative URL for adding this type of attribute on the crunch site. For this, see urls.py in the crunch Django app.

  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value – The data to be used for this attribute. The object needs to be serializable.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_lat_long_attribute(item: str, key: str, latitude: Union[str, float, unicodedata.decimal], longitude: Union[str, float, unicodedata.decimal]) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a coordinate with latitude and longitude.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • latitude (Union[str,float,decimal]) – The latitude for this coordinate.

  • longitude (Union[str,float,decimal]) – The longitude for this coordinate.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_project(project: str, description: str = '', details: str = '') → requests.models.Response

Creates a new project on a hosted django-crunch site.

Parameters
  • project (str) – The name of the new crunch project.

  • description (str, optional) – A brief description of this new project. Defaults to “”.

  • details (str, optional) – A long description of this project in Markdown format. Defaults to “”.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

add_url_attribute(item: str, key: str, value: str) → requests.models.Response

Adds an attribute as a key/value pair on a dataset when the value is a URL.

Parameters
  • item (str) – The slug for the item.

  • key (str) – The key for this attribute.

  • value (str) – The URL for this attribute as a string.

Returns

The response object from posting to the crunch API.

Return type

requests.Response

get_headers() → dict

Creates the headers needed for API calls to the REST API on a crunch hosted site.

Used internally when making GET and POST requests using this class.

Raises

CrunchAPIException – Raised if no valid token is available.

Returns

The headers for API calls as a Python dictionary.

Return type

dict
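
A sketch of what building such headers might look like is given below. The reference does not state the header format, so the “Token” scheme (the one used by Django REST framework's TokenAuthentication) is an assumption. `build_headers` and the stand-in exception class are hypothetical names defined only for this example.

```python
from typing import Dict, Optional


class CrunchAPIException(Exception):
    """Stand-in for crunch.client.connections.CrunchAPIException."""


def build_headers(token: Optional[str]) -> Dict[str, str]:
    """Build request headers carrying the access token."""
    if not token:
        raise CrunchAPIException("No valid token is available.")
    # Assumption: the site expects "Authorization: Token <key>".
    return {"Authorization": f"Token {token}"}


print(build_headers("0123456789abcdef"))
```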

get_json_response(relative_url: str) → Dict

Requests JSON data from the API of a crunch hosted site and returns it as a dictionary.

Parameters

relative_url (str) – The URL path relative to the base URL of the endpoint for the project on the crunch hosted site.

Raises

CrunchAPIException – If there is an error getting a JSON response from the API.

Returns

The JSON data from the API encoded as a dictionary.

Return type

Dict

send_status(dataset_id: str, stage: crunch.django.app.enums.Stage, state: crunch.django.app.enums.State, note: str = '') → requests.models.Response

Sends an update of the status of one stage in processing a dataset.

Parameters
  • dataset_id (str) – The ID of the dataset for this status update.

  • stage (Stage) – The stage of this status update.

  • state (State) – The state of this status update.

  • note (str, optional) – A note which gives more information about this status update. Defaults to “”.

Raises

CrunchAPIException – If there was an error posting this status update to the API.

Returns

The resulting response from the request to the API.

Return type

requests.Response

exception crunch.client.connections.CrunchAPIException

Raised when there is an error getting information from the API of a crunch site.

Run Class

class crunch.client.run.Run(connection: crunch.client.connections.Connection, dataset_slug: str, storage_settings: Union[Dict, pathlib.Path], working_directory: pathlib.Path, workflow_type: crunch.client.enums.WorkflowType, workflow_path: Optional[pathlib.Path] = None, download_from_storage: bool = True, upload_to_storage: bool = True, cleanup: bool = False, cores: str = '1')

An object to manage processing a crunch dataset.

__init__(connection: crunch.client.connections.Connection, dataset_slug: str, storage_settings: Union[Dict, pathlib.Path], working_directory: pathlib.Path, workflow_type: crunch.client.enums.WorkflowType, workflow_path: Optional[pathlib.Path] = None, download_from_storage: bool = True, upload_to_storage: bool = True, cleanup: bool = False, cores: str = '1')

property crunch_subdir: pathlib.Path

Returns the path to the .crunch subdirectory in the working directory for this dataset.

send_status(state, note: str = '') → requests.models.Response

Sends a status update about the processing of this dataset.

setup() → crunch.client.enums.RunResult

Sets up this dataset for processing.

This involves:

  • Copying the initial data from storage

  • Saving the MD5 checksums for all the initial data in .crunch/setup_md5_checksums.json

  • Saving the metadata for the dataset in .crunch/dataset.json

  • Saving the metadata for the project in .crunch/project.json

  • Creating the script to run the workflow (either a bash script or a Snakefile for Snakemake)

Returns

Whether or not this stage was successful.

Return type

RunResult
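
The checksum step of setup() can be sketched as below. Only the output path .crunch/setup_md5_checksums.json comes from the reference. The function names, the JSON layout (relative path mapped to hex digest) and the choice to skip the .crunch directory itself are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def md5_checksums(directory: Path) -> Dict[str, str]:
    """Map each file's path (relative to directory) to its MD5 hex digest."""
    checksums = {}
    for path in sorted(directory.rglob("*")):
        relative = path.relative_to(directory)
        # Skip directories and (by assumption) the bookkeeping files in .crunch.
        if not path.is_file() or ".crunch" in relative.parts:
            continue
        # read_bytes loads the whole file; large files would need chunked reads.
        checksums[relative.as_posix()] = hashlib.md5(path.read_bytes()).hexdigest()
    return checksums


def save_setup_checksums(working_directory: Path) -> Path:
    """Write the checksums to .crunch/setup_md5_checksums.json and return its path."""
    crunch_subdir = working_directory / ".crunch"
    crunch_subdir.mkdir(parents=True, exist_ok=True)
    output = crunch_subdir / "setup_md5_checksums.json"
    output.write_text(json.dumps(md5_checksums(working_directory), indent=2))
    return output
```

Recording the checksums before the workflow runs gives a baseline against which later changes can be detected.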

property storage: django.core.files.storage.DefaultStorage

Gets the default storage object.

upload() → crunch.client.enums.RunResult

Uploads new or modified files to the storage for the dataset.

It also creates the following files:

  • .crunch/upload_md5_checksums.json, which lists all MD5 checksums after the dataset has finished.

  • .crunch/deleted.txt, which lists all files that were present after setup but which were deleted as the workflow ran.

Returns

Whether or not this stage was successful.

Return type

RunResult
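
How new, modified and deleted files might be identified from the two sets of checksums is sketched below. `compare_checksums` is a hypothetical helper that assumes each set is a mapping from relative path to MD5 digest. The reference does not describe the comparison that upload() actually performs.

```python
from typing import Dict, List, Tuple


def compare_checksums(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (files to upload, deleted files) from two path -> MD5 mappings."""
    # A file needs uploading if it is new or its digest has changed.
    to_upload = sorted(
        path for path, digest in after.items() if before.get(path) != digest
    )
    # A file was deleted if it was present at setup but is now missing.
    deleted = sorted(path for path in before if path not in after)
    return to_upload, deleted


setup = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
final = {"a.txt": "111", "b.txt": "999", "d.txt": "444"}
to_upload, deleted = compare_checksums(setup, final)
print(to_upload)  # ['b.txt', 'd.txt']
print(deleted)  # ['c.txt']
```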

workflow() → crunch.client.enums.RunResult

Runs the workflow on a dataset that has been set up.

This involves running a bash script as a subprocess or running Snakemake with a Snakefile.

Returns

Whether or not this stage was successful.

Return type

RunResult
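
The choice between the two ways of running a workflow can be sketched as a function that builds the command line. The `WorkflowType` enum below is a stand-in with assumed member names, `workflow_command` is hypothetical, and the file names are placeholders. The --snakefile and --cores options are standard Snakemake command-line options.

```python
import enum
from pathlib import Path
from typing import List


class WorkflowType(enum.Enum):
    """Stand-in for crunch.client.enums.WorkflowType; member names are assumed."""

    BASH = "bash"
    SNAKEMAKE = "snakemake"


def workflow_command(
    workflow_type: WorkflowType, workflow_path: Path, cores: str = "1"
) -> List[str]:
    """Build the command that would run the workflow as a subprocess."""
    if workflow_type is WorkflowType.SNAKEMAKE:
        return ["snakemake", "--snakefile", str(workflow_path), "--cores", cores]
    return ["bash", str(workflow_path)]


command = workflow_command(WorkflowType.BASH, Path("script.sh"))
print(command)  # ['bash', 'script.sh']
```

The resulting list could then be passed to subprocess.run with cwd set to the working directory. The example does not run it, so it needs neither bash nor Snakemake to be installed.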

Diagnostics Functions

crunch.client.diagnostics.get_diagnostics() → dict

Gets diagnostic information about the current environment.

Used when sending status updates to a crunch hosted site.

Returns

A dictionary with the diagnostic information.

Return type

dict

crunch.client.diagnostics.git_revision(directory: Optional[pathlib.Path] = None) → str

Gets the git revision hash for a directory.

Adapted from https://stackoverflow.com/a/40170206 which was taken from NumPy.

Parameters

directory (Path, optional) – The directory we are interested in. Defaults to None in which case it uses the directory of the current source file.

Returns

The git hash for the current revision.

Return type

str
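
A common way to obtain the hash is to call git rev-parse HEAD as a subprocess, sketched below. This is not the library's code: `git_revision_sketch` is a hypothetical name, the "Unknown" fallback is an assumption, and the default directory here is the current working directory rather than the directory of the source file, so that the example runs in any context.

```python
import subprocess
from pathlib import Path
from typing import Optional


def git_revision_sketch(directory: Optional[Path] = None) -> str:
    """Return the git hash of HEAD for a directory, or "Unknown" on failure."""
    # Deviation from the documented default: use the current working directory.
    directory = directory or Path.cwd()
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=directory,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return "Unknown"
    return result.stdout.strip()


print(git_revision_sketch())
```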

crunch.client.diagnostics.version() → str

Gets the version number of the django-crunch module.

Returns

The current version.

Return type

str