API Reference
Connection Class
- class crunch.client.connections.Connection(base_url: Optional[str] = None, token: Optional[str] = None, verbose: bool = False)
An object to manage calls to the REST API of a crunch hosted site.
- __init__(base_url: Optional[str] = None, token: Optional[str] = None, verbose: bool = False)
- Parameters
base_url (str, optional) – The URL for the endpoint for the project on the crunch hosted site. If not provided then it attempts to use the ‘CRUNCH_URL’ environment variable.
token (str, optional) – An access token for a user on the crunch hosted site. If not provided then it attempts to use the ‘CRUNCH_TOKEN’ environment variable.
- Raises
CrunchAPIException – If the base_url is not provided and it is not available using the ‘CRUNCH_URL’ environment variable.
CrunchAPIException – If the token is not provided and it is not available using the ‘CRUNCH_TOKEN’ environment variable.
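The fallback described above (explicit argument first, then environment variable, then an error) can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_setting` and `MissingSettingError` are hypothetical names, with the latter standing in for CrunchAPIException.

```python
import os
from typing import Optional


class MissingSettingError(Exception):
    """Hypothetical stand-in for CrunchAPIException."""


def resolve_setting(value: Optional[str], env_var: str) -> str:
    """Return the explicit value if given, else the environment variable, else raise."""
    if value:
        return value
    from_env = os.environ.get(env_var)
    if not from_env:
        raise MissingSettingError(
            f"Provide a value or set the {env_var} environment variable."
        )
    return from_env


# Example values only; the URL and token are placeholders.
os.environ["CRUNCH_URL"] = "https://crunch.example.com/"
base_url = resolve_setting(None, "CRUNCH_URL")     # falls back to the environment
token = resolve_setting("abc123", "CRUNCH_TOKEN")  # explicit argument wins
```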
- add_attributes(item: str, **kwargs)
Adds multiple attributes as key/value pairs on a dataset.
Each type is inferred from the type of the value.
- Parameters
item (str) – The slug for the item.
**kwargs – key/value pairs to add as attributes. The attribute type for each pair is inferred from the type of its value.
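One way such type inference could work is sketched below. The exact rules used by add_attributes are not documented here, so the mapping is an assumption; `infer_attribute_method` is a hypothetical helper that returns the name of the typed method documented in this reference.

```python
from datetime import date, datetime


def infer_attribute_method(value) -> str:
    """Return the name of the Connection method matching the value's type (assumed mapping)."""
    # bool is tested before int because bool is a subclass of int;
    # datetime is tested before date because datetime is a subclass of date.
    if isinstance(value, bool):
        return "add_boolean_attribute"
    if isinstance(value, int):
        return "add_integer_attribute"
    if isinstance(value, float):
        return "add_float_attribute"
    if isinstance(value, datetime):
        return "add_datetime_attribute"
    if isinstance(value, date):
        return "add_date_attribute"
    if isinstance(value, str):
        return "add_char_attribute"
    raise TypeError(f"No attribute type known for {type(value).__name__}")
```

The order of the checks matters: without it, True would be stored as an integer and a datetime would be stored as a date.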
- add_boolean_attribute(item: str, key: str, value: bool) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a boolean.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (bool) – The boolean value for this attribute.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_char_attribute(item: str, key: str, value: str) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a string of characters.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (str) – The string of characters for this attribute.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_dataset(project: str, dataset: str, description: str = '', details: str = '') requests.models.Response
Creates a new dataset and adds it to a project on a hosted django-crunch site.
- Parameters
project (str) – The slug of the project that this dataset is to be added to.
dataset (str) – The name of the new dataset.
description (str, optional) – A brief description of this new dataset. Defaults to “”.
details (str, optional) – A long description of this dataset in Markdown format. Defaults to “”.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
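A short usage sketch combining add_dataset and add_attributes follows. It relies only on the signatures documented above; `register_dataset` is a hypothetical helper, and it assumes the caller already knows the slug that the site assigns to the new dataset.

```python
def register_dataset(connection, project_slug, dataset_name, dataset_slug, attributes):
    """Create a dataset in a project, then attach attributes to it.

    `connection` is expected to behave like crunch.client.connections.Connection.
    `dataset_slug` is supplied by the caller (assumption: how the site derives
    slugs from names is not covered in this reference).
    """
    response = connection.add_dataset(
        project_slug,
        dataset_name,
        description="Created by register_dataset",
    )
    # Each attribute's type is inferred from the type of its value.
    connection.add_attributes(dataset_slug, **attributes)
    return response
```

With a real connection this might be called as register_dataset(Connection(), "my-project", "Sample 1", "sample-1", {"depth": 12.5}), where the slugs and values are placeholders.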
- add_date_attribute(item: str, key: str, value: Union[datetime.date, str], format: str = '') requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a date.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (Union[date,str]) – The value for this attribute as a date or a string.
format (str) – If the value is a string then this format string can be used with datetime.strptime to convert to a date object. If no format is given then the string is interpreted using dateutil.parser.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
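The handling of the value and format parameters can be sketched as below. `to_date` is a hypothetical helper, not part of the library. It uses datetime.strptime when a format is given and dateutil.parser otherwise. As a stand-in when python-dateutil is not installed, it falls back to date.fromisoformat, which only accepts ISO 8601 dates.

```python
from datetime import date, datetime
from typing import Union


def to_date(value: Union[date, str], format: str = "") -> date:
    """Convert a date or a string to a date object."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if format:
        # An explicit format string is passed to datetime.strptime.
        return datetime.strptime(value, format).date()
    try:
        from dateutil import parser  # third-party package: python-dateutil
    except ImportError:
        # Stand-in for dateutil: accepts ISO 8601 dates only.
        return date.fromisoformat(value)
    return parser.parse(value).date()
```

For example, to_date("04/03/2021", "%d/%m/%Y") and to_date("2021-03-04") both give 4 March 2021.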
- add_datetime_attribute(item: str, key: str, value: Union[datetime.datetime, str], format: str = '') requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a datetime.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (Union[datetime,str]) – The value for this attribute as a datetime or a string.
format (str) – If the value is a string then this format string can be used with datetime.strptime to convert to a datetime object. If no format is given then the string is interpreted using dateutil.parser.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_filesize_attribute(item: str, key: str, value: int) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a filesize.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (int) – The size of the file in bytes.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_float_attribute(item: str, key: str, value: float) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a float.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (float) – The float value for this attribute.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_integer_attribute(item: str, key: str, value: int) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is an integer.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (int) – The integer value for this attribute.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_item(parent: str, item: str, description: str = '', details: str = '') requests.models.Response
Creates a new item on a hosted django-crunch site.
- Parameters
parent (str) – The slug of the parent item that this item is to be added to.
item (str) – The name of the new item.
description (str, optional) – A brief description of this new item. Defaults to “”.
details (str, optional) – A long description of this item in Markdown format. Defaults to “”.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_key_value_attribute(url: str, item: str, key: str, value) requests.models.Response
Adds an attribute as a key/value pair on an item.
This is mainly used by other methods on this class to add attributes with specific types.
- Parameters
url (str) – The relative URL for adding this type of attribute on the crunch site. For this, see urls.py in the crunch Django app.
item (str) – The slug for the item.
key (str) – The key for this attribute.
value – The data to be used for this attribute. The object needs to be serializable.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_lat_long_attribute(item: str, key: str, latitude: Union[str, float, unicodedata.decimal], longitude: Union[str, float, unicodedata.decimal]) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a coordinate with latitude and longitude.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
latitude (Union[str,float,decimal]) – The latitude for this coordinate.
longitude (Union[str,float,decimal]) – The longitude for this coordinate.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_project(project: str, description: str = '', details: str = '') requests.models.Response
Creates a new project on a hosted django-crunch site.
- Parameters
project (str) – The name of the new crunch project.
description (str, optional) – A brief description of this new project. Defaults to “”.
details (str, optional) – A long description of this project in Markdown format. Defaults to “”.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- add_url_attribute(item: str, key: str, value: str) requests.models.Response
Adds an attribute as a key/value pair on a dataset when the value is a URL.
- Parameters
item (str) – The slug for the item.
key (str) – The key for this attribute.
value (str) – The URL for this attribute, as a string.
- Returns
The response object from posting to the crunch API.
- Return type
requests.Response
- get_headers() dict
Creates the headers needed for calls to the REST API on a crunch hosted site.
Used internally when making GET and POST requests using this class.
- Raises
CrunchAPIException – Raised if no valid token is available.
- Returns
The headers for API calls as a Python dictionary.
- Return type
dict
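A minimal sketch of building such headers is shown below. The header layout is an assumption: it uses the "Token" scheme of Django REST framework's TokenAuthentication, which may differ from what crunch actually sends. `build_headers` is a hypothetical name, and ValueError stands in for CrunchAPIException.

```python
def build_headers(token: str) -> dict:
    """Build request headers that carry an access token (assumed header scheme)."""
    if not token:
        # Stand-in for CrunchAPIException when no valid token is available.
        raise ValueError("No token available for the request headers.")
    return {"Authorization": f"Token {token}"}


headers = build_headers("abc123")  # placeholder token
```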
- get_json_response(relative_url: str) Dict
Requests JSON data from the API of a crunch hosted site and returns it as a dictionary.
- Parameters
relative_url (str) – The URL path relative to the base URL of the endpoint for the project on the crunch hosted site.
- Raises
CrunchAPIException – If there is an error getting a JSON response from the API.
- Returns
The JSON data from the API encoded as a dictionary.
- Return type
Dict
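The two steps involved, resolving a relative URL against the base URL and decoding the JSON body into a dictionary, can be sketched without any network access. `join_url` and `decode_json_body` are hypothetical helpers, and ValueError stands in for CrunchAPIException.

```python
import json


def join_url(base_url: str, relative_url: str) -> str:
    """Join a base URL and a relative path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + relative_url.lstrip("/")


def decode_json_body(body: str) -> dict:
    """Decode a JSON response body, raising a single error type on failure."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as err:
        # Stand-in for CrunchAPIException.
        raise ValueError(f"Response was not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return data


# Placeholder URL and body for illustration.
url = join_url("https://crunch.example.com/api", "/projects/")
data = decode_json_body('{"slug": "my-project"}')
```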
- send_status(dataset_id: str, stage: crunch.django.app.enums.Stage, state: crunch.django.app.enums.State, note: str = '') requests.models.Response
Sends an update of the status of one stage in processing a dataset.
- Parameters
dataset_id (str) – The ID of the dataset for this status update.
stage (Stage) – The stage of this status update.
state (State) – The state of this status update.
note (str, optional) – A note which gives more information to this status update. Defaults to “”.
- Raises
CrunchAPIException – If there was an error posting this status update to the API.
- Returns
The resulting response from the request to the API.
- Return type
requests.Response
- exception crunch.client.connections.CrunchAPIException
Raised when there is an error getting information from the API of a crunch site.
Run Class
- class crunch.client.run.Run(connection: crunch.client.connections.Connection, dataset_slug: str, storage_settings: Union[Dict, pathlib.Path], working_directory: pathlib.Path, workflow_type: crunch.client.enums.WorkflowType, workflow_path: Optional[pathlib.Path] = None, download_from_storage: bool = True, upload_to_storage: bool = True, cleanup: bool = False, cores: str = '1')
An object to manage processing a crunch dataset.
- __init__(connection: crunch.client.connections.Connection, dataset_slug: str, storage_settings: Union[Dict, pathlib.Path], working_directory: pathlib.Path, workflow_type: crunch.client.enums.WorkflowType, workflow_path: Optional[pathlib.Path] = None, download_from_storage: bool = True, upload_to_storage: bool = True, cleanup: bool = False, cores: str = '1')
- property crunch_subdir: pathlib.Path
Returns the path to the .crunch subdirectory in the working directory for this dataset.
- send_status(state, note: str = '') requests.models.Response
Sends a status update about the processing of this dataset.
- setup() crunch.client.enums.RunResult
Sets up this dataset for processing.
This involves:
Copying the initial data from storage.
Saving the MD5 checksums for all the initial data in .crunch/setup_md5_checksums.json.
Saving the metadata for the dataset in .crunch/dataset.json.
Saving the metadata for the project in .crunch/project.json.
Creating the script to run the workflow (either a bash script or a Snakefile for Snakemake).
- Returns
Whether or not this stage was successful.
- Return type
RunResult
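The checksum step of setup can be illustrated with the standard library. This is a sketch under assumptions: the layout of the JSON file (a mapping from relative path to MD5 hex digest) and the exclusion of the .crunch subdirectory are guesses, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5_checksums(directory: Path) -> dict:
    """Map each file's path (relative to directory) to its MD5 hex digest."""
    checksums = {}
    for path in sorted(directory.rglob("*")):
        relative = path.relative_to(directory)
        # Skip directories and the bookkeeping files in .crunch (assumption).
        if path.is_file() and ".crunch" not in relative.parts:
            checksums[relative.as_posix()] = hashlib.md5(path.read_bytes()).hexdigest()
    return checksums


def save_checksums(directory: Path, filename: str = "setup_md5_checksums.json") -> Path:
    """Write the checksums for directory into its .crunch subdirectory."""
    crunch_subdir = directory / ".crunch"
    crunch_subdir.mkdir(parents=True, exist_ok=True)
    output = crunch_subdir / filename
    output.write_text(json.dumps(md5_checksums(directory), indent=2))
    return output
```

Reading whole files with read_bytes keeps the sketch short; for large datasets, hashing in chunks would use less memory.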
- property storage: django.core.files.storage.DefaultStorage
Gets the default storage object.
- upload() crunch.client.enums.RunResult
Uploads new or modified files to the storage for the dataset.
It also creates the following files:
.crunch/upload_md5_checksums.json, which lists all MD5 checksums after the dataset has finished processing.
.crunch/deleted.txt, which lists all files that were present after setup but which were deleted as the workflow ran.
- Returns
Whether or not this stage was successful.
- Return type
RunResult
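Deciding which files are new, modified, or deleted amounts to comparing the checksums recorded at setup with those taken after the workflow. The sketch below assumes both are dictionaries mapping relative paths to MD5 digests; `compare_checksums` is a hypothetical helper, not the library's implementation.

```python
from typing import Dict, List, Tuple


def compare_checksums(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (files to upload, files deleted) given checksums before and after."""
    # A file needs uploading if it is new or its digest has changed.
    to_upload = sorted(path for path, digest in after.items() if before.get(path) != digest)
    # A file counts as deleted if it was present at setup but is now missing.
    deleted = sorted(path for path in before if path not in after)
    return to_upload, deleted


# Placeholder digests for illustration.
before = {"input.csv": "aaa", "notes.txt": "bbb", "old.log": "ccc"}
after = {"input.csv": "aaa", "notes.txt": "ddd", "result.csv": "eee"}
to_upload, deleted = compare_checksums(before, after)
```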
- workflow() crunch.client.enums.RunResult
Runs the workflow on a dataset that has been set up.
This involves running a bash script as a subprocess or running Snakemake with a Snakefile.
- Returns
Whether or not this stage was successful.
- Return type
RunResult
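The choice between the two kinds of workflow can be sketched as building a command line for a subprocess. The strings "script" and "snakemake" are hypothetical stand-ins for the members of crunch.client.enums.WorkflowType, which are not listed in this reference, and `workflow_command` is a hypothetical helper. The Snakemake options used (--snakefile and --cores) are part of its command-line interface.

```python
from pathlib import Path
from typing import List


def workflow_command(workflow_type: str, workflow_path: Path, cores: str = "1") -> List[str]:
    """Build the command that would run a workflow as a subprocess."""
    if workflow_type == "script":
        return ["bash", str(workflow_path)]
    if workflow_type == "snakemake":
        return ["snakemake", "--snakefile", str(workflow_path), "--cores", cores]
    raise ValueError(f"Unknown workflow type: {workflow_type}")


command = workflow_command("snakemake", Path("Snakefile"), cores="4")
```

The resulting list could then be passed to subprocess.run with cwd set to the working directory of the dataset.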
Diagnostics Class
- crunch.client.diagnostics.get_diagnostics() dict
Gets diagnostic information about the current environment.
Used when sending status updates to a crunch hosted site.
- Returns
A dictionary with the diagnostic information.
- Return type
dict
- crunch.client.diagnostics.git_revision(directory: Optional[pathlib.Path] = None) str
Gets the git revision hash for a directory.
Adapted from https://stackoverflow.com/a/40170206 which was taken from NumPy.
- Parameters
directory (Path, optional) – The directory we are interested in. Defaults to None in which case it uses the directory of the current source file.
- Returns
The git hash for the current revision.
- Return type
str
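A comparable lookup can be sketched with git rev-parse. This is an illustration rather than the library's code: `git_revision_sketch` is a hypothetical name, it defaults to the current working directory instead of the directory of the source file, and returning "Unknown" on failure is an assumption.

```python
import subprocess
from pathlib import Path
from typing import Optional


def git_revision_sketch(directory: Optional[Path] = None) -> str:
    """Return the git hash of HEAD for a directory, or "Unknown" if unavailable."""
    directory = directory or Path.cwd()
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=directory,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the directory is not inside a repository.
        return "Unknown"
    return result.stdout.strip()


revision = git_revision_sketch()
```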
- crunch.client.diagnostics.version() str
Gets the version number of the django-crunch module.
- Returns
The current version.
- Return type
str