Command Line Interface Reference

corgi

corgi [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default:: True

--pretrained <pretrained>: The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether or not to download the pretrained model again if it is online and already present locally.

Default:: False

--file <file>: A FASTA file with sequences to be classified.

--max-seqs <max_seqs>

--batch-size <batch_size>

Default:: 1

--max-length <max_length>

Default:: 5000

--min-length <min_length>

Default:: 128

--output-dir <output_dir>: A path to a directory in which to output the results.

--csv <csv>: A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.

--save-filtered, --no-save-filtered

Whether or not to save the filtered sequences.

Default:: True

--threshold <threshold>: The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
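
A minimal sketch of a typical invocation, using only options listed above. The file name sequences.fasta, the directory corgi-results and the option values are hypothetical placeholders, not recommendations. The command is printed first and is only run if corgi is found on the PATH.

```shell
# Hypothetical paths; replace with your own.
INPUT="sequences.fasta"
OUTDIR="corgi-results"

# Assemble the command from the options documented above.
set -- corgi --file "$INPUT" --output-dir "$OUTDIR" \
    --batch-size 8 --min-length 128 --max-length 5000 --no-gpu

# Keep a printable copy so the command can be inspected before running.
PREVIEW="$*"
echo "$PREVIEW"

# Run only if the corgi executable is installed.
if command -v corgi >/dev/null 2>&1; then
    "$@"
fi
```

Passing --no-gpu forces CPU processing; omit it to keep the default of using a GPU when one is available.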

corgi-train

corgi-train [OPTIONS] COMMAND [ARGS]...

Options

-v, --version: Prints the current version.

--install-completion: Install completion for the current shell.

--show-completion: Show completion for the current shell, to copy it or customize the installation.

bibliography

corgi-train bibliography [OPTIONS]

bibtex

corgi-train bibtex [OPTIONS]

export

corgi-train export [OPTIONS] MODEL_PATH

Options

--fp16, --no-fp16

Whether or not the floating-point precision of the learner should be set to 16-bit.

Default:: True

--output-dir <output_dir>

The location of the output directory.

Default:: ./outputs

--weight-decay <weight_decay>: The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

Default:: 8

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default:: 256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

Default:: 6

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default:: 2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default:: 256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default:: 0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

Default:: 0.2

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

Default:: True

--cnn-only, --no-cnn-only

Default:: True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

Default:: 3

--cnn-dims-start <cnn_dims_start>: The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

Default:: 2.0

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

Default:: 1024

--include-length, --no-include-length

Default:: False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default:: 8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default:: 0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if not provided explicitly.

Default:: 10000000

--project-name <project_name>: The name for this project for logging purposes.

--run-name <run_name>: The name for this particular run for logging purposes.

--run-id <run_id>: A unique ID for this particular run for logging purposes.

--notes <notes>: A longer description of the run for logging purposes.

--tag <tag>: A tag for logging purposes. Multiple tags can be added, each introduced with --tag.

--wandb, --no-wandb

Whether or not to use ‘Weights and Biases’ for logging.

Default:: False

--wandb-mode <wandb_mode>

The mode for ‘Weights and Biases’.

Default:: online

--wandb-dir <wandb_dir>: The location for ‘Weights and Biases’ output.

--wandb-entity <wandb_entity>: An entity is a username or team name where you’re sending runs.

--wandb-group <wandb_group>: Specify a group to organize individual runs into a larger experiment.

--wandb-job-type <wandb_job_type>: Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.

--mlflow, --no-mlflow

Whether or not to use MLflow for logging.

Default:: False

Arguments

MODEL_PATH: Required argument <click.types.Path object at 0x7fdaf8f069d0>
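
A rough sketch of how the positional MODEL_PATH argument follows the options. The reference does not say whether MODEL_PATH is read or written, so ./outputs/model.pkl is purely a placeholder, as are sequences.csv and refseq-hdf5. The command is only run if corgi-train is on the PATH.

```shell
# Placeholder path for the required positional argument.
MODEL_PATH="./outputs/model.pkl"

# Options come first, then the positional argument.
set -- corgi-train export --csv sequences.csv --base-dir refseq-hdf5 \
    --output-dir ./outputs --no-fp16 "$MODEL_PATH"

EXPORT_CMD="$*"
echo "$EXPORT_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```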

infer

corgi-train infer [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default:: True

--pretrained <pretrained>: The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether or not to download the pretrained model again if it is online and already present locally.

Default:: False

--file <file>: A FASTA file with sequences to be classified.

--max-seqs <max_seqs>

--batch-size <batch_size>

Default:: 1

--max-length <max_length>

Default:: 5000

--min-length <min_length>

Default:: 128

--output-dir <output_dir>: A path to a directory in which to output the results.

--csv <csv>: A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.

--save-filtered, --no-save-filtered

Whether or not to save the filtered sequences.

Default:: True

--threshold <threshold>: The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
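
As an illustration of the filtering options, the sketch below writes predictions to an explicit CSV and sets a threshold. It assumes the threshold is a probability between 0 and 1, which this reference does not state; the file names are hypothetical.

```shell
# Hypothetical file names; 0.9 assumes the threshold is a probability.
set -- corgi-train infer --file sequences.fasta \
    --csv predictions.csv --threshold 0.9 --no-save-filtered

INFER_CMD="$*"
echo "$INFER_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```

Here --no-save-filtered turns off the default behaviour of saving the filtered sequences.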

lr-finder

corgi-train lr-finder [OPTIONS]

Options

--plot-filename <plot_filename>

--start-lr <start_lr>

Default:: 1e-07

--end-lr <end_lr>

Default:: 10

--iterations <iterations>

Default:: 100

--fp16, --no-fp16

Whether or not the floating-point precision of the learner should be set to 16-bit.

Default:: True

--output-dir <output_dir>

The location of the output directory.

Default:: ./outputs

--weight-decay <weight_decay>: The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

Default:: 8

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default:: 256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

Default:: 6

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default:: 2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default:: 256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default:: 0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

Default:: 0.2

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

Default:: True

--cnn-only, --no-cnn-only

Default:: True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

Default:: 3

--cnn-dims-start <cnn_dims_start>: The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

Default:: 2.0

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

Default:: 1024

--include-length, --no-include-length

Default:: False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default:: 8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default:: 0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if not provided explicitly.

Default:: 10000000
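
A sketch of a learning-rate sweep that spells out the documented defaults for the range and number of iterations. The input paths and the plot file name (including its .png extension) are assumptions for illustration.

```shell
# Hypothetical data locations and plot file name.
set -- corgi-train lr-finder --csv sequences.csv --base-dir refseq-hdf5 \
    --start-lr 1e-07 --end-lr 10 --iterations 100 \
    --plot-filename lr-finder.png

LR_CMD="$*"
echo "$LR_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```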

show-batch

corgi-train show-batch [OPTIONS]

Options

--output-path <output_path>: A location to save the HTML which summarizes the batch.

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.
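
To inspect what a dataloader produces before training, a call along these lines may help. It assumes the dataloader type is passed by the names shown under Options (WEIGHTED here); the data paths and batch.html are hypothetical.

```shell
# Hypothetical data locations and output file.
set -- corgi-train show-batch --csv sequences.csv --base-dir refseq-hdf5 \
    --batch-size 16 --dataloader-type WEIGHTED --output-path batch.html

SHOW_CMD="$*"
echo "$SHOW_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```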

train

corgi-train train [OPTIONS]

Options

--distributed, --no-distributed

Whether or not the learner is distributed.

Default:: False

--fp16, --no-fp16

Whether or not the floating-point precision of the learner should be set to 16-bit.

Default:: True

--output-dir <output_dir>

The location of the output directory.

Default:: ./outputs

--weight-decay <weight_decay>: The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

Default:: 8

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default:: 256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

Default:: 6

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default:: 2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default:: 256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default:: 0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

Default:: 0.2

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

Default:: True

--cnn-only, --no-cnn-only

Default:: True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

Default:: 3

--cnn-dims-start <cnn_dims_start>: The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

Default:: 2.0

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

Default:: 1024

--include-length, --no-include-length

Default:: False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default:: 8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default:: 0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if not provided explicitly.

Default:: 10000000

--epochs <epochs>

The number of epochs.

Default:: 20

--freeze-epochs <freeze_epochs>

The number of epochs to train when the learner is frozen and the last layer is trained by itself. Only used if fine_tune is set on the app.

Default:: 3

--learning-rate <learning_rate>

The base learning rate (when fine tuning) or the max learning rate otherwise.

Default:: 0.0001

--project-name <project_name>: The name for this project for logging purposes.

--run-name <run_name>: The name for this particular run for logging purposes.

--run-id <run_id>: A unique ID for this particular run for logging purposes.

--notes <notes>: A longer description of the run for logging purposes.

--tag <tag>: A tag for logging purposes. Multiple tags can be added, each introduced with --tag.

--wandb, --no-wandb

Whether or not to use ‘Weights and Biases’ for logging.

Default:: False

--wandb-mode <wandb_mode>

The mode for ‘Weights and Biases’.

Default:: online

--wandb-dir <wandb_dir>: The location for ‘Weights and Biases’ output.

--wandb-entity <wandb_entity>: An entity is a username or team name where you’re sending runs.

--wandb-group <wandb_group>: Specify a group to organize individual runs into a larger experiment.

--wandb-job-type <wandb_job_type>: Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.

--mlflow, --no-mlflow

Whether or not to use MLflow for logging.

Default:: False
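
A sketch of a training run with 'Weights and Biases' logging and two tags, each introduced by its own --tag. It assumes --wandb-mode accepts offline (a mode of the wandb library; the CLI's accepted values are not listed here). The project name, tags and paths are hypothetical.

```shell
# Hypothetical data locations, project name and tags.
set -- corgi-train train --csv sequences.csv --base-dir refseq-hdf5 \
    --output-dir ./outputs --epochs 20 --learning-rate 0.0001 \
    --wandb --wandb-mode offline --project-name corgi-demo \
    --tag baseline --tag cnn-only

TRAIN_CMD="$*"
echo "$TRAIN_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```

The values for --epochs and --learning-rate repeat the documented defaults and are written out only to show where they would be changed.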

tune

corgi-train tune [OPTIONS]

Options

--runs <runs>

The number of runs to attempt to train the model.

Default:: 1

--engine <engine>

The optimizer to use to perform the hyperparameter tuning. Options: wandb, optuna, skopt.

Default:: skopt

--id <id>

The ID of this hyperparameter tuning job. If using wandb, then this is the sweep id. If using optuna, then this is the storage. If using skopt, then this is the file to store the results.

Default:: (empty string)

--name <name>

An informative name for this hyperparameter tuning job. If empty, then it creates a name from the project name.

Default:: (empty string)

--method <method>

The sampling method to use to perform the hyperparameter tuning. By default it chooses the default method of the engine.

Default:: (empty string)

--min-iter <min_iter>: The minimum number of iterations if using early termination. If left empty, then early termination is not used.

--seed <seed>: A seed for the random number generator.

--distributed, --no-distributed

Whether or not the learner is distributed.

Default:: False

--fp16, --no-fp16

Whether or not the floating-point precision of the learner should be set to 16-bit.

Default:: True

--output-dir <output_dir>

The location of the output directory.

Default:: ./outputs

--weight-decay <weight_decay>: The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.

--embedding-dim <embedding_dim>: The size of the embeddings for the nucleotides (N, A, G, C, T).

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default:: 256

--cnn-layers <cnn_layers>: The number of 1D convolution layers.

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default:: 2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default:: 256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default:: 0

--dropout <dropout>: The amount of dropout to use. (not currently enabled)

--final-bias, --no-final-bias: Whether or not to use bias in the final layer.

--cnn-only, --no-cnn-only

Default:: True

--kernel-size <kernel_size>: The size of the kernels for the CNN-only classifier.

--cnn-dims-start <cnn_dims_start>: The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>: The factor to multiply the number of filters in the CNN layers each time it is downscaled.

--penultimate-dims <penultimate_dims>: The size of the penultimate layer.

--include-length, --no-include-length

Default:: False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default:: 8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default:: 0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if not provided explicitly.

Default:: 10000000

--epochs <epochs>

The number of epochs.

Default:: 20

--freeze-epochs <freeze_epochs>

The number of epochs to train when the learner is frozen and the last layer is trained by itself. Only used if fine_tune is set on the app.

Default:: 3

--learning-rate <learning_rate>

The base learning rate (when fine tuning) or the max learning rate otherwise.

Default:: 0.0001

--project-name <project_name>: The name for this project for logging purposes.

--run-name <run_name>: The name for this particular run for logging purposes.

--run-id <run_id>: A unique ID for this particular run for logging purposes.

--notes <notes>: A longer description of the run for logging purposes.

--tag <tag>: A tag for logging purposes. Multiple tags can be added, each introduced with --tag.

--wandb, --no-wandb

Whether or not to use ‘Weights and Biases’ for logging.

Default:: False

--wandb-mode <wandb_mode>

The mode for ‘Weights and Biases’.

Default:: online

--wandb-dir <wandb_dir>: The location for ‘Weights and Biases’ output.

--wandb-entity <wandb_entity>: An entity is a username or team name where you’re sending runs.

--wandb-group <wandb_group>: Specify a group to organize individual runs into a larger experiment.

--wandb-job-type <wandb_job_type>: Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.

--mlflow, --no-mlflow

Whether or not to use MLflow for logging.

Default:: False
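
A sketch of a hyperparameter tuning job with the optuna engine, where --id is the storage. The SQLite URL follows Optuna's usual storage format and is an assumption about what this CLI accepts; the job name, seed, run count and data paths are hypothetical.

```shell
# Hypothetical storage URL, job name and data locations.
set -- corgi-train tune --engine optuna --id sqlite:///corgi-tune.db \
    --name corgi-tune-demo --runs 10 --seed 42 \
    --csv sequences.csv --base-dir refseq-hdf5

TUNE_CMD="$*"
echo "$TUNE_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```

With the default skopt engine, --id would instead name the file in which results are stored.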

validate

corgi-train validate [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default:: True

--pretrained <pretrained>: The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether or not to download the pretrained model again if it is online and already present locally.

Default:: False

--csv <csv>: The CSV which has the sequences to use.

--base-dir <base_dir>: The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default:: 32

--dataloader-type <dataloader_type>

Default:: DataloaderType.PLAIN
Options:: PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default:: 1000

--deform-lambda <deform_lambda>: The lambda for the deform transform.
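
A sketch of validating a pretrained model on the CPU. The model path ./outputs/model.pkl and the data locations are hypothetical; --pretrained also accepts a URL according to its description above.

```shell
# Hypothetical model path and data locations.
set -- corgi-train validate --pretrained ./outputs/model.pkl \
    --csv sequences.csv --base-dir refseq-hdf5 --batch-size 32 --no-gpu

VALIDATE_CMD="$*"
echo "$VALIDATE_CMD"

# Run only if corgi-train is installed.
if command -v corgi-train >/dev/null 2>&1; then
    "$@"
fi
```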