Command Line Interface Reference

corgi

corgi [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default

True

--pretrained <pretrained>

The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether to download the pretrained model again if it is hosted online and already present locally.

Default

False

--file <file>

A fasta file with sequences to be classified.

--max-seqs <max_seqs>

--batch-size <batch_size>

Default

1

--max-length <max_length>

Default

5000

--min-length <min_length>

Default

128

--output-dir <output_dir>

The directory in which to write the results.

--csv <csv>

A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.

--save-filtered, --no-save-filtered

Whether or not to save the filtered sequences.

Default

True

--threshold <threshold>

The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
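
The options above follow two conventions: value options take the form --name <value>, and boolean options come as --name / --no-name pairs. A minimal sketch of assembling a corgi invocation from Python under those conventions is given below. The helper build_args, the file names, and the threshold value are illustrative assumptions, not part of corgi.

```python
import shlex


def build_args(program, options):
    """Turn a dict of option names into a command-line argument list.

    Booleans map to the --name / --no-name flag pairs used in this
    reference; None means "leave the option at its default".
    """
    args = [program]
    for name, value in options.items():
        flag = name.replace("_", "-")
        if value is None:
            continue
        if isinstance(value, bool):
            args.append(f"--{flag}" if value else f"--no-{flag}")
        else:
            args.extend([f"--{flag}", str(value)])
    return args


# Hypothetical file names and threshold, chosen only for illustration.
args = build_args("corgi", {
    "file": "sequences.fasta",
    "output_dir": "results",
    "threshold": 0.5,
    "gpu": False,
    "save_filtered": True,
})
command = shlex.join(args)
print(command)
```

The resulting list can be passed to subprocess.run(args, check=True) on a machine where corgi is installed.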

corgi-train

corgi-train [OPTIONS] COMMAND [ARGS]...

Options

-v, --version

Prints the current version.

--install-completion <install_completion>

Install completion for the specified shell.

Options

bash | zsh | fish | powershell | pwsh

--show-completion <show_completion>

Show completion for the specified shell, to copy it or customize the installation.

Options

bash | zsh | fish | powershell | pwsh

bibliography

corgi-train bibliography [OPTIONS]

bibtex

corgi-train bibtex [OPTIONS]

infer

corgi-train infer [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default

True

--pretrained <pretrained>

The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether to download the pretrained model again if it is hosted online and already present locally.

Default

False

--file <file>

A fasta file with sequences to be classified.

--max-seqs <max_seqs>

--batch-size <batch_size>

Default

1

--max-length <max_length>

Default

5000

--min-length <min_length>

Default

128

--output-dir <output_dir>

The directory in which to write the results.

--csv <csv>

A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.

--save-filtered, --no-save-filtered

Whether or not to save the filtered sequences.

Default

True

--threshold <threshold>

The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
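
This reference does not spell out the filtering rule behind --threshold. One plausible reading, sketched below, is that a sequence is kept for a target category when that category's probability reaches the threshold, or, with no threshold, when the target is the most likely category. The function passes_filter, the category names, the probabilities, and the use of >= rather than > are all assumptions made for illustration.

```python
def passes_filter(probabilities, target, threshold=None):
    """Decide whether a sequence is kept for the `target` category.

    probabilities maps category name -> predicted probability.
    """
    if threshold is None:
        # No threshold: keep only if the target is the most likely category.
        return max(probabilities, key=probabilities.get) == target
    # With a threshold: keep if the target's probability reaches it.
    return probabilities[target] >= threshold


# Hypothetical category names and probabilities.
prediction = {"bacteria": 0.55, "archaea": 0.30, "eukaryote": 0.15}

print(passes_filter(prediction, "bacteria"))        # True
print(passes_filter(prediction, "bacteria", 0.8))   # False
print(passes_filter(prediction, "archaea", 0.25))   # True
```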

lr-finder

corgi-train lr-finder [OPTIONS]

Options

--plot-filename <plot_filename>

--start-lr <start_lr>

Default

1e-07

--end-lr <end_lr>

Default

10

--iterations <iterations>

Default

100
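
To give a feel for what --start-lr, --end-lr and --iterations span, the sketch below lists the learning rates a range test would visit. It assumes exponential spacing between the two bounds, as is common for learning rate finders. This reference does not state the spacing, so treat lr_schedule as an illustration rather than the tool's implementation.

```python
def lr_schedule(start_lr=1e-07, end_lr=10.0, iterations=100):
    """Exponentially spaced learning rates from start_lr to end_lr (assumed spacing)."""
    if iterations < 2:
        raise ValueError("iterations must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (iterations - 1)) for i in range(iterations)]


rates = lr_schedule()
# Show the first, middle and last learning rates of the sweep.
print(f"{rates[0]:.1e} ... {rates[len(rates) // 2]:.1e} ... {rates[-1]:.1e}")
```

With the defaults, the sweep covers eight orders of magnitude in 100 steps, so each step multiplies the learning rate by a constant factor of roughly 1.2.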

--fp16, --no-fp16

Whether or not to set the floating-point precision of the learner to 16 bits.

Default

True

--output-dir <output_dir>

The location of the output directory.

Default

./outputs

--weight-decay <weight_decay>

The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>

The CSV which has the sequences to use.

--base-dir <base_dir>

The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default

32

--dataloader-type <dataloader_type>

Default

DataloaderType.PLAIN

Options

PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default

1000

--deform-lambda <deform_lambda>

The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

Default

8
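
As a rough illustration of what the embedding operates on, the sketch below maps a nucleotide string to integer indices over the five symbols listed above. The index order (taken from the listing N, A, G, C, T) and the choice to map any other symbol to N are assumptions; the actual encoding used by the tool may differ.

```python
# Assumed vocabulary order, taken from the listing "(N, A, G, C, T)".
VOCAB = {"N": 0, "A": 1, "G": 2, "C": 3, "T": 4}


def encode(sequence):
    """Map a nucleotide string to integer indices; unknown symbols become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


indices = encode("ACGTNacgtR")
print(indices)  # [1, 3, 2, 4, 0, 1, 3, 2, 4, 0]

# With the default --embedding-dim of 8, the embedding table has one
# 8-dimensional row per symbol.
embedding_dim = 8
table_shape = (len(VOCAB), embedding_dim)
print(table_shape)  # (5, 8)
```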

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default

256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

Default

6

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default

2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default

256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default

0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

Default

0.2

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

Default

True

--cnn-only, --no-cnn-only

Default

True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

Default

3

--cnn-dims-start <cnn_dims_start>

The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

Default

2.0
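
The interaction between --cnn-dims-start and --factor can be sketched as a geometric progression of filter counts. The starting width of 64, the four stages, and the rounding to the nearest integer are hypothetical values chosen for illustration; this reference does not specify how many downscaling stages there are or how fractional widths are rounded.

```python
def cnn_widths(cnn_dims_start, factor=2.0, stages=4):
    """Filter counts per stage, multiplied by `factor` at each downscaling."""
    return [round(cnn_dims_start * factor ** i) for i in range(stages)]


print(cnn_widths(64))              # [64, 128, 256, 512]
print(cnn_widths(64, factor=1.5))  # [64, 96, 144, 216]
```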

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

Default

1024

--include-length, --no-include-length

Default

False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default

8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default

0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if it is not provided explicitly.

Default

10000000

show-batch

corgi-train show-batch [OPTIONS]

Options

--output-path <output_path>

A location to save the HTML which summarizes the batch.

--csv <csv>

The CSV which has the sequences to use.

--base-dir <base_dir>

The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default

32

--dataloader-type <dataloader_type>

Default

DataloaderType.PLAIN

Options

PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default

1000

--deform-lambda <deform_lambda>

The lambda for the deform transform.

train

corgi-train train [OPTIONS]

Options

--distributed, --no-distributed

Whether or not the learner is distributed.

Default

False

--fp16, --no-fp16

Whether or not to set the floating-point precision of the learner to 16 bits.

Default

True

--output-dir <output_dir>

The location of the output directory.

Default

./outputs

--weight-decay <weight_decay>

The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>

The CSV which has the sequences to use.

--base-dir <base_dir>

The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default

32

--dataloader-type <dataloader_type>

Default

DataloaderType.PLAIN

Options

PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default

1000

--deform-lambda <deform_lambda>

The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

Default

8

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default

256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

Default

6

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default

2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default

256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default

0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

Default

0.2

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

Default

True

--cnn-only, --no-cnn-only

Default

True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

Default

3

--cnn-dims-start <cnn_dims_start>

The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

Default

2.0

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

Default

1024

--include-length, --no-include-length

Default

False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default

8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default

0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if it is not provided explicitly.

Default

10000000

--epochs <epochs>

The number of epochs.

Default

20

--freeze-epochs <freeze_epochs>

The number of epochs to train while the learner is frozen and only the last layer is trained. Only used if fine_tune is set on the app.

Default

3

--learning-rate <learning_rate>

The base learning rate (when fine tuning) or the max learning rate otherwise.

Default

0.0001

--project-name <project_name>

The name for this project for logging purposes.

--run-name <run_name>

The name for this particular run for logging purposes.

--run-id <run_id>

A unique ID for this particular run for logging purposes.

--notes <notes>

A longer description of the run for logging purposes.

--tag <tag>

A tag for logging purposes. Multiple tags can be added, each introduced with --tag.
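
Because each tag needs its own --tag flag, a list of tags expands into repeated flag/value pairs. The sketch below shows that expansion. The helper tag_args, the project name, and the tag values are hypothetical.

```python
import shlex


def tag_args(tags):
    """Expand a list of tags into repeated --tag <value> pairs."""
    args = []
    for tag in tags:
        args.extend(["--tag", tag])
    return args


# Hypothetical project name and tags, for illustration only.
args = [
    "corgi-train", "train",
    "--wandb",
    "--project-name", "corgi-demo",
    *tag_args(["baseline", "cnn-only"]),
]
command = shlex.join(args)
print(command)
```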

--wandb, --no-wandb

Whether or not to use ‘Weights and Biases’ for logging.

Default

False

--wandb-mode <wandb_mode>

The mode for ‘Weights and Biases’.

Default

online

--wandb-dir <wandb_dir>

The location for ‘Weights and Biases’ output.

--wandb-entity <wandb_entity>

An entity is a username or team name where you’re sending runs.

--wandb-group <wandb_group>

Specify a group to organize individual runs into a larger experiment.

--wandb-job-type <wandb_job_type>

Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.

--mlflow, --no-mlflow

Whether or not to use MLflow for logging.

Default

False

tune

corgi-train tune [OPTIONS]

Options

--runs <runs>

The number of runs to attempt to train the model.

Default

1

--engine <engine>

The optimizer to use to perform the hyperparameter tuning. Options: wandb, optuna, skopt.

Default

skopt

--id <id>

The ID of this hyperparameter tuning job. If using wandb, then this is the sweep id. If using optuna, then this is the storage. If using skopt, then this is the file to store the results.

--name <name>

An informative name for this hyperparameter tuning job. If empty, then it creates a name from the project name.

--method <method>

The sampling method to use to perform the hyperparameter tuning. By default it chooses the default method of the engine.

--min-iter <min_iter>

The minimum number of iterations if using early termination. If left empty, then early termination is not used.

--seed <seed>

A seed for the random number generator.
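
The meaning of --id depends on the engine, as described above. The sketch below collects that mapping and assembles a tune invocation from it. The helper tune_args and the example storage string are assumptions made for illustration. Only the three engine names and what --id means for each come from this reference.

```python
import shlex

# What --id refers to for each engine, as described in this reference.
ID_MEANING = {
    "wandb": "sweep id",
    "optuna": "storage",
    "skopt": "file to store the results",
}


def tune_args(engine, job_id, runs=1):
    """Build a corgi-train tune argument list, rejecting unknown engines."""
    if engine not in ID_MEANING:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ID_MEANING)}"
        )
    return ["corgi-train", "tune", "--engine", engine,
            "--id", job_id, "--runs", str(runs)]


# Hypothetical storage string and run count.
args = tune_args("optuna", "sqlite:///tuning.db", runs=10)
print(f"--id is interpreted as the {ID_MEANING['optuna']}")
print(shlex.join(args))
```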

--distributed, --no-distributed

Whether or not the learner is distributed.

Default

False

--fp16, --no-fp16

Whether or not to set the floating-point precision of the learner to 16 bits.

Default

True

--output-dir <output_dir>

The location of the output directory.

Default

./outputs

--weight-decay <weight_decay>

The amount of weight decay. If None then it uses the default amount of weight decay in fastai.

--csv <csv>

The CSV which has the sequences to use.

--base-dir <base_dir>

The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default

32

--dataloader-type <dataloader_type>

Default

DataloaderType.PLAIN

Options

PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default

1000

--deform-lambda <deform_lambda>

The lambda for the deform transform.

--embedding-dim <embedding_dim>

The size of the embeddings for the nucleotides (N, A, G, C, T).

--filters <filters>

The number of filters in each of the 1D convolution layers. These are concatenated together.

Default

256

--cnn-layers <cnn_layers>

The number of 1D convolution layers.

--kernel-size-maxpool <kernel_size_maxpool>

The size of the pooling before going to the LSTM.

Default

2

--lstm-dims <lstm_dims>

The size of the hidden layers in the LSTM in both directions.

Default

256

--final-layer-dims <final_layer_dims>

The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.

Default

0

--dropout <dropout>

The amount of dropout to use. (not currently enabled)

--final-bias, --no-final-bias

Whether or not to use bias in the final layer.

--cnn-only, --no-cnn-only

Default

True

--kernel-size <kernel_size>

The size of the kernels for the CNN-only classifier.

--cnn-dims-start <cnn_dims_start>

The number of filters in the first CNN layer. If not set, it is derived from the MACC.

--factor <factor>

The factor to multiply the number of filters in the CNN layers each time it is downscaled.

--penultimate-dims <penultimate_dims>

The size of the penultimate layer.

--include-length, --no-include-length

Default

False

--transformer-heads <transformer_heads>

The number of heads in the transformer.

Default

8

--transformer-layers <transformer_layers>

The number of layers in the transformer. If zero then no transformer is used.

Default

0

--macc <macc>

The approximate number of multiply-accumulate (MACC) operations in the model. Used to set cnn_dims_start if it is not provided explicitly.

Default

10000000

--epochs <epochs>

The number of epochs.

Default

20

--freeze-epochs <freeze_epochs>

The number of epochs to train while the learner is frozen and only the last layer is trained. Only used if fine_tune is set on the app.

Default

3

--learning-rate <learning_rate>

The base learning rate (when fine tuning) or the max learning rate otherwise.

Default

0.0001

--project-name <project_name>

The name for this project for logging purposes.

--run-name <run_name>

The name for this particular run for logging purposes.

--run-id <run_id>

A unique ID for this particular run for logging purposes.

--notes <notes>

A longer description of the run for logging purposes.

--tag <tag>

A tag for logging purposes. Multiple tags can be added, each introduced with --tag.

--wandb, --no-wandb

Whether or not to use ‘Weights and Biases’ for logging.

Default

False

--wandb-mode <wandb_mode>

The mode for ‘Weights and Biases’.

Default

online

--wandb-dir <wandb_dir>

The location for ‘Weights and Biases’ output.

--wandb-entity <wandb_entity>

An entity is a username or team name where you’re sending runs.

--wandb-group <wandb_group>

Specify a group to organize individual runs into a larger experiment.

--wandb-job-type <wandb_job_type>

Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.

--mlflow, --no-mlflow

Whether or not to use MLflow for logging.

Default

False

validate

corgi-train validate [OPTIONS]

Options

--gpu, --no-gpu

Whether or not to use a GPU for processing if available.

Default

True

--pretrained <pretrained>

The location (URL or filepath) of a pretrained model.

--reload, --no-reload

Whether to download the pretrained model again if it is hosted online and already present locally.

Default

False

--csv <csv>

The CSV which has the sequences to use.

--base-dir <base_dir>

The base directory with the RefSeq HDF5 files.

--batch-size <batch_size>

The batch size.

Default

32

--dataloader-type <dataloader_type>

Default

DataloaderType.PLAIN

Options

PLAIN | WEIGHTED | STRATIFIED

--validation-seq-length <validation_seq_length>

Default

1000

--deform-lambda <deform_lambda>

The lambda for the deform transform.