Command Line Interface Reference
corgi
corgi [OPTIONS]
Options
- --gpu, --no-gpu
Whether or not to use a GPU for processing if available.
- Default
True
- --pretrained <pretrained>
The location (URL or filepath) of a pretrained model.
- --reload, --no-reload
Whether to download the pretrained model again if it is online and already present locally.
- Default
False
- --file <file>
A FASTA file with sequences to be classified.
- --max-seqs <max_seqs>
- --batch-size <batch_size>
- Default
1
- --max-length <max_length>
- Default
5000
- --min-length <min_length>
- Default
128
- --output-dir <output_dir>
A directory in which to write the results.
- --csv <csv>
A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.
- --save-filtered, --no-save-filtered
Whether or not to save the filtered sequences.
- Default
True
- --threshold <threshold>
The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
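As a minimal sketch of how the options above combine, the following Python snippet assembles a `corgi` command line without running it. The helper name `build_corgi_command` and the file names `reads.fasta` and `corgi-output` are hypothetical; only the flags come from the list above.

```python
import shlex


def build_corgi_command(fasta, output_dir, threshold=None, use_gpu=True, batch_size=1):
    """Assemble an argument list for `corgi` using only documented flags.

    This helper is hypothetical and only builds the arguments; it does not run corgi.
    """
    args = [
        "corgi",
        "--file", str(fasta),
        "--output-dir", str(output_dir),
        "--batch-size", str(batch_size),
    ]
    # --gpu and --no-gpu are the two forms of the same boolean option.
    args.append("--gpu" if use_gpu else "--no-gpu")
    # Leaving out --threshold keeps the documented default behaviour.
    if threshold is not None:
        args += ["--threshold", str(threshold)]
    return args


# Hypothetical input and output names, for illustration only.
command = build_corgi_command("reads.fasta", "corgi-output", threshold=0.9, use_gpu=False)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` if `corgi` is installed; building it as a list avoids shell-quoting problems with unusual file names.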
corgi-train
corgi-train [OPTIONS] COMMAND [ARGS]...
Options
- -v, --version
Prints the current version.
- --install-completion <install_completion>
Install completion for the specified shell.
- Options
bash | zsh | fish | powershell | pwsh
- --show-completion <show_completion>
Show completion for the specified shell, to copy it or customize the installation.
- Options
bash | zsh | fish | powershell | pwsh
bibliography
corgi-train bibliography [OPTIONS]
bibtex
corgi-train bibtex [OPTIONS]
infer
corgi-train infer [OPTIONS]
Options
- --gpu, --no-gpu
Whether or not to use a GPU for processing if available.
- Default
True
- --pretrained <pretrained>
The location (URL or filepath) of a pretrained model.
- --reload, --no-reload
Whether to download the pretrained model again if it is online and already present locally.
- Default
False
- --file <file>
A FASTA file with sequences to be classified.
- --max-seqs <max_seqs>
- --batch-size <batch_size>
- Default
1
- --max-length <max_length>
- Default
5000
- --min-length <min_length>
- Default
128
- --output-dir <output_dir>
A directory in which to write the results.
- --csv <csv>
A path to output the results as a CSV. If not given then a default name is chosen inside the output directory.
- --save-filtered, --no-save-filtered
Whether or not to save the filtered sequences.
- Default
True
- --threshold <threshold>
The threshold to use for filtering. If not given, then only the most likely category is used for filtering.
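The effect of `--threshold` on filtering can be illustrated with a small Python sketch. It assumes that a sequence is assigned to every category whose predicted probability reaches the threshold, and to the single most likely category when no threshold is given. The function name, the category names and the use of `>=` are assumptions for illustration, not the tool's actual implementation.

```python
def filter_categories(probabilities, threshold=None):
    """Return the categories a sequence would be filtered into.

    Hypothetical helper: `probabilities` maps category name to predicted probability.
    """
    if threshold is None:
        # No threshold: only the most likely category is used.
        return [max(probabilities, key=probabilities.get)]
    # With a threshold: every category at or above it is used (assumed comparison).
    return [name for name, p in probabilities.items() if p >= threshold]


# Hypothetical prediction for a single sequence.
prediction = {"bacteria": 0.55, "archaea": 0.40, "eukaryota": 0.05}

print(filter_categories(prediction))                 # most likely category only
print(filter_categories(prediction, threshold=0.3))  # may select more than one category
print(filter_categories(prediction, threshold=0.9))  # may select none
```

Under this reading, a high threshold can leave a sequence out of every filtered set, while a low one can place it in several.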
lr-finder
corgi-train lr-finder [OPTIONS]
Options
- --plot-filename <plot_filename>
- --start-lr <start_lr>
- Default
1e-07
- --end-lr <end_lr>
- Default
10
- --iterations <iterations>
- Default
100
- --fp16, --no-fp16
Whether or not to set the floating-point precision of the learner to 16 bits.
- Default
True
- --output-dir <output_dir>
The location of the output directory.
- Default
./outputs
- --weight-decay <weight_decay>
The amount of weight decay. If None then it uses the default amount of weight decay in fastai.
- --csv <csv>
The CSV which has the sequences to use.
- --base-dir <base_dir>
The base directory with the RefSeq HDF5 files.
- --batch-size <batch_size>
The batch size.
- Default
32
- --dataloader-type <dataloader_type>
- Default
DataloaderType.PLAIN
- Options
PLAIN | WEIGHTED | STRATIFIED
- --validation-seq-length <validation_seq_length>
- Default
1000
- --deform-lambda <deform_lambda>
The lambda for the deform transform.
- --embedding-dim <embedding_dim>
The size of the embeddings for the nucleotides (N, A, G, C, T).
- Default
8
- --filters <filters>
The number of filters in each of the 1D convolution layers. These are concatenated together.
- Default
256
- --cnn-layers <cnn_layers>
The number of 1D convolution layers.
- Default
6
- --kernel-size-maxpool <kernel_size_maxpool>
The size of the pooling before going to the LSTM.
- Default
2
- --lstm-dims <lstm_dims>
The size of the hidden layers in the LSTM in both directions.
- Default
256
- --final-layer-dims <final_layer_dims>
The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.
- Default
0
- --dropout <dropout>
The amount of dropout to use. (not currently enabled)
- Default
0.2
- --final-bias, --no-final-bias
Whether or not to use bias in the final layer.
- Default
True
- --cnn-only, --no-cnn-only
- Default
True
- --kernel-size <kernel_size>
The size of the kernels for the CNN-only classifier.
- Default
3
- --cnn-dims-start <cnn_dims_start>
The number of filters in the first CNN layer. If not set, it is derived from the MACC.
- --factor <factor>
The factor by which the number of filters in the CNN layers is multiplied each time the input is downscaled.
- Default
2.0
- --penultimate-dims <penultimate_dims>
The number of dimensions in the penultimate layer.
- Default
1024
- --include-length, --no-include-length
- Default
False
- --transformer-heads <transformer_heads>
The number of heads in the transformer.
- Default
8
- --transformer-layers <transformer_layers>
The number of layers in the transformer. If zero then no transformer is used.
- Default
0
- --macc <macc>
The approximate number of multiply-accumulate (MACC) operations in the model. Used to set --cnn-dims-start if it is not provided explicitly.
- Default
10000000
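To make the roles of `--start-lr`, `--end-lr` and `--iterations` concrete, here is a minimal Python sketch of an exponential learning-rate sweep, which is the usual shape of a learning-rate finder. The function name `lr_sweep` is hypothetical, and the exact schedule used by the underlying fastai learner may differ in detail.

```python
import math


def lr_sweep(start_lr=1e-07, end_lr=10.0, iterations=100):
    """Return learning rates spaced exponentially from start_lr to end_lr.

    Hypothetical helper; defaults mirror the documented option defaults.
    """
    if iterations < 2:
        raise ValueError("iterations must be at least 2")
    ratio = end_lr / start_lr
    # Each step multiplies the learning rate by the same constant factor.
    return [start_lr * ratio ** (i / (iterations - 1)) for i in range(iterations)]


rates = lr_sweep()
print(len(rates))
print(rates[0], rates[-1])
print(math.isclose(rates[1] / rates[0], rates[2] / rates[1]))
```

With the defaults, the sweep covers eight orders of magnitude in 100 steps, so each step increases the learning rate by roughly 20%.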
show-batch
corgi-train show-batch [OPTIONS]
Options
- --output-path <output_path>
A location to save the HTML which summarizes the batch.
- --csv <csv>
The CSV which has the sequences to use.
- --base-dir <base_dir>
The base directory with the RefSeq HDF5 files.
- --batch-size <batch_size>
The batch size.
- Default
32
- --dataloader-type <dataloader_type>
- Default
DataloaderType.PLAIN
- Options
PLAIN | WEIGHTED | STRATIFIED
- --validation-seq-length <validation_seq_length>
- Default
1000
- --deform-lambda <deform_lambda>
The lambda for the deform transform.
train
corgi-train train [OPTIONS]
Options
- --distributed, --no-distributed
Whether or not the learner is distributed.
- Default
False
- --fp16, --no-fp16
Whether or not to set the floating-point precision of the learner to 16 bits.
- Default
True
- --output-dir <output_dir>
The location of the output directory.
- Default
./outputs
- --weight-decay <weight_decay>
The amount of weight decay. If None then it uses the default amount of weight decay in fastai.
- --csv <csv>
The CSV which has the sequences to use.
- --base-dir <base_dir>
The base directory with the RefSeq HDF5 files.
- --batch-size <batch_size>
The batch size.
- Default
32
- --dataloader-type <dataloader_type>
- Default
DataloaderType.PLAIN
- Options
PLAIN | WEIGHTED | STRATIFIED
- --validation-seq-length <validation_seq_length>
- Default
1000
- --deform-lambda <deform_lambda>
The lambda for the deform transform.
- --embedding-dim <embedding_dim>
The size of the embeddings for the nucleotides (N, A, G, C, T).
- Default
8
- --filters <filters>
The number of filters in each of the 1D convolution layers. These are concatenated together.
- Default
256
- --cnn-layers <cnn_layers>
The number of 1D convolution layers.
- Default
6
- --kernel-size-maxpool <kernel_size_maxpool>
The size of the pooling before going to the LSTM.
- Default
2
- --lstm-dims <lstm_dims>
The size of the hidden layers in the LSTM in both directions.
- Default
256
- --final-layer-dims <final_layer_dims>
The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.
- Default
0
- --dropout <dropout>
The amount of dropout to use. (not currently enabled)
- Default
0.2
- --final-bias, --no-final-bias
Whether or not to use bias in the final layer.
- Default
True
- --cnn-only, --no-cnn-only
- Default
True
- --kernel-size <kernel_size>
The size of the kernels for the CNN-only classifier.
- Default
3
- --cnn-dims-start <cnn_dims_start>
The number of filters in the first CNN layer. If not set, it is derived from the MACC.
- --factor <factor>
The factor by which the number of filters in the CNN layers is multiplied each time the input is downscaled.
- Default
2.0
- --penultimate-dims <penultimate_dims>
The number of dimensions in the penultimate layer.
- Default
1024
- --include-length, --no-include-length
- Default
False
- --transformer-heads <transformer_heads>
The number of heads in the transformer.
- Default
8
- --transformer-layers <transformer_layers>
The number of layers in the transformer. If zero then no transformer is used.
- Default
0
- --macc <macc>
The approximate number of multiply-accumulate (MACC) operations in the model. Used to set --cnn-dims-start if it is not provided explicitly.
- Default
10000000
- --epochs <epochs>
The number of epochs.
- Default
20
- --freeze-epochs <freeze_epochs>
The number of epochs to train when the learner is frozen and the last layer is trained by itself. Only if fine_tune is set on the app.
- Default
3
- --learning-rate <learning_rate>
The base learning rate (when fine tuning) or the max learning rate otherwise.
- Default
0.0001
- --project-name <project_name>
The name for this project for logging purposes.
- --run-name <run_name>
The name for this particular run for logging purposes.
- --run-id <run_id>
A unique ID for this particular run for logging purposes.
- --notes <notes>
A longer description of the run for logging purposes.
- --tag <tag>
A tag for logging purposes. Multiple tags can be added, each introduced with --tag.
- --wandb, --no-wandb
Whether or not to use ‘Weights and Biases’ for logging.
- Default
False
- --wandb-mode <wandb_mode>
The mode for ‘Weights and Biases’.
- Default
online
- --wandb-dir <wandb_dir>
The location for ‘Weights and Biases’ output.
- --wandb-entity <wandb_entity>
An entity is a username or team name where you’re sending runs.
- --wandb-group <wandb_group>
Specify a group to organize individual runs into a larger experiment.
- --wandb-job-type <wandb_job_type>
Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.
- --mlflow, --no-mlflow
Whether or not to use MLflow for logging.
- Default
False
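The interaction between `--cnn-dims-start`, `--factor` and `--cnn-layers` can be sketched as a geometric progression of filter counts. This is an illustration under assumptions: the function name, the starting value of 64 and the truncation to an integer are hypothetical, and the actual model may compute its layer sizes differently (for example when the starting size is derived from `--macc`).

```python
def cnn_filter_sizes(cnn_dims_start, factor=2.0, cnn_layers=6):
    """Return an assumed filter count for each CNN layer.

    Hypothetical helper: each layer has `factor` times the filters of the previous one.
    """
    # int() truncation is an assumption; the real model may round differently.
    return [int(cnn_dims_start * factor ** layer) for layer in range(cnn_layers)]


# 64 is an arbitrary starting size chosen for illustration.
sizes = cnn_filter_sizes(64)
print(sizes)
```

With a factor of 2.0 the filter count doubles at each layer, so deeper networks grow quickly in width; a smaller factor such as 1.5 gives a gentler progression.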
tune
corgi-train tune [OPTIONS]
Options
- --runs <runs>
The number of runs to attempt to train the model.
- Default
1
- --engine <engine>
The optimizer to use to perform the hyperparameter tuning. Options: wandb, optuna, skopt.
- Default
skopt
- --id <id>
The ID of this hyperparameter tuning job. If using wandb, then this is the sweep id. If using optuna, then this is the storage. If using skopt, then this is the file to store the results.
- Default
- --name <name>
An informative name for this hyperparameter tuning job. If empty, then it creates a name from the project name.
- Default
- --method <method>
The sampling method to use to perform the hyperparameter tuning. By default it chooses the default method of the engine.
- Default
- --min-iter <min_iter>
The minimum number of iterations if using early termination. If left empty, then early termination is not used.
- --seed <seed>
A seed for the random number generator.
- --distributed, --no-distributed
Whether or not the learner is distributed.
- Default
False
- --fp16, --no-fp16
Whether or not to set the floating-point precision of the learner to 16 bits.
- Default
True
- --output-dir <output_dir>
The location of the output directory.
- Default
./outputs
- --weight-decay <weight_decay>
The amount of weight decay. If None then it uses the default amount of weight decay in fastai.
- --csv <csv>
The CSV which has the sequences to use.
- --base-dir <base_dir>
The base directory with the RefSeq HDF5 files.
- --batch-size <batch_size>
The batch size.
- Default
32
- --dataloader-type <dataloader_type>
- Default
DataloaderType.PLAIN
- Options
PLAIN | WEIGHTED | STRATIFIED
- --validation-seq-length <validation_seq_length>
- Default
1000
- --deform-lambda <deform_lambda>
The lambda for the deform transform.
- --embedding-dim <embedding_dim>
The size of the embeddings for the nucleotides (N, A, G, C, T).
- --filters <filters>
The number of filters in each of the 1D convolution layers. These are concatenated together.
- Default
256
- --cnn-layers <cnn_layers>
The number of 1D convolution layers.
- --kernel-size-maxpool <kernel_size_maxpool>
The size of the pooling before going to the LSTM.
- Default
2
- --lstm-dims <lstm_dims>
The size of the hidden layers in the LSTM in both directions.
- Default
256
- --final-layer-dims <final_layer_dims>
The size of a dense layer after the LSTM. If this is zero then this layer isn’t used.
- Default
0
- --dropout <dropout>
The amount of dropout to use. (not currently enabled)
- --final-bias, --no-final-bias
Whether or not to use bias in the final layer.
- --cnn-only, --no-cnn-only
- Default
True
- --kernel-size <kernel_size>
The size of the kernels for the CNN-only classifier.
- --cnn-dims-start <cnn_dims_start>
The number of filters in the first CNN layer. If not set, it is derived from the MACC.
- --factor <factor>
The factor by which the number of filters in the CNN layers is multiplied each time the input is downscaled.
- --penultimate-dims <penultimate_dims>
The number of dimensions in the penultimate layer.
- --include-length, --no-include-length
- Default
False
- --transformer-heads <transformer_heads>
The number of heads in the transformer.
- Default
8
- --transformer-layers <transformer_layers>
The number of layers in the transformer. If zero then no transformer is used.
- Default
0
- --macc <macc>
The approximate number of multiply-accumulate (MACC) operations in the model. Used to set --cnn-dims-start if it is not provided explicitly.
- Default
10000000
- --epochs <epochs>
The number of epochs.
- Default
20
- --freeze-epochs <freeze_epochs>
The number of epochs to train when the learner is frozen and the last layer is trained by itself. Only if fine_tune is set on the app.
- Default
3
- --learning-rate <learning_rate>
The base learning rate (when fine tuning) or the max learning rate otherwise.
- Default
0.0001
- --project-name <project_name>
The name for this project for logging purposes.
- --run-name <run_name>
The name for this particular run for logging purposes.
- --run-id <run_id>
A unique ID for this particular run for logging purposes.
- --notes <notes>
A longer description of the run for logging purposes.
- --tag <tag>
A tag for logging purposes. Multiple tags can be added, each introduced with --tag.
- --wandb, --no-wandb
Whether or not to use ‘Weights and Biases’ for logging.
- Default
False
- --wandb-mode <wandb_mode>
The mode for ‘Weights and Biases’.
- Default
online
- --wandb-dir <wandb_dir>
The location for ‘Weights and Biases’ output.
- --wandb-entity <wandb_entity>
An entity is a username or team name where you’re sending runs.
- --wandb-group <wandb_group>
Specify a group to organize individual runs into a larger experiment.
- --wandb-job-type <wandb_job_type>
Specify the type of run, which is useful when you’re grouping runs together into larger experiments using group.
- --mlflow, --no-mlflow
Whether or not to use MLflow for logging.
- Default
False
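As a sketch of a tuning invocation, the Python snippet below assembles a `corgi-train tune` command with repeated `--tag` options and checks the engine name against the documented choices. The helper name and the values `sequences.csv`, `refseq-hdf5`, `baseline` and `cnn-only` are hypothetical; the command is only built, not executed.

```python
import shlex

# Engines listed in the description of --engine.
ENGINES = ("wandb", "optuna", "skopt")


def build_tune_command(csv, base_dir, runs=1, engine="skopt", tags=()):
    """Assemble an argument list for `corgi-train tune` (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {ENGINES}")
    args = [
        "corgi-train", "tune",
        "--csv", str(csv),
        "--base-dir", str(base_dir),
        "--runs", str(runs),
        "--engine", engine,
    ]
    # Each tag is introduced with its own --tag option.
    for tag in tags:
        args += ["--tag", tag]
    return args


# Hypothetical paths and tags, for illustration only.
tune_command = build_tune_command(
    "sequences.csv", "refseq-hdf5", runs=10, engine="optuna", tags=["baseline", "cnn-only"]
)
print(shlex.join(tune_command))
```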
validate
corgi-train validate [OPTIONS]
Options
- --gpu, --no-gpu
Whether or not to use a GPU for processing if available.
- Default
True
- --pretrained <pretrained>
The location (URL or filepath) of a pretrained model.
- --reload, --no-reload
Whether to download the pretrained model again if it is online and already present locally.
- Default
False
- --csv <csv>
The CSV which has the sequences to use.
- --base-dir <base_dir>
The base directory with the RefSeq HDF5 files.
- --batch-size <batch_size>
The batch size.
- Default
32
- --dataloader-type <dataloader_type>
- Default
DataloaderType.PLAIN
- Options
PLAIN | WEIGHTED | STRATIFIED
- --validation-seq-length <validation_seq_length>
- Default
1000
- --deform-lambda <deform_lambda>
The lambda for the deform transform.