Command Line Interface Reference

hespi

HErbarium Specimen sheet PIpeline

Takes a herbarium specimen sheet image and detects components such as the institutional label, swatch, etc. It then classifies whether the institutional label is printed, typed, handwritten or a combination. Finally, it detects the fields of the institutional label and attempts to read them through OCR and HTR models.

hespi [OPTIONS] IMAGES...

Options

--output-dir <output_dir>

A directory to output the results.

Default:

hespi-output

--gpu, --no-gpu

Whether or not to use a GPU if available.

Default:

True

--fuzzy, --no-fuzzy

Whether or not to use fuzzy matching against the reference database.

Default:

True

--fuzzy-cutoff <fuzzy_cutoff>

The threshold for the fuzzy matching score to use.

Default:

0.8
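
To give a feel for what a cutoff of 0.8 means, the sketch below uses Python's standard difflib module as a stand-in. This is an assumption made for illustration only: this reference does not say which similarity measure hespi uses, and the function name best_match and the sample names are hypothetical.

```python
from difflib import get_close_matches

def best_match(text, reference, cutoff=0.8):
    """Return the closest entry in `reference` scoring at least `cutoff`, or None."""
    matches = get_close_matches(text, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical reference list; not taken from the hespi reference database.
reference = ["Eucalyptus globulus", "Acacia dealbata"]

print(best_match("Eucalyptus globulis", reference))  # a one-letter recognition slip
print(best_match("Banksia", reference))              # no entry is close enough
```

With this stand-in measure, raising the cutoff accepts fewer corrections and lowering it accepts more, which is the trade-off the option controls.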

--htr, --no-htr

Whether or not to do handwritten text recognition using Microsoft’s TrOCR.

Default:

True

--llm <llm>

The Large Language Model to use. Currently, OpenAI and Anthropic Claude models are supported.

Default:

gpt-4o

--llm-api-key <llm_api_key>

The API key to use for the Large Language Model. Can be set as an environment variable using the standard variable names.

Default:
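
The "standard variable names" are presumably the ones the providers' own SDKs read: OPENAI_API_KEY for OpenAI and ANTHROPIC_API_KEY for Anthropic. The sketch below checks whether the matching variable is set before a run. The prefix rule and both function names are assumptions for illustration, not hespi's own logic.

```python
import os

def api_key_variable(model):
    """Guess the provider's conventional API-key variable from a model name."""
    # Assumption: names starting with "claude" are Anthropic; all others are OpenAI.
    return "ANTHROPIC_API_KEY" if model.lower().startswith("claude") else "OPENAI_API_KEY"

def has_api_key(model, environ=os.environ):
    """Report whether the matching variable is set to a non-empty value."""
    return bool(environ.get(api_key_variable(model)))

print(api_key_variable("gpt-4o"), has_api_key("gpt-4o"))
```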

--llm-temperature <llm_temperature>

The temperature to use for the Large Language Model.

Default:

0.0

--trocr-size <trocr_size>

The size of the TrOCR model to use for handwritten text recognition.

Default:

large

Options:

small | base | large

--sheet-component-weights <sheet_component_weights>

The path to the sheet component model weights.

Default:

https://github.com/rbturnbull/hespi/releases/download/v0.4.0/sheet-component.pt.gz

--label-field-weights <label_field_weights>

The path to the Label-Field model weights.

Default:

https://github.com/rbturnbull/hespi/releases/download/v0.4.0/label-field.pt.gz

--institutional-label-classifier-weights <institutional_label_classifier_weights>

The path to the institutional label classifier weights.

Default:

https://github.com/rbturnbull/hespi/releases/download/v0.4.2/institutional-label-classifier.pkl.gz

--force-download, --no-force-download

Whether or not to force download model weights even if a weights file is present.

Default:

False

--tmp-dir <tmp_dir>

--batch-size <batch_size>

The maximum batch size to use when running the Sheet-Component model.

Default:

4

--sheet-component-res <sheet_component_res>

The resolution of images to use for the Sheet-Component model.

Default:

1280

--label-field-res <label_field_res>

The resolution of images to use for the Label-Field model.

Default:

1280

--install-completion <install_completion>

Install completion for the specified shell.

Options:

bash | zsh | fish | powershell | pwsh

--show-completion <show_completion>

Show completion for the specified shell, to copy it or customize the installation.

Options:

bash | zsh | fish | powershell | pwsh

Arguments

IMAGES

Required argument(s). A list of images to process, each given as a string. The images can also be URLs.
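
As a rough illustration of how the options and the IMAGES argument combine, the sketch below assembles an argument list in Python without running anything. The helper names flag and hespi_command, and the sample file name and URL, are hypothetical; only the option names come from this reference.

```python
import shlex

def flag(name, enabled):
    """Render a boolean option in its --name / --no-name form."""
    return f"--{name}" if enabled else f"--no-{name}"

def hespi_command(images, output_dir="hespi-output", gpu=True, htr=True, fuzzy_cutoff=0.8):
    """Build the argument list for one hespi run; nothing is executed here."""
    if not images:
        raise ValueError("IMAGES is required: pass at least one path or URL")
    return [
        "hespi",
        "--output-dir", str(output_dir),
        flag("gpu", gpu),
        flag("htr", htr),
        "--fuzzy-cutoff", str(fuzzy_cutoff),
        *images,
    ]

# Hypothetical inputs: one local file and one URL.
command = hespi_command(["sheet1.jpg", "https://example.com/sheet2.jpg"], gpu=False)
print(shlex.join(command))
```

Once hespi is installed, a list built this way could be passed to subprocess.run(command, check=True).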

Environment variables

HESPI_SHEET_COMPONENT_WEIGHTS

Provide a default for --sheet-component-weights

HESPI_LABEL_FIELD_WEIGHTS

Provide a default for --label-field-weights

HESPI_INSTITUTIONAL_LABEL_CLASSIFIER_WEIGHTS

Provide a default for --institutional-label-classifier-weights
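
One way to use these variables is to prepare an environment for a child process rather than repeating the weights options on every call. The sketch below only builds the environment mapping; the function name hespi_environment and the sample path are hypothetical.

```python
import os

def hespi_environment(sheet_component=None, label_field=None,
                      institutional_label_classifier=None, base=None):
    """Return a copy of the environment with hespi weight defaults applied."""
    env = dict(os.environ if base is None else base)
    overrides = {
        "HESPI_SHEET_COMPONENT_WEIGHTS": sheet_component,
        "HESPI_LABEL_FIELD_WEIGHTS": label_field,
        "HESPI_INSTITUTIONAL_LABEL_CLASSIFIER_WEIGHTS": institutional_label_classifier,
    }
    for name, value in overrides.items():
        if value is not None:  # leave unset variables untouched
            env[name] = str(value)
    return env

# Hypothetical local path to a previously downloaded weights file.
env = hespi_environment(sheet_component="/models/sheet-component.pt.gz", base={})
print(env)
```

The resulting mapping could be passed as the env argument of subprocess.run when launching hespi.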

hespi-tools

hespi-tools [OPTIONS] COMMAND [ARGS]...

Options

--install-completion <install_completion>

Install completion for the specified shell.

Options:

bash | zsh | fish | powershell | pwsh

--show-completion <show_completion>

Show completion for the specified shell, to copy it or customize the installation.

Options:

bash | zsh | fish | powershell | pwsh

bibtex

Shows the references in BibTeX format.

hespi-tools bibtex [OPTIONS]

institutional-label-classifier-location

Shows the location of the default Institutional Label Classifier model weights.

hespi-tools institutional-label-classifier-location [OPTIONS]

label-field-location

Shows the location of the default Label-Field model weights.

hespi-tools label-field-location [OPTIONS]

sheet-component-location

Shows the location of the default Sheet-Component model weights.

hespi-tools sheet-component-location [OPTIONS]

trocr

Run the TrOCR model on an image and print the recognized text.

hespi-tools trocr [OPTIONS] IMAGE

Options

--size <size>

The size of the TrOCR model to use for handwritten text recognition.

Default:

large

Options:

small | base | large

Arguments

IMAGE

Required argument. The path to the image file.
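
The trocr subcommand accepts only three model sizes, so a wrapper can reject a bad value before the tool is launched. The sketch below builds the argument list and validates --size against the choices listed above; the function name trocr_command and the sample file name are hypothetical.

```python
TROCR_SIZES = ("small", "base", "large")

def trocr_command(image, size="large"):
    """Build the argument list for `hespi-tools trocr`; nothing is executed here."""
    if size not in TROCR_SIZES:
        raise ValueError(f"size must be one of {', '.join(TROCR_SIZES)}; got {size!r}")
    return ["hespi-tools", "trocr", "--size", size, str(image)]

# Hypothetical cropped label image.
command = trocr_command("label-crop.jpg", size="base")
print(command)
```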