Quickstart

Installation

Install using pip:

pip install rdgai

Or install directly from the repository:

pip install git+https://github.com/rbturnbull/rdgai.git

Usage

See all the options with the command:

rdgai --help

Preparation

You first need to prepare a TEI XML file with a critical apparatus.

Define categories in the TEI XML header under <interpGrp type="transcriptional">. For example:

<interpGrp type="transcriptional">
    <interp xml:id="Addition" corresp="#Omission">An addition of a word or words.</interp>
    <interp xml:id="Omission" corresp="#Addition">An omission of a word or words.</interp>
    <interp xml:id="Substitution">A substitution of a word or words.</interp>
</interpGrp>
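To check which categories a header defines before classifying, the fragment can be read with Python's standard library. The following is a minimal sketch and not part of Rdgai: the function name `list_categories` and the inline `HEADER_FRAGMENT` are illustrative assumptions, and a real TEI file normally places these elements in the TEI namespace, which would need to be added to the element lookup.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Illustrative fragment only; a real file would hold this inside the TEI header.
HEADER_FRAGMENT = """
<interpGrp type="transcriptional">
    <interp xml:id="Addition" corresp="#Omission">An addition of a word or words.</interp>
    <interp xml:id="Omission" corresp="#Addition">An omission of a word or words.</interp>
    <interp xml:id="Substitution">A substitution of a word or words.</interp>
</interpGrp>
"""


def list_categories(fragment: str) -> dict:
    """Map each category's xml:id to its definition and corresp target (hypothetical helper)."""
    root = ET.fromstring(fragment)
    categories = {}
    for interp in root.iter("interp"):
        name = interp.get(f"{{{XML_NS}}}id")
        corresp = interp.get("corresp")
        categories[name] = {
            "definition": (interp.text or "").strip(),
            # Strip the leading '#' of the pointer; None when no corresp is given.
            "corresp": corresp.lstrip("#") if corresp else None,
        }
    return categories


if __name__ == "__main__":
    for name, info in list_categories(HEADER_FRAGMENT).items():
        print(name, info)
```

A quick listing like this makes misspelled identifiers or dangling `corresp` pointers easy to spot before any classification is done.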

Then use the browser-based graphical user interface (GUI) to classify transitions with buttons or keyboard navigation:

rdgai gui apparatus.xml output.xml

Or export classifications to Excel for collaborative editing:

rdgai export apparatus.xml reading-pairs.xlsx

Edit in Excel and re-import with:

rdgai import-classifications apparatus.xml reading-pairs.xlsx output.xml

More information about preparing the TEI XML file can be found in the Preparation documentation.

Validation

The accuracy of Rdgai depends on the type of text, the categories and their definitions, and the LLM used, so it needs to be validated on each document used with Rdgai. For this purpose, Rdgai comes with a validation tool that sets aside a proportion of the manual annotations for use in the prompt and holds out the remainder as ground truth annotations for evaluating the results from Rdgai.
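The idea of the split can be illustrated with a short Python sketch. This is not Rdgai's implementation: the function `split_annotations`, its `seed` parameter, and the tuple layout of the annotations are assumptions made for illustration, and Rdgai's own sampling may differ.

```python
import random


def split_annotations(annotations, proportion=0.5, seed=0):
    """Shuffle annotations and split them into (prompt_examples, ground_truth).

    Hypothetical helper: `proportion` is the share of annotations made
    available to the prompt; the rest is held out for evaluation.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    shuffled = list(annotations)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * proportion)
    return shuffled[:cutoff], shuffled[cutoff:]


if __name__ == "__main__":
    # Toy annotations: (reading pair identifier, manually assigned category).
    manual = [(f"pair-{i}", "Addition" if i % 2 else "Omission") for i in range(10)]
    prompt_examples, ground_truth = split_annotations(manual, proportion=0.5)
    print(len(prompt_examples), "for the prompt,", len(ground_truth), "held out")
```

Keeping the two sets disjoint matters because an annotation that appears in the prompt cannot fairly be used to score the model's output.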

To run the validation tool, use the following command:

rdgai validate apparatus.xml output.xml --report output.html --proportion 0.5 --llm claude-3-5-sonnet-20241022 --examples 20

The HTML report shows the accuracy, precision, recall, and F1 scores, a confusion matrix, and the detailed classifications, each marked correct or incorrect. The LLM then suggests ways to clarify the category definitions and alerts the user to any inconsistencies in the ground truth annotations.
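For readers unfamiliar with these metrics, the sketch below computes them from a toy list of ground truth and predicted categories. It is a plain-Python illustration under assumed names (`evaluate`, `truth`, `predicted`) and does not reproduce how Rdgai builds its report.

```python
from collections import Counter


def evaluate(truth, predicted):
    """Return (accuracy, per-label metrics, confusion counts) for two label lists."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and of equal length")
    labels = sorted(set(truth) | set(predicted))
    # confusion[(true_label, predicted_label)] counts each pairing.
    confusion = Counter(zip(truth, predicted))
    accuracy = sum(t == p for t, p in zip(truth, predicted)) / len(truth)
    per_label = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_label, confusion


if __name__ == "__main__":
    truth = ["Addition", "Addition", "Omission", "Substitution"]
    predicted = ["Addition", "Omission", "Omission", "Substitution"]
    accuracy, per_label, confusion = evaluate(truth, predicted)
    print("accuracy:", accuracy)
    for label, scores in per_label.items():
        print(label, scores)
```

In this toy case one Addition is predicted as Omission, which lowers recall for Addition and precision for Omission while leaving Substitution unaffected. The off-diagonal cells of a confusion matrix show the same pattern.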

More information about validating the results of Rdgai for your TEI XML file can be found in the Validation documentation.

Classification

After validating, you can classify the unclassified reading changes using the following command:

rdgai classify apparatus.xml output.xml --llm claude-3-5-sonnet-20241022 --examples 20

View the output TEI XML in the Rdgai GUI with:

rdgai gui output.xml --inplace

More information about making automated classifications using Rdgai can be found in the Classification documentation.