Quickstart

Tools for Managing the Manuscripts. Derived from Peter Montoro’s thesis regarding of Chrysostom’s Homilies on Romans.

Installation

To install, use the following command:

pip install msstools

To install the latest development version, use:

pip install -U git+https://github.com/rbturnbull/msstools.git

Command Line Usage

See available commands with:

msstools --help

split-images

Description: Split image files into left and right parts (typically recto and verso pages), with optional right-to-left direction, overlap, and flexible naming.

Arguments:

  • prefix: Path to the output prefix for split images.

  • images: One or more image files to be split.

  • --rtl: (Optional) Split images in right-to-left direction.

  • --overlap: (Optional) Overlap percentage between split images (default: 10).

  • --skip: (Optional) Number of images to skip before splitting.

  • --recto: (Optional) Recto folio anchor in the form FILENAME=FOLIO. Can be used multiple times.

  • --force: (Optional) Overwrite existing output files if they already exist.

Example (sequential numbering):

msstools split-images output/page img001.jpg img002.jpg --rtl --overlap 20

This will create the following files:

output/page-0.jpg
output/page-1.jpg
output/page-2.jpg
output/page-3.jpg

Skipped images are copied into the same output sequence without a folio reference. For example, --skip 3 starts with page-0.jpg, page-1.jpg, and page-2.jpg.

Example (folio references):

msstools split-images output/page img001.jpg img002.jpg img003.jpg --recto img002.jpg=49

This will create the following files:

output/page-0.jpg
output/page-1.jpg
output/page-2.jpg
output/page-3-49r.jpg
output/page-4-49v.jpg
output/page-5-50r.jpg

Everything before the first --recto anchor has no folio reference. After an anchor, folio references continue automatically until the next anchor.

remove-accents

Description: Remove accents from a UTF-8 text file and save the cleaned version to a new file.

Arguments:

  • input_file: Path to the input text file.

  • output_file: Path to the output text file with accents removed.

Example:

msstools remove-accents input.txt output.txt

number-sentences

Description: Number <S> sentence tags within <P> paragraph blocks in an XML or structured text file.

Arguments:

  • input_file: Path to the input text file.

  • output_file: Path to the output file with numbered sentence tags.

Example:

msstools number-sentences H1.txt H1_numbered.txt

count-greek-chars

Description: Count the number of Greek characters in a set of homily text files and generate a plot showing the results. Optionally display or save the plot.

Arguments:

  • filename_prefix: Prefix used to construct the filenames of the homily files.

  • --start-homily: (Optional) First homily number to compare (default: 0).

  • --end-homily: (Optional) Last homily number to compare (default: 32).

  • --warning-stdev: (Optional) Standard deviation threshold for highlighting outliers (default: 1.8).

  • --output: (Optional) Path to save the plot as an image.

  • --show: (Optional) Show the plot in a window (default: False unless there is no output).

Example:

msstools count-greek-chars Jerusalem_PB_Saba_20_copy/Saba20_H --output Saba20-counts.png

compare-counts

Description: Compare the Greek character counts between two sets of homily transcriptions and optionally generate a plot showing where the comparison text has significantly more characters than the base.

Arguments:

  • base_prefix: Prefix for the base homily files.

  • comparison_prefix: Prefix for the comparison homily files.

  • --output-svg: (Optional) Path to save the resulting plot as an SVG.

  • --start-homily: (Optional) First homily number to compare (default: 0).

  • --end-homily: (Optional) Last homily number to compare (default: 32).

  • --threshold: (Optional) Character difference threshold that triggers a warning (default: 50).

Example:

msstools compare-counts Migne_H Saba20_H --threshold 40

csv-to-tei

Description: Convert a CSV file of variant readings into TEI XML format. Optionally limit readings and add dates from a separate file.

Arguments:

  • input_csv: Path to the input CSV file containing readings.

  • output_xml: Path to the TEI XML output file.

  • --dates: (Optional) Path to a file containing date information.

  • --max-readings: (Optional) Maximum number of readings to process at each variation unit (default: 0 = no limit).

Example:

msstools csv-to-tei readings.csv output-tei.xml --dates dates.csv --max-readings 10