Quickstart
Tools for managing manuscripts. Derived from Peter Montoro’s thesis on Chrysostom’s Homilies on Romans.
Installation
To install, use the following command:
pip install msstools
To install the latest development version, use:
pip install -U git+https://github.com/rbturnbull/msstools.git
Command Line Usage
See available commands with:
msstools --help
split-images
Description: Split image files into left and right parts (typically recto and verso pages), with optional right-to-left direction, overlap, and flexible naming.
Arguments:
prefix: Path to the output prefix for split images.
images: One or more image files to be split.
--rtl: (Optional) Split images in right-to-left direction.
--overlap: (Optional) Overlap percentage between split images (default: 10).
--skip: (Optional) Number of images to skip before splitting.
--recto: (Optional) Recto folio anchor in the form FILENAME=FOLIO. Can be used multiple times.
--force: (Optional) Overwrite existing output files if they already exist.
Example (sequential numbering):
msstools split-images output/page img001.jpg img002.jpg --rtl --overlap 20
This will create the following files:
output/page-0.jpg
output/page-1.jpg
output/page-2.jpg
output/page-3.jpg
Skipped images are not split. They are copied into the same output sequence
without a folio reference. For example, with --skip 3 the output starts with
page-0.jpg, page-1.jpg, and page-2.jpg, one unsplit copy per skipped image.
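The --overlap and --rtl options can be pictured as two crop boxes per image. The sketch below is only an illustration and is not taken from msstools: the helper name split_boxes is hypothetical, and it assumes that the overlap is a percentage of the full image width centred on the vertical midline. The tool itself may define the overlap differently.

```python
def split_boxes(width, height, overlap_pct=10.0, rtl=False):
    """Return two (left, upper, right, lower) crop boxes in reading order.

    Assumption: the overlap is a percentage of the full image width,
    shared equally on both sides of the vertical midline.
    """
    half_overlap = width * overlap_pct / 100 / 2
    mid = width / 2
    left_box = (0, 0, round(mid + half_overlap), height)
    right_box = (round(mid - half_overlap), 0, width, height)
    # In right-to-left reading order the right-hand half comes first.
    return (right_box, left_box) if rtl else (left_box, right_box)


first, second = split_boxes(2000, 1500, overlap_pct=20, rtl=True)
print(first, second)
```

Boxes in this form could be passed to an image library's crop function. With rtl=True the right-hand half is returned first, so it would receive the lower output number.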
Example (folio references):
msstools split-images output/page img001.jpg img002.jpg img003.jpg --recto img002.jpg=49
This will create the following files:
output/page-0.jpg
output/page-1.jpg
output/page-2.jpg
output/page-3-49r.jpg
output/page-4-49v.jpg
output/page-5-50r.jpg
Everything before the first --recto anchor has no folio reference. After an
anchor, folio references continue automatically until the next anchor.
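The naming rules above can be modelled in a few lines. This is a minimal sketch that reproduces the two examples, not the msstools implementation: the function output_names and its parameters are hypothetical, and it assumes that the second half of an anchored image is the recto named by the anchor.

```python
from pathlib import Path


def output_names(prefix, images, recto=None, skip=0):
    """Predict output filenames for a list of input images (illustrative only)."""
    recto = recto or {}
    names = []
    folio = None  # folio whose recto was emitted most recently, if any

    def add(suffix, label=None):
        ref = f"-{label}" if label else ""
        names.append(f"{prefix}-{len(names)}{ref}{suffix}")

    for position, image in enumerate(images):
        suffix = Path(image).suffix
        if position < skip:
            add(suffix)  # skipped images are copied whole, no folio reference
            continue
        # First half: verso of the current folio, once numbering has started.
        add(suffix, f"{folio}v" if folio is not None else None)
        # Second half: recto of the anchored folio, or of the next folio.
        name = Path(image).name
        if name in recto:
            folio = recto[name]
        elif folio is not None:
            folio += 1
        add(suffix, f"{folio}r" if folio is not None else None)
    return names


print("\n".join(output_names(
    "output/page",
    ["img001.jpg", "img002.jpg", "img003.jpg"],
    recto={"img002.jpg": 49},
)))
```

For the folio example above, this prints the same six filenames, from output/page-0.jpg to output/page-5-50r.jpg.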
remove-accents
Description: Remove accents from a UTF-8 text file and save the cleaned version to a new file.
Arguments:
input_file: Path to the input text file.
output_file: Path to the output text file with accents removed.
Example:
msstools remove-accents input.txt output.txt
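For readers who want to see what accent removal involves, a common approach is to decompose the text with Unicode normalization and drop the combining marks. The sketch below uses only the standard library and is not necessarily how remove-accents works internally. The function name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents, breathings, iota subscripts)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose so the result is in the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("ἐν ἀρχῇ ἦν ὁ λόγος"))
```

Because this sketch removes every combining mark, breathings and iota subscripts disappear along with the accents. A tool may choose to keep some of them.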
number-sentences
Description: Number <S> sentence tags within <P> paragraph blocks in an XML or structured text file.
Arguments:
input_file: Path to the input text file.
output_file: Path to the output file with numbered sentence tags.
Example:
msstools number-sentences H1.txt H1_numbered.txt
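The exact output format of number-sentences is not described here, so the following is only a sketch of the general idea. It assumes, hypothetically, that each <S> tag receives an n attribute and that numbering restarts at 1 in every <P> block. The function name number_sentences is also an illustration, not part of msstools.

```python
import re


def number_sentences(text):
    """Add n="1", n="2", ... to <S> tags, restarting in each <P> block."""

    def number_paragraph(paragraph_match):
        counter = 0

        def number_tag(_tag_match):
            nonlocal counter
            counter += 1
            return f'<S n="{counter}">'

        # Replace every opening <S ...> tag inside this paragraph.
        return re.sub(r"<S\b[^>]*>", number_tag, paragraph_match.group(0))

    # Process each <P>...</P> block independently (non-greedy, across lines).
    return re.sub(r"<P\b[^>]*>.*?</P>", number_paragraph, text, flags=re.DOTALL)


sample = "<P><S>First.</S> <S>Second.</S></P>\n<P><S>Third.</S></P>"
print(number_sentences(sample))
```

A regular expression is enough for flat, well-behaved markup like this sample. Nested or irregular markup would call for a proper XML parser.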
count-greek-chars
Description: Count the number of Greek characters in a set of homily text files and generate a plot showing the results. Optionally display or save the plot.
Arguments:
filename_prefix: Prefix used to construct the filenames of the homily files.
--start-homily: (Optional) First homily number to compare (default: 0).
--end-homily: (Optional) Last homily number to compare (default: 32).
--warning-stdev: (Optional) Standard deviation threshold for highlighting outliers (default: 1.8).
--output: (Optional) Path to save the plot as an image.
--show: (Optional) Show the plot in a window (default: False unless there is no output).
Example:
msstools count-greek-chars Jerusalem_PB_Saba_20_copy/Saba20_H --output Saba20-counts.png
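The counting and outlier steps can be illustrated without the plot. This sketch is an assumption about the general technique, not the code behind count-greek-chars: it treats any code point in the Greek and Coptic or Greek Extended Unicode blocks as a Greek character, and it flags counts that lie more than warning_stdev population standard deviations from the mean. The function names are hypothetical.

```python
import statistics


def count_greek(text):
    """Count code points in the Greek and Coptic or Greek Extended blocks."""
    return sum(
        1
        for ch in text
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )


def flag_outliers(counts, warning_stdev=1.8):
    """Return the keys whose count deviates strongly from the mean."""
    values = list(counts.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # assumption: population standard deviation
    if stdev == 0:
        return []
    return [key for key, n in counts.items() if abs(n - mean) > warning_stdev * stdev]


# Made-up counts per homily number, purely for illustration.
counts = {1: 100, 2: 102, 3: 98, 4: 101, 5: 99, 6: 200}
print(flag_outliers(counts))
```

An unusually high or low count for one homily is a hint that a transcription may be incomplete or may contain duplicated text.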
compare-counts
Description: Compare the Greek character counts between two sets of homily transcriptions and optionally generate a plot showing where the comparison text has significantly more characters than the base.
Arguments:
base_prefix: Prefix for the base homily files.
comparison_prefix: Prefix for the comparison homily files.
--output-svg: (Optional) Path to save the resulting plot as an SVG.
--start-homily: (Optional) First homily number to compare (default: 0).
--end-homily: (Optional) Last homily number to compare (default: 32).
--threshold: (Optional) Character difference threshold that triggers a warning (default: 50).
Example:
msstools compare-counts Migne_H Saba20_H --threshold 40
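The threshold test described above is one-directional: it looks for homilies where the comparison text is longer than the base. A minimal sketch of that comparison follows, assuming the per-homily counts are already available as dictionaries. The function name excess_characters and the sample numbers are hypothetical.

```python
def excess_characters(base_counts, comparison_counts, threshold=50):
    """Map homily number to surplus where the comparison exceeds the base."""
    warnings = {}
    for homily, base in base_counts.items():
        if homily not in comparison_counts:
            continue  # nothing to compare for this homily
        surplus = comparison_counts[homily] - base
        if surplus > threshold:
            warnings[homily] = surplus
    return warnings


# Made-up character counts per homily, purely for illustration.
base = {1: 12000, 2: 15000, 3: 9000}
comparison = {1: 12030, 2: 15100, 3: 8000}
print(excess_characters(base, comparison, threshold=40))
```

In this sample only homily 2 is reported. Homily 3 is much shorter in the comparison text, but a shortfall is ignored by this check.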
csv-to-tei
Description: Convert a CSV file of variant readings into TEI XML format. Optionally limit readings and add dates from a separate file.
Arguments:
input_csv: Path to the input CSV file containing readings.
output_xml: Path to the TEI XML output file.
--dates: (Optional) Path to a file containing date information.
--max-readings: (Optional) Maximum number of readings to process at each variation unit (default: 0 = no limit).
Example:
msstools csv-to-tei readings.csv output-tei.xml --dates dates.csv --max-readings 10
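The CSV layout expected by csv-to-tei is not documented in this section, so the sketch below invents one for illustration: columns named unit, witness, and reading are purely hypothetical. It shows how variant readings can be grouped into TEI app and rdg elements and how a limit such as --max-readings could apply per variation unit. It produces a fragment, not a complete TEI document, and it is not the tool's implementation.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_apparatus(csv_text, max_readings=0):
    """Group rows into <app>/<rdg> elements (hypothetical CSV columns)."""
    units = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        readings = units.setdefault(row["unit"], {})
        readings.setdefault(row["reading"], []).append(row["witness"])

    root = ET.Element("listApp")
    for unit, readings in units.items():
        app = ET.SubElement(root, "app", {"n": unit})
        items = list(readings.items())
        if max_readings > 0:  # 0 means no limit
            items = items[:max_readings]
        for text, witnesses in items:
            wit = " ".join(f"#{w}" for w in witnesses)
            rdg = ET.SubElement(app, "rdg", {"wit": wit})
            rdg.text = text
    return ET.tostring(root, encoding="unicode")


sample = "unit,witness,reading\n1,A,foo\n1,B,foo\n1,C,bar\n"
print(csv_to_apparatus(sample))
```

Witnesses that share a reading are collected into a single rdg element, which is the usual shape of a TEI critical apparatus.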