TesseractOCR Wrapper (v2.0)
About this version
- Submitter: keighrim
- Submission Time: 2025-07-02T02:52:55+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-tesseractocr-wrapper:v2.0
Release Notes
Updated to Tesseract 5, and the output MMIF format now matches other TR apps.
About this app (See raw metadata.json)
This tool applies Tesseract OCR to a video or image and generates text boxes and OCR results. Currently only English is supported.
- App ID: http://apps.clams.ai/tesseract/v2.0
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-tesseractocr-wrapper (source tree of the submitted version)
- Analyzer Version: tesseract5.3
- Analyzer License: Apache 2.0
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/TimeFrame/v5 (required)
  - representatives = “?”
  The TimeFrame annotation represents the video segment to be processed. When the `representatives` property is present, the app will OCR the video's still frames at the underlying TimePoint annotations referred to by the `representatives` property. Otherwise, the app will process the middle frame of the video segment (see the sketch below).
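A minimal sketch of the frame-selection rule described above, treating MMIF annotations as plain dictionaries. The property names (`representatives`, `start`, `end`, `timePoint`) follow the MMIF vocabulary, but the helper itself is hypothetical and is not the app's actual code:

```python
# Hypothetical helper illustrating the frame-selection behavior described above.
# `timeframe` is a TimeFrame annotation (as a dict); `annotations_by_id` maps
# annotation IDs to annotations in the same input MMIF.
def frames_to_ocr(timeframe, annotations_by_id):
    props = timeframe["properties"]
    reps = props.get("representatives", [])
    if reps:
        # `representatives` holds IDs of TimePoint annotations; OCR those frames
        return [annotations_by_id[r]["properties"]["timePoint"] for r in reps]
    # no representatives: fall back to the middle of the segment
    return [(props["start"] + props["end"]) // 2]
```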
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- `tfLabel`: optional, defaults to `[]`
  - Type: string
  - Multivalued: True
  The label of the TimeFrame annotation to be processed. By default (`[]`), all TimeFrame annotations will be processed, regardless of their `label` property values.
- `pretty`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- `runningTime`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`
  The running time of the app will be recorded in the view metadata.
- `hwFetch`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`
  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata.
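These parameters are passed as query-string arguments when the input MMIF is POSTed to the running app. A minimal invocation sketch, assuming the container is running locally on its default HTTP port; the host, port, parameter values, and file name are placeholders:

```python
# Sketch only: POST an input MMIF to the app with runtime parameters as query
# strings. Host, port, and file names are placeholders, not part of the app spec.
import requests

with open("input.mmif", encoding="utf8") as f:
    mmif_json = f.read()

resp = requests.post(
    "http://localhost:5000/",                      # assumed local container address
    data=mmif_json,                                # MMIF payload described under "Inputs"
    params={"pretty": "true", "tfLabel": "slate"}, # example parameter values
)
resp.raise_for_status()
print(resp.text)                                   # MMIF with the new OCR view(s) appended
```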
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TextDocument/v1
  - @lang = “en”
  Fully serialized text content of the recognized text in the input images. Serialization is done by concatenating `text` values of `Paragraph` annotations with two newline characters.
- http://vocab.lappsgrid.org/Token
  - text = “*”
  - word = “*”
  Translation of the recognized Tesseract “words” in the input images. The `token` properties store the string values of the recognized text. The duplication is for keeping backward compatibility and consistency with `Paragraph` and `Sentence` annotations.
- http://vocab.lappsgrid.org/Sentence
  - text = “*”
  Translation of the recognized Tesseract “lines” in the input images. The `sentence` property from the LAPPS vocab stores the string value of space-joined words.
- http://vocab.lappsgrid.org/Paragraph
  - text = “*”
  Translation of the recognized Tesseract “blocks” in the input images. The `paragraph` property from the LAPPS vocab stores the string value of newline-joined sentences.
- http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)
  Alignments between 1) `TimePoint` <-> `TextDocument`, 2) `TimePoint` <-> `Token`/`Sentence`/`Paragraph`, and 3) `BoundingBox` <-> `Token`/`Sentence`/`Paragraph`.
- http://mmif.clams.ai/vocabulary/BoundingBox/v4
  - label = “text”
  Bounding boxes of the detected text regions in the input images. There is no corresponding box for the entire image (`TextDocument`) region.
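A short sketch of reading these outputs from the returned MMIF, treating it as plain JSON (the mmif-python package offers a higher-level API for the same task). The file name is a placeholder; the view/annotation layout follows the MMIF serialization described above:

```python
# Sketch only: walk the views of an output MMIF and print the OCR text documents.
# "output.mmif" is a placeholder for the MMIF returned by the app.
import json

with open("output.mmif", encoding="utf8") as f:
    mmif = json.load(f)

TEXT_DOC = "http://mmif.clams.ai/vocabulary/TextDocument/v1"

for view in mmif.get("views", []):
    for ann in view.get("annotations", []):
        if ann["@type"] == TEXT_DOC:
            # the full OCR result: Paragraph texts joined with two newlines
            print(ann["properties"]["text"]["@value"])
```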