Tesseract OCR Wrapper (v1.0)
About this version
- Submitter: keighrim
- Submission Time: 2023-07-26T00:03:43+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-tesseractocr-wrapper:v1.0
Release Notes
(no notes provided by the developer)
About this app (See raw metadata.json)
This tool applies Tesseract OCR to a video or image and generates text boxes and OCR results.
- App ID: http://apps.clams.ai/tesseractocr-wrapper/v1.0
- App License: MIT
- Source Repository: https://github.com/clamsproject/app-tesseractocr-wrapper (source tree of the submitted version)
- Analyzer Version: tesseract4
- Analyzer License: apache
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
(any properties)
- http://mmif.clams.ai/vocabulary/BoundingBox/v1 (required)
- boxType = “text”
- http://mmif.clams.ai/vocabulary/TimeFrame/v1
(any properties)
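The BoundingBox requirement above can be checked before submitting a file. The following is a minimal sketch, assuming the input is MMIF JSON already parsed into a Python dict with a top-level "views" list whose entries hold "annotations", each with an "@type" and a "properties" object. The function name text_boxes is hypothetical and is not part of the app.

```python
# Hypothetical helper (not part of the app): collects the annotations
# that satisfy the "BoundingBox with boxType = text" input requirement.
# Assumes the MMIF layout views -> annotations -> {"@type", "properties"}.
BOUNDING_BOX = "http://mmif.clams.ai/vocabulary/BoundingBox/v1"


def text_boxes(mmif: dict) -> list:
    """Return all BoundingBox annotations whose boxType is "text"."""
    found = []
    for view in mmif.get("views", []):
        for annotation in view.get("annotations", []):
            props = annotation.get("properties", {})
            if annotation.get("@type") == BOUNDING_BOX and props.get("boxType") == "text":
                found.append(annotation)
    return found


# Tiny illustrative input; the ids and second boxType value are made up.
sample = {
    "views": [
        {
            "id": "v1",
            "annotations": [
                {"@type": BOUNDING_BOX, "properties": {"id": "bb1", "boxType": "text"}},
                {"@type": BOUNDING_BOX, "properties": {"id": "bb2", "boxType": "face"}},
            ],
        }
    ]
}
matches = text_boxes(sample)
```

If the returned list is empty, the input likely lacks the required text boxes and should first be passed through a text detection step.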
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- frameType: required
  - Type: string
  - Multivalued: True
  Specifies which TimeFrame types to use for filtering “text”-typed BoundingBox annotations, e.g. “slate”, “chyron”, or “speech”. If not set, the app won’t use TimeFrames for filtering.
- threshold: optional, defaults to 0.9
  - Type: number
  - Multivalued: False
  A value between 0 and 1 used to filter out low-confidence text boxes.
- psm: optional, defaults to 0
  - Type: integer
  - Multivalued: False
  - Choices: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
  Tesseract Page Segmentation Modes. See https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
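The parameters above can be assembled into a request query string as follows. This is a minimal sketch, assuming the app accepts its parameters as URL query parameters and that a multivalued parameter such as frameType is passed by repeating the key. The function name build_query is hypothetical. The range and choice checks mirror the descriptions listed above.

```python
from urllib.parse import urlencode

# Allowed page segmentation modes, per the "Choices" listed for psm.
PSM_CHOICES = range(14)


def build_query(frame_types, threshold=0.9, psm=0, pretty=False):
    """Hypothetical helper: validate the parameters and build a query string.

    Assumes parameters travel as URL query parameters and that the
    multivalued frameType is expressed by repeating the key.
    """
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    if psm not in PSM_CHOICES:
        raise ValueError("psm must be an integer from 0 to 13")
    # One (key, value) pair per frame type, then the single-valued parameters.
    params = [("frameType", frame_type) for frame_type in frame_types]
    params += [
        ("threshold", threshold),
        ("psm", psm),
        ("pretty", str(pretty).lower()),
    ]
    return urlencode(params)


query = build_query(["slate", "chyron"], threshold=0.8, psm=6)
# query == "frameType=slate&frameType=chyron&threshold=0.8&psm=6&pretty=false"
```

The resulting string would be appended after a "?" to the address where the app is served. That address depends on how the container is run and is not specified here.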
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
(any properties)
(any properties)