CLAMS docTR Wrapper (v1.2)
About this version
- Submitter: keighrim
- Submission Time: 2025-05-23T01:46:09+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-doctr-wrapper:v1.2
Release Notes
Minor update to support latest SDK
- updated to the latest SDK, and hence added the cli.py entry point
- in various newly generated annotation objects, all references to other annotation IDs are now in “long” form (view_id:ann_id), even when referring to annotations within the same view
- now only processes TimeFrame annotations with a label property (effectively ignores time frames from, for example, whisper-wrapper)
- made the model caching path inside the container consistent with other CLAMS apps (/cache/doctr)
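The “long” form reference mentioned in the release notes can be illustrated with a minimal sketch. The helper names `to_long_id` and `split_long_id` and the example IDs are hypothetical and are not part of the CLAMS or MMIF SDKs; the only thing taken from the notes is the `view_id:ann_id` shape.

```python
def to_long_id(view_id: str, ann_id: str) -> str:
    """Return the long-form reference `view_id:ann_id` (hypothetical helper).

    If `ann_id` already contains a view prefix, it is returned unchanged.
    """
    if ":" in ann_id:
        return ann_id
    return f"{view_id}:{ann_id}"


def split_long_id(long_id: str) -> tuple[str, str]:
    """Split a long-form reference into (view_id, ann_id)."""
    view_id, sep, ann_id = long_id.partition(":")
    if not sep:
        raise ValueError(f"not a long-form ID: {long_id!r}")
    return view_id, ann_id


# Made-up IDs, for illustration only.
print(to_long_id("v_0", "tf_1"))
print(split_long_id("v_0:tf_1"))
```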
About this app (See raw metadata.json)
This CLAMS app wraps docTR, an end-to-end OCR model. The model can detect text regions in the input image and recognize the text in those regions (via the parseq OCR model; only English is supported at the moment). The model organizes the text-localized regions hierarchically into “pages” > “blocks” > “lines” > “words”, and this CLAMS app translates them into TextDocument, Paragraph, Sentence, and Token annotations to represent the recognized text content. See the descriptions of the I/O types below for details on how the annotations are aligned to each other.
- App ID: http://apps.clams.ai/doctr-wrapper/v1.2
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-doctr-wrapper (source tree of the submitted version)
- Analyzer Version: 0.8.1
- Analyzer License: Apache 2.0
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/TimeFrame/v5 (required)
  - representatives = “?”
  - label = “*”
  The labeled TimeFrame annotation that represents the video segment to be processed. When the representatives property is present, the app processes the video's still frames at the underlying time point annotations referred to by the representatives property. Otherwise, the app processes the middle frame of the video segment. Generic TimeFrames with no label property are not processed.
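The frame selection rule described above can be sketched as follows. This is an illustration only, not the app's actual code: the function name `frames_to_process`, the plain-dict stand-in for a TimeFrame, and the `start`/`end` keys are assumptions, as is treating an empty representatives list the same as a missing one.

```python
def frames_to_process(timeframe: dict) -> list:
    """Decide what to OCR for one TimeFrame (hypothetical sketch).

    `timeframe` is a plain dict standing in for the annotation's
    properties; it is not the mmif-python annotation object.
    Returns a list of (kind, value) pairs.
    """
    # Generic TimeFrames without a label are skipped entirely.
    if "label" not in timeframe:
        return []

    # Assumption: an empty `representatives` list counts as absent.
    representatives = timeframe.get("representatives")
    if representatives:
        # Each value is the ID of a referenced time point annotation.
        return [("timepoint", tp_id) for tp_id in representatives]

    # No representatives: fall back to the middle of the segment.
    # Assumption: `start` and `end` are integers in the same unit.
    middle = (timeframe["start"] + timeframe["end"]) // 2
    return [("position", middle)]
```

For example, a labeled frame spanning 0 to 100 with no representatives yields a single position at 50, while an unlabeled frame yields nothing.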
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- tfLabel: optional, defaults to []
  - Type: string
  - Multivalued: True
  The label of the TimeFrame annotation to be processed. By default ([]), all TimeFrame annotations will be processed, regardless of their label property values.
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- runningTime: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The running time of the app will be recorded in the view metadata.
- hwFetch: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata.
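As a rough illustration of how these parameters might be combined, the sketch below builds a request URL. It assumes the parameters are passed as URL query parameters and that a multivalued parameter such as tfLabel is expressed by repeating its key; the base URL, the function name `build_request_url`, and the label values are hypothetical.

```python
from urllib.parse import urlencode


def build_request_url(base_url, tf_labels=(), pretty=False,
                      running_time=False, hw_fetch=False):
    """Build a request URL carrying only non-default parameters.

    Hypothetical helper: assumes query-string parameters and a
    repeated key for each value of the multivalued `tfLabel`.
    """
    params = [("tfLabel", label) for label in tf_labels]
    if pretty:
        params.append(("pretty", "true"))
    if running_time:
        params.append(("runningTime", "true"))
    if hw_fetch:
        params.append(("hwFetch", "true"))
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


# Made-up host and labels, for illustration only.
url = build_request_url("http://localhost:5000",
                        tf_labels=["slate", "chyron"], pretty=True)
print(url)
```

Omitting a parameter leaves the app on its documented default, so the helper only emits values that differ from the defaults listed above.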
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TextDocument/v1
  - @lang = “en”
  Fully serialized text content of the recognized text in the input images. Serialization is done by concatenating the text values of Paragraph annotations with two newline characters.
- http://vocab.lappsgrid.org/Token
  - text = “*”
  - word = “*”
  Translation of the recognized docTR “words” in the input images. The text and word properties store the string values of the recognized text. The duplication is for backward compatibility and for consistency with Paragraph and Sentence annotations.
- http://vocab.lappsgrid.org/Sentence
  - text = “*”
  Translation of the recognized docTR “lines” in the input images. The text property stores the string value of space-joined words.
- http://vocab.lappsgrid.org/Paragraph
  - text = “*”
  Translation of the recognized docTR “blocks” in the input images. The text property stores the string value of newline-joined sentences.
- http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)
  Alignments between 1) TimePoint <-> TextDocument, 2) TimePoint <-> Token/Sentence/Paragraph, 3) BoundingBox <-> Token/Sentence/Paragraph
- http://mmif.clams.ai/vocabulary/BoundingBox/v4
  - label = “text”
  Bounding boxes of the detected text regions in the input images. There is no corresponding box for the entire image (TextDocument) region.
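The text serialization described for the output types (words space-joined into sentences, sentences newline-joined into paragraphs, paragraphs joined by two newlines into the document text) can be sketched as below. The nested-list input and the function name `serialize_page` are assumptions made for illustration; they do not reflect docTR's result objects or the app's actual code.

```python
def serialize_page(blocks):
    """Serialize a nested OCR result into the four text levels.

    `blocks` is a list of blocks; each block is a list of lines;
    each line is a list of word strings (hypothetical input shape).
    Returns (tokens, sentences, paragraphs, document_text).
    """
    tokens, sentences, paragraphs = [], [], []
    for block in blocks:
        block_sentences = []
        for line in block:
            tokens.extend(line)                      # Token text
            block_sentences.append(" ".join(line))   # Sentence text
        sentences.extend(block_sentences)
        paragraphs.append("\n".join(block_sentences))  # Paragraph text
    document_text = "\n\n".join(paragraphs)          # TextDocument text
    return tokens, sentences, paragraphs, document_text


# Made-up OCR content, for illustration only.
page = [
    [["Hello", "world"], ["second", "line"]],
    [["New", "block"]],
]
tokens, sentences, paragraphs, document_text = serialize_page(page)
print(repr(document_text))
```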