CLAMS docTR Wrapper (v1.0)

About this version

About this app (See raw metadata.json)

This CLAMS app wraps docTR (https://pypi.org/project/python-doctr), an end-to-end OCR model. The model detects text regions in the input image and recognizes the text in those regions. It organizes the localized text regions hierarchically into “pages” > “blocks” > “lines” > “words”. This CLAMS app translates them into TextDocument, Paragraph, Sentence, and Token annotations to represent the recognized text content, then aligns these to BoundingBox annotations that represent the detected geometries. The same hierarchy is also reflected in the text of the TextDocument output: two newlines (\n\n) separate “blocks” (paragraphs), one newline (\n) separates “lines”, and one space (“ ”) separates “words”. For text recognition, the model is internally configured to use the “parseq” recognition model, and it currently works only with English text.
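The separator rules above can be illustrated with a minimal Python sketch. It assumes the recognized text is already available as nested lists (a page is a list of blocks, a block is a list of lines, a line is a list of word strings). The function names are hypothetical, and this is not the app's actual implementation.

```python
# Minimal sketch (hypothetical helpers, not the app's code): serialize a
# docTR-style hierarchy using the separators described above.
from typing import List

Line = List[str]     # a line is a list of word strings (Token)
Block = List[Line]   # a block is a list of lines (Paragraph)


def serialize_line(words: Line) -> str:
    # Words within a line are joined by a single space.
    return " ".join(words)


def serialize_block(block: Block) -> str:
    # Lines within a block are joined by one newline.
    return "\n".join(serialize_line(line) for line in block)


def serialize_page(blocks: List[Block]) -> str:
    # Blocks (paragraphs) are separated by two newlines.
    return "\n\n".join(serialize_block(b) for b in blocks)


if __name__ == "__main__":
    page = [
        [["BREAKING", "NEWS"], ["Live", "from", "Boston"]],
        [["Channel", "2"]],
    ]
    print(repr(serialize_page(page)))
```

With the sample page, the result is `'BREAKING NEWS\nLive from Boston\n\nChannel 2'`.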

Inputs

(Note: “*” as a property value means that the property is required but can be any value.)

  • http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)

  • http://mmif.clams.ai/vocabulary/TimeFrame/v5 (required)

    • representatives = “?”

    The TimeFrame annotation that represents the video segment to be processed. When the representatives property is present, the app processes the video's still frames at the TimePoint annotations referred to by the representatives property. Otherwise, the app processes the middle frame of the video segment.
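The frame-selection rule can be sketched in Python as follows. This sketch assumes the TimeFrame boundaries and the representative TimePoints have already been resolved to frame numbers. In MMIF, representatives hold annotation IDs that would first need to be looked up. It also treats an empty representatives list as absent and rounds the middle frame down; both choices are assumptions. The class and function names are hypothetical.

```python
# Minimal sketch (hypothetical names, not the CLAMS SDK): choose which
# frames of a TimeFrame to run OCR on. Frame numbers stand in for the
# TimePoint annotations that MMIF actually references by ID.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimeFrame:
    start: int  # first frame of the segment
    end: int    # last frame of the segment
    # Assumed: frame numbers of the TimePoints listed in `representatives`
    representatives: List[int] = field(default_factory=list)


def frames_to_process(tf: TimeFrame) -> List[int]:
    if tf.representatives:
        # Process every still frame referred to by the representatives.
        return list(tf.representatives)
    # Otherwise fall back to the middle frame (rounded down, an assumption).
    return [(tf.start + tf.end) // 2]
```

For example, `frames_to_process(TimeFrame(100, 200))` gives `[150]`, while `frames_to_process(TimeFrame(100, 200, [120, 180]))` gives `[120, 180]`.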

Configurable Parameters

(Note: Multivalued means the parameter can have one or more values.)

  • tfLabel: optional, defaults to []

    • Type: string
    • Multivalued: True

    The label(s) of the TimeFrame annotations to be processed. By default ([]), all TimeFrame annotations are processed, regardless of their label property values.

  • pretty: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    When true, the JSON body of the HTTP response is re-formatted with 2-space indentation.
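The following Python sketch shows the intended semantics of tfLabel and pretty. TimeFrame annotations are modeled as plain dicts with a "label" key. select_timeframes and render_response are hypothetical helpers, not the app's code, and the example label values are made up for illustration.

```python
# Minimal sketch (hypothetical helpers, not the app's code) of the
# tfLabel filter and the pretty output option.
import json
from typing import Any, Dict, List


def select_timeframes(timeframes: List[Dict[str, Any]], tf_label: List[str]) -> List[Dict[str, Any]]:
    # The default ([]) keeps every TimeFrame regardless of its label.
    if not tf_label:
        return list(timeframes)
    # Otherwise keep only TimeFrames whose label is one of the given values.
    wanted = set(tf_label)
    return [tf for tf in timeframes if tf.get("label") in wanted]


def render_response(body: Dict[str, Any], pretty: bool = False) -> str:
    # pretty=true re-formats the JSON body with 2-space indentation.
    return json.dumps(body, indent=2) if pretty else json.dumps(body)
```

Because tfLabel is multivalued, passing several labels keeps the union of matching TimeFrames.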

Outputs

(Note: “*” as a property value means that the property is required but can be any value.)

(Note: Not all output annotations are always generated.)

  • http://mmif.clams.ai/vocabulary/TextDocument/v1
    • @lang = “en”

    The fully serialized recognized text from the input images. Serialization is done by concatenating the text values of Paragraph annotations with two newline characters.

  • http://vocab.lappsgrid.org/Token
    • text = “*”
    • word = “*”

    Translation of the recognized docTR “words” in the input images. The text and word properties both store the string value of the recognized word. The duplication is kept for backward compatibility and for consistency with Paragraph and Sentence annotations.

  • http://vocab.lappsgrid.org/Sentence
    • text = “*”

    Translation of the recognized docTR “lines” in the input images. The text property stores the space-joined string of the line's words.

  • http://vocab.lappsgrid.org/Paragraph
    • text = “*”

    Translation of the recognized docTR “blocks” in the input images. The text property stores the newline-joined string of the block's sentences.

  • http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)

    Alignments between 1) TimePoint <-> TextDocument, 2) TimePoint <-> Token/Sentence/Paragraph, and 3) BoundingBox <-> Token/Sentence/Paragraph.

  • http://mmif.clams.ai/vocabulary/BoundingBox/v4
    • label = “text”

    Bounding boxes of the detected text regions in the input images. No bounding box is generated for the region of the entire image (TextDocument).
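To make the output structure concrete, here is a minimal Python sketch of translating one page into the annotation types and alignments listed above. It assumes an input dict shaped like the blocks > lines > words layout of docTR's export, where each level carries a "geometry" and words carry a "value". The annotation dicts, the ID scheme, and the (source, target) alignment tuples are simplified, hypothetical stand-ins for MMIF objects, not the app's actual code.

```python
# Minimal sketch (not the app's code): translate one docTR-style page into
# the output annotation types. The input mimics the blocks > lines > words
# layout of docTR's export (words carry "value" and "geometry"); annotations
# are plain dicts with made-up IDs, and alignments are (source, target) pairs.
from typing import Any, Dict, List, Tuple


def translate_page(
    page: Dict[str, Any], tp_id: str
) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str]]]:
    annotations: List[Dict[str, Any]] = []
    alignments: List[Tuple[str, str]] = []

    def new(atype: str, **props: Any) -> str:
        # Hypothetical ID scheme: lowercased type plus a running counter.
        ann_id = f"{atype.lower()}_{len(annotations) + 1}"
        annotations.append({"@type": atype, "id": ann_id, **props})
        return ann_id

    def with_box(ann_id: str, geometry: Any) -> None:
        # Each Token/Sentence/Paragraph gets its own BoundingBox and is
        # aligned to both that box and the source TimePoint.
        box_id = new("BoundingBox", label="text", geometry=geometry)
        alignments.append((box_id, ann_id))
        alignments.append((tp_id, ann_id))

    paragraphs: List[str] = []
    for block in page["blocks"]:
        sentences: List[str] = []
        for line in block["lines"]:
            for w in line["words"]:
                # Token stores the same string in both text and word.
                with_box(new("Token", text=w["value"], word=w["value"]), w["geometry"])
            sentence = " ".join(w["value"] for w in line["words"])
            with_box(new("Sentence", text=sentence), line["geometry"])
            sentences.append(sentence)
        paragraph = "\n".join(sentences)
        with_box(new("Paragraph", text=paragraph), block["geometry"])
        paragraphs.append(paragraph)

    # The page becomes a TextDocument aligned to the TimePoint only,
    # with no BoundingBox for the whole image.
    doc_id = new("TextDocument", text="\n\n".join(paragraphs), lang="en")
    alignments.append((tp_id, doc_id))
    return annotations, alignments
```

For a page with one block, one line, and two words, this yields two Tokens, one Sentence, one Paragraph, four BoundingBoxes, and one TextDocument. The TextDocument is the only text annotation without a box.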