LLaVA Captioner (v1.2)

About this version

About this app (See raw metadata.json)

Applies LLaVA v1.6 Mistral-7B to video frames for image captioning.

Inputs

(Note: “*” as a property value means that the property is required but can be any value.)

Configurable Parameters

(Note: Multivalued means the parameter can have one or more values.)

  • frameInterval: optional, defaults to 30

    • Type: integer
    • Multivalued: False

    The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.

  • defaultPrompt: optional, defaults to Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.

    • Type: string
    • Multivalued: False

    Default prompt to use for timeframes whose labels are not specified in the promptMap. If set to -, timeframes not specified in the promptMap will be skipped.

  • promptMap: optional, defaults to []

    • Type: map
    • Multivalued: True

    Mapping from labels of the input timeframe annotations to new prompts. Each mapping must be formatted as “IN_LABEL:PROMPT” (with a colon). To pass multiple mappings, use this parameter multiple times. By default, timeframes whose labels are not mapped to a prompt use the defaultPrompt. To skip timeframes with a particular label, pass - as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to -.

  • config: optional, defaults to config/default.yaml

    • Type: string
    • Multivalued: False

    Name of the config file to use.

  • pretty: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The JSON body of the HTTP response will be re-formatted with 2-space indentation.
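
The interaction between promptMap, defaultPrompt, and frameInterval can be illustrated with a minimal Python sketch. It is not the app's actual implementation. The helper names (parse_prompt_map, resolve_prompt, sample_frames, build_query) and the labels in the usage lines (slate, bars, chyron) are hypothetical. The sketch also assumes that each mapping is split on its first colon, that frame sampling starts at frame 0, and that parameters are passed to the app as URL query parameters, with promptMap repeated once per mapping.

```python
from urllib.parse import urlencode

SKIP = "-"  # per the parameter descriptions, "-" means "skip this timeframe"


def parse_prompt_map(entries):
    """Turn ["IN_LABEL:PROMPT", ...] into {label: prompt}."""
    mapping = {}
    for entry in entries:
        # Assumption: split on the first colon only, so prompts may contain colons.
        label, sep, prompt = entry.partition(":")
        if not sep:
            raise ValueError(f"expected IN_LABEL:PROMPT, got {entry!r}")
        mapping[label.strip()] = prompt.strip()
    return mapping


def resolve_prompt(label, mapping, default_prompt):
    """Return the prompt for a timeframe label, or None if it should be skipped."""
    prompt = mapping.get(label, default_prompt)
    return None if prompt == SKIP else prompt


def sample_frames(total_frames, frame_interval=30):
    """Frame indices to caption when no timeframe annotations are present.

    Assumption: sampling starts at frame 0.
    """
    return list(range(0, total_frames, frame_interval))


def build_query(prompt_map_entries, default_prompt=None, frame_interval=None):
    """Encode parameters as a query string, repeating promptMap per mapping."""
    params = [("promptMap", entry) for entry in prompt_map_entries]
    if default_prompt is not None:
        params.append(("defaultPrompt", default_prompt))
    if frame_interval is not None:
        params.append(("frameInterval", str(frame_interval)))
    return urlencode(params)


# Hypothetical usage: caption "slate" timeframes with a custom prompt,
# skip "bars" timeframes, and skip every label that is not mapped.
mapping = parse_prompt_map(["slate:Transcribe the text.", "bars:-"])
print(resolve_prompt("slate", mapping, SKIP))
print(resolve_prompt("chyron", mapping, SKIP))
print(build_query(["slate:Transcribe the text.", "bars:-"], default_prompt=SKIP))
```

Building the query from a list of pairs rather than a dictionary keeps repeated promptMap keys intact, which matches the instruction to use the parameter multiple times.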

Outputs

(Note: “*” as a property value means that the property is required but can be any value.)

(Note: Not all output annotations are always generated.)