SmolVLM2 Captioner (v0.1)
About this version
- Submitter: kelleyl
- Submission Time: 2025-11-20T15:41:04+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-smolvlm2-captioner:v0.1
- Release Notes: (no notes provided by the developer)
About this app (See raw metadata.json)
Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.
- App ID: http://apps.clams.ai/smolvlm2-captioner/v0.1
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-smolvlm2-captioner (source tree of the submitted version)
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/ImageDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/TimeFrame/v6 (required) (of any properties)
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- frameInterval: optional, defaults to 30
  - Type: integer
  - Multivalued: False
  The interval, in frames, at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.
- defaultPrompt: optional, defaults to "Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present."
  - Type: string
  - Multivalued: False
  Default prompt to use for timeframes not specified in the promptMap. If set to "-", timeframes not specified in the promptMap will be skipped.
- promptMap: optional, defaults to []
  - Type: map
  - Multivalued: True
  Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as "IN_LABEL:PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe label not mapped to a prompt is used with the defaultPrompt. To skip timeframes with a particular label, pass "-" as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to "-".
- defaultSystemPrompt: optional, defaults to ""
  - Type: string
  - Multivalued: False
  Default system prompt to use for all timeframes. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior. The processor formats this using its chat template.
- systemPromptMap: optional, defaults to []
  - Type: map
  - Multivalued: True
  Mapping of labels of the input timeframe annotations to system prompts. Must be formatted as "IN_LABEL:SYSTEM_PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior.
- config: optional, defaults to config/default.yaml
  - Type: string
  - Multivalued: False
  Name of the config file to use.
- num_beams: optional, defaults to 1
  - Type: integer
  - Multivalued: False
  Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.
- batchSize: optional, defaults to 12
  - Type: integer
  - Multivalued: False
  Number of images to process in each batch. Default is 12. Higher values may improve throughput but require more memory.
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- runningTime: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The running time of the app will be recorded in the view metadata.
- hwFetch: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata.
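The prompt-selection rules described for defaultPrompt and promptMap can be sketched roughly as follows. This is an illustration of the documented behavior, not the app's actual code: the helper names (parse_prompt_map, resolve_prompt, build_url), the example labels, and the choice to split each entry at its first colon are all assumptions. The sketch also shows one way to repeat a multivalued parameter in a query string, assuming the app is served over HTTP at a hypothetical http://localhost:5000 and reads its parameters from the URL.

```python
from urllib.parse import urlencode

SKIP = "-"  # per the parameter descriptions, "-" as a prompt value means "skip"


def parse_prompt_map(entries):
    """Turn ["IN_LABEL:PROMPT", ...] into {label: prompt}.

    Assumption: each entry is split at its first colon, so a prompt
    may itself contain colons.
    """
    mapping = {}
    for entry in entries:
        label, _, prompt = entry.partition(":")
        mapping[label] = prompt
    return mapping


def resolve_prompt(label, prompt_map, default_prompt):
    """Return the prompt for a timeframe label, or None if it is skipped."""
    prompt = prompt_map.get(label, default_prompt)
    return None if prompt == SKIP else prompt


def build_url(base_url, params):
    """Encode parameters as a query string; list values repeat the key."""
    return f"{base_url}?{urlencode(params, doseq=True)}"


# Hypothetical labels: caption "slate" frames with a custom prompt, skip "bars".
prompt_entries = ["slate:Transcribe the text on this slate.", "bars:-"]
prompt_map = parse_prompt_map(prompt_entries)

url = build_url(
    "http://localhost:5000",  # assumed host and port
    {"promptMap": prompt_entries, "batchSize": 4, "pretty": "true"},
)
```

With these entries, a "slate" timeframe gets the custom prompt, a "bars" timeframe is skipped, and any other label falls back to the default prompt unless the default is itself "-".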
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)
- http://mmif.clams.ai/vocabulary/TextDocument/v1 (of any properties)
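
To show how the two output types could be read together, here is a minimal sketch that pairs each Alignment with its TextDocument using plain dictionaries. It rests on several assumptions: that captions are stored under the TextDocument's text "@value" property, that each Alignment's source is the captioned frame or timeframe and its target is the TextDocument, and that both live in the same view. The sample data is hand-written for illustration and is not real output from this app.

```python
TEXT_DOC = "http://mmif.clams.ai/vocabulary/TextDocument/v1"
ALIGNMENT = "http://mmif.clams.ai/vocabulary/Alignment/v1"


def captions_by_source(mmif):
    """Map each Alignment source ID to the text of its aligned TextDocument."""
    captions = {}
    for view in mmif.get("views", []):
        annotations = view.get("annotations", [])
        # Index caption text by TextDocument ID within this view.
        texts = {
            a["properties"]["id"]: a["properties"]["text"]["@value"]
            for a in annotations
            if a["@type"] == TEXT_DOC
        }
        # Assumed direction: source = captioned region, target = TextDocument.
        for a in annotations:
            if a["@type"] == ALIGNMENT:
                props = a["properties"]
                if props["target"] in texts:
                    captions[props["source"]] = texts[props["target"]]
    return captions


# Hand-written sample with hypothetical IDs and caption text.
sample = {
    "views": [
        {
            "id": "v_1",
            "annotations": [
                {
                    "@type": TEXT_DOC,
                    "properties": {
                        "id": "td_1",
                        "text": {"@value": "A news anchor at a desk."},
                    },
                },
                {
                    "@type": ALIGNMENT,
                    "properties": {"id": "al_1", "source": "tf_1", "target": "td_1"},
                },
            ],
        }
    ]
}

captions = captions_by_source(sample)
```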