Whisper Wrapper (v14)
About this version
- Submitter: keighrim
- Submission Time: 2025-11-05T20:18:24+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-whisper-wrapper:v14
Release Notes

Major update includes …
- `modelSize` parameter is now `model`
- `modelLang` parameter is now `language`
  (the above changes match openai's CLI argument names)
- fixed app crashing when word-level timestamping outputs hallucinations
About this app (See raw metadata.json)
A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.
- App ID: http://apps.clams.ai/whisper-wrapper/v14
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-whisper-wrapper (source tree of the submitted version)
- Analyzer Version: 20240930
- Analyzer License: MIT
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
One of the following is required:
- http://mmif.clams.ai/vocabulary/AudioDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
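To make the expected input concrete, here is a minimal sketch of a MMIF document carrying a single AudioDocument, built as a plain Python dict. The MMIF version string and the file path are assumptions for illustration, not values prescribed by this app; adjust both to your environment.

```python
import json

# A minimal sketch of an input MMIF document with one AudioDocument.
# The MMIF version URI and the audio file location below are
# placeholder assumptions; substitute your own.
input_mmif = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.5"},  # assumed version
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/AudioDocument/v1",
            "properties": {
                "id": "d1",
                "location": "file:///data/audio/interview.wav",  # hypothetical path
                "mime": "audio/wav",
            },
        }
    ],
    "views": [],
}

print(json.dumps(input_mmif, indent=2))
```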
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- `model`: optional, defaults to `turbo`
  - Type: string
  - Multivalued: False

  (from openai-whisper CLI) name of the Whisper model to use

- `language`: optional, defaults to `""`
  - Type: string
  - Multivalued: False

  (from openai-whisper CLI) language spoken in the audio; specify None to perform language detection

- `task`: optional, defaults to `transcribe`
  - Type: string
  - Multivalued: False
  - Choices: `transcribe`, `translate`

  (from openai-whisper CLI) whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`)

- `initialPrompt`: optional, defaults to `""`
  - Type: string
  - Multivalued: False

  (from openai-whisper CLI) optional text to provide as a prompt for the first window

- `conditionOnPreviousText`: optional, defaults to `True`
  - Type: string
  - Multivalued: False

  (from openai-whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

- `noSpeechThreshold`: optional, defaults to `0.6`
  - Type: number
  - Multivalued: False

  (from openai-whisper CLI) if the probability of the `<|nospeech|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence

- `pretty`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`

  The JSON body of the HTTP response will be re-formatted with 2-space indentation

- `runningTime`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`

  The running time of the app will be recorded in the view metadata

- `hwFetch`: optional, defaults to `false`
  - Type: boolean
  - Multivalued: False
  - Choices: `false`, `true`

  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
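Like other CLAMS HTTP apps, this one takes its parameters as URL query parameters on a POST request whose body is the input MMIF. Below is a minimal sketch using the `requests` library; the host and port are assumptions for a container started locally (e.g. with `docker run -p 5000:5000 ghcr.io/clamsproject/app-whisper-wrapper:v14`), and it reuses the `input_mmif` dict from the earlier sketch.

```python
import json
import requests  # third-party; pip install requests

# Parameters from the list above, passed as query parameters.
# Values shown are examples, not recommendations.
params = {
    "model": "turbo",
    "language": "en",   # omit to let Whisper detect the language
    "task": "transcribe",
    "pretty": "true",
}

# Host/port are assumptions for a locally running container.
resp = requests.post(
    "http://localhost:5000",
    params=params,
    data=json.dumps(input_mmif),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
output_mmif = resp.json()  # annotated MMIF comes back in the response body
```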
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TextDocument/v1 (of any properties)
- http://mmif.clams.ai/vocabulary/TimeFrame/v6
  - timeUnit = "milliseconds"
- http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)
- http://vocab.lappsgrid.org/Token (of any properties)
- http://vocab.lappsgrid.org/Sentence (of any properties)
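To give a feel for consuming these outputs, here is a minimal sketch that walks the returned MMIF with plain JSON access, pulling out transcript text and the millisecond TimeFrame spans. It assumes `output_mmif` is the parsed response from the previous sketch; the `@type` URIs match the list above, but treat the traversal details as illustrative rather than a fixed API.

```python
# Minimal sketch: extract transcript text and TimeFrame spans from the
# output MMIF. Assumes `output_mmif` is the parsed JSON returned by the
# app in the previous sketch.
TEXT_DOC = "http://mmif.clams.ai/vocabulary/TextDocument/v1"
TIME_FRAME = "http://mmif.clams.ai/vocabulary/TimeFrame/v6"

for view in output_mmif.get("views", []):
    for ann in view.get("annotations", []):
        props = ann.get("properties", {})
        if ann["@type"] == TEXT_DOC:
            # A MMIF TextDocument carries its text in a "@value" field
            text = props.get("text", {}).get("@value", "")
            print("transcript:", text[:80], "...")
        elif ann["@type"] == TIME_FRAME:
            # start/end are in milliseconds, per this app's timeUnit
            print("frame:", props.get("start"), "-", props.get("end"), "ms")
```

Alignment annotations in the same views link each TimeFrame to the Token or TextDocument it covers, so the same traversal pattern can be extended to recover word-level timestamps.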