Whisper Wrapper (v14)

About this version

  • Submitter: keighrim
  • Submission Time: 2025-11-05T20:18:24+00:00
  • Prebuilt Container Image: ghcr.io/clamsproject/app-whisper-wrapper:v14
  • Release Notes

    Major update includes …

    • The modelSize parameter is renamed to model
    • The modelLang parameter is renamed to language
      (both renames match the argument names of OpenAI’s Whisper CLI)
    • Fixed an app crash when word-level timestamping produces hallucinated output
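Clients written against earlier versions may still send the old parameter names. The sketch below renames them before a request is built. It assumes runtime parameters are sent as URL query-string arguments, which is the usual convention for CLAMS app HTTP requests but should be checked against your deployment. The helper `migrate_params` is hypothetical and not part of the app.

```python
from urllib.parse import urlencode

# Pre-v14 parameter names mapped to their v14 replacements
# (taken from the release notes above).
RENAMED = {"modelSize": "model", "modelLang": "language"}


def migrate_params(params: dict) -> dict:
    """Hypothetical helper: return a copy of `params` with pre-v14 names renamed."""
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were given; refuse to guess which wins.
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


old = {"modelSize": "turbo", "modelLang": "en", "pretty": "true"}
print(urlencode(migrate_params(old)))  # model=turbo&language=en&pretty=true
```

Raising on a conflict, rather than silently preferring one name, keeps a half-migrated client from sending two different model choices without noticing.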

About this app (See raw metadata.json)

A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.

Inputs

(Note: “*” as a property value means that the property is required but can be any value.)

One of the following is required: [

]

Configurable Parameters

(Note: Multivalued means the parameter can have one or more values.)

  • model: optional, defaults to turbo

    • Type: string
    • Multivalued: False

    (from openai-whisper CLI) name of the Whisper model to use

  • language: optional, defaults to ""

    • Type: string
    • Multivalued: False

    (from openai-whisper CLI) language spoken in the audio, specify None to perform language detection

  • task: optional, defaults to transcribe

    • Type: string
    • Multivalued: False
    • Choices: transcribe, translate

    (from openai-whisper CLI) whether to perform X->X speech recognition (‘transcribe’) or X->English translation (‘translate’)

  • initialPrompt: optional, defaults to ""

    • Type: string
    • Multivalued: False

    (from openai-whisper CLI) optional text to provide as a prompt for the first window.

  • conditionOnPreviousText: optional, defaults to True

    • Type: string
    • Multivalued: False

    (from openai-whisper CLI) if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

  • noSpeechThreshold: optional, defaults to 0.6

    • Type: number
    • Multivalued: False

    (from openai-whisper CLI) if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to logprob_threshold, consider the segment as silence

  • pretty: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The JSON body of the HTTP response will be re-formatted with 2-space indentation

  • runningTime: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The running time of the app will be recorded in the view metadata

  • hwFetch: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
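The following heavily simplified sketch shows how initialPrompt, conditionOnPreviousText, and noSpeechThreshold interact in a window-by-window decoding loop modeled on openai-whisper's `transcribe()`. It is not the wrapper's code. `WindowResult`, `is_silence`, `transcribe_windows`, and the `decode` callback are hypothetical stand-ins for the model. Prompts are plain strings rather than tokens, `logprob_threshold=-1.0` is assumed (the openai-whisper CLI default), and details such as temperature fallback are omitted.

```python
from dataclasses import dataclass


@dataclass
class WindowResult:
    """Hypothetical stand-in for one window's decoding output."""
    text: str
    no_speech_prob: float  # probability of the <|nospeech|> token
    avg_logprob: float     # average log-probability of the decoded tokens


def is_silence(result: WindowResult, no_speech_threshold: float = 0.6,
               logprob_threshold: float = -1.0) -> bool:
    # Silence requires BOTH a high no-speech probability AND a failed decode,
    # i.e. an average log-probability at or below logprob_threshold.
    return (result.no_speech_prob > no_speech_threshold
            and result.avg_logprob <= logprob_threshold)


def transcribe_windows(decode, n_windows: int, initial_prompt: str = "",
                       condition_on_previous_text: bool = True,
                       no_speech_threshold: float = 0.6) -> list[str]:
    """Simplified loop; `decode(i, prompt)` returns a WindowResult for window i."""
    history = [initial_prompt] if initial_prompt else []
    reset_since = 0  # the prompt for the next window is history[reset_since:]
    kept = []
    for i in range(n_windows):
        prompt = " ".join(history[reset_since:])
        result = decode(i, prompt)
        if is_silence(result, no_speech_threshold):
            continue  # treat the whole window as silence
        kept.append(result.text)
        history.append(result.text)
        if not condition_on_previous_text:
            # Exclude everything so far (initial prompt included) from later prompts.
            reset_since = len(history)
    return kept


results = [
    WindowResult("welcome to the show", 0.1, -0.3),
    WindowResult("", 0.9, -1.5),  # high no-speech probability and a failed decode
    WindowResult("today we discuss archives", 0.2, -0.4),
]
prompts = []


def fake_decode(i, prompt):
    prompts.append(prompt)  # record which prompt each window received
    return results[i]


print(transcribe_windows(fake_decode, 3, initial_prompt="Glossary: CLAMS, MMIF."))
print(prompts)
```

With conditioning on, every window after the first is prompted with the initial prompt plus all text kept so far, and the middle window is dropped as silence. With conditionOnPreviousText set to false, the initial prompt is used for the first window only and later windows receive an empty prompt.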

Outputs

(Note: “*” as a property value means that the property is required but can be any value.)

(Note: Not all output annotations are always generated.)