Whisper Wrapper (v5)

About this version

  • Submitter: keighrim
  • Submission Time: 2024-02-09T12:28:02+00:00
  • Prebuilt Container Image: ghcr.io/clamsproject/app-whisper-wrapper:v5
  • Release Notes

    This version adds the modelLang parameter and includes many other changes:

    • (BIG CHANGE) timeunit is now milliseconds (integer).
    • Added the modelLang parameter for selecting the language Whisper is instructed to use. If this parameter is not given, Whisper runs in language detection mode.
    • When modelLang is set to en, English-only models are loaded instead of multilingual ones, for speed and performance.
    • Empty segments from Whisper are skipped when generating the output MMIF.
    • Updated to clams-python 1.1.1.
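
The behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the helper names (resolve_model_name, to_milliseconds, non_empty_segments) are hypothetical. It assumes the checkpoint naming used by the openai-whisper package, where English-only models carry an ".en" suffix and "large" has no English-only variant, and assumes segments are dictionaries with "start", "end" (in seconds) and "text" keys.

```python
from typing import Dict, List, Optional

# Sizes for which openai-whisper publishes an English-only ".en" checkpoint.
# "large" is multilingual only.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def resolve_model_name(model_size: str, model_lang: Optional[str] = None) -> str:
    """Hypothetical helper: choose a checkpoint name from modelSize/modelLang."""
    # Drop an optional region code such as "-US"; only the language part matters.
    lang = (model_lang or "").split("-")[0].lower()
    if lang == "en" and model_size in ENGLISH_ONLY_SIZES:
        return f"{model_size}.en"
    return model_size


def to_milliseconds(seconds: float) -> int:
    """Convert a time in (float) seconds to integer milliseconds."""
    return int(round(seconds * 1000))


def non_empty_segments(segments: List[Dict]) -> List[Dict]:
    """Keep only segments whose text is not empty or whitespace-only."""
    return [seg for seg in segments if seg.get("text", "").strip()]


if __name__ == "__main__":
    # Made-up segments, for illustration only.
    segments = [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 2.0, "text": "   "},
    ]
    print(resolve_model_name("tiny", "en-US"))
    for seg in non_empty_segments(segments):
        print(to_milliseconds(seg["start"]), to_milliseconds(seg["end"]), seg["text"].strip())
```

Falling back to the multilingual checkpoint when no English-only one exists is a design choice made for this sketch; the wrapper itself may handle that case differently.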

About this app (See raw metadata.json)

A CLAMS wrapper for Whisper-based ASR software originally developed by OpenAI.

Inputs

(Note: “*” as a property value means that the property is required but can be any value.)

One of the following is required: [

]

Configurable Parameters

(Note: Multivalued means the parameter can have one or more values.)

| Name | Description | Type | Multivalued | Default | Choices |
|---|---|---|---|---|---|
| modelSize | The size of the model to use. Can be “tiny”, “base”, “small”, “medium”, or “large”. | string | N | tiny | tiny, base, small, medium, large |
| modelLang | Language of the model to use. Accepts two- or three-letter ISO 639 language codes; however, Whisper supports only a subset of languages, and an error is raised if the language is not supported. For the full list of supported languages, see https://github.com/openai/whisper/blob/20231117/whisper/tokenizer.py . In addition to the language code, a two-letter region code can be appended, e.g. “en-US” for US English. Note that the region code is only for compatibility and record-keeping purposes; Whisper neither detects regional dialects nor uses the given one for transcription. When the language code is not given, Whisper runs in language detection mode and uses the first few seconds of the audio to detect the language. | string | N | | |
| pretty | The JSON body of the HTTP response will be re-formatted with 2-space indentation. | boolean | N | false | false, true |
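
As an illustration of how these parameters might be passed, the sketch below builds an HTTP POST request whose query string carries modelSize, modelLang, and pretty. It assumes the container is reachable at http://localhost:5000 and accepts the input MMIF as the request body with parameters in the query string; the address and the helper names (build_url, build_request) are assumptions made for this example, not part of the app.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Allowed values for modelSize, as listed in the parameter table.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def build_url(base_url: str = "http://localhost:5000",  # assumed address
              model_size: str = "tiny",
              model_lang: Optional[str] = None,
              pretty: bool = False) -> str:
    """Hypothetical helper: encode the configurable parameters as a query string."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"modelSize must be one of {MODEL_SIZES}, got {model_size!r}")
    params = {"modelSize": model_size, "pretty": str(pretty).lower()}
    if model_lang:
        # Omitting modelLang leaves Whisper in language detection mode.
        params["modelLang"] = model_lang
    return f"{base_url}?{urlencode(params)}"


def build_request(mmif_json: str, **kwargs) -> Request:
    """Wrap the input MMIF (a JSON string) in a POST request to the app."""
    return Request(
        build_url(**kwargs),
        data=mmif_json.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Only prints the URL; sending it would need a running app, e.g.
    # urllib.request.urlopen(build_request(mmif_text, model_lang="en")).
    print(build_url(model_size="small", model_lang="en-US", pretty=True))
```

Validating modelSize on the client side is optional; it simply surfaces a typo before any audio is sent.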

Outputs

(Note: “*” as a property value means that the property is required but can be any value.)

(Note: Not all output annotations are always generated.)