Spoken Language Identification (v0.2)


Chunk-level language ID over audio based on OpenAI Whisper

Inputs

(Note: “*” as a property value means that the property is required but can be any value.)

One of the following is required: [

]

Configurable Parameters

(Note: Multivalued means the parameter can have one or more values.)

  • model: optional, defaults to tiny

    • Type: string
    • Multivalued: False
    • Choices: tiny, base, small, medium, large, turbo

    The Whisper model size to use.

  • chunk: optional, defaults to 30

    • Type: number
    • Multivalued: False

    Chunk/window length, in seconds.

  • top: optional, defaults to 3

    • Type: integer
    • Multivalued: False

    Number of top language scores to report per chunk (top-k).

  • batchSize: optional, defaults to 1

    • Type: integer
    • Multivalued: False

    Number of windows processed in a batch.

  • pretty: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The JSON body of the HTTP response will be re-formatted with 2-space indentation.

  • runningTime: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The running time of the app will be recorded in the view metadata.

  • hwFetch: optional, defaults to false

    • Type: boolean
    • Multivalued: False
    • Choices: false, true

    The hardware information (architecture, GPU, and vRAM) will be recorded in the view metadata.
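To illustrate how the chunk and top parameters interact, the sketch below splits an audio timeline into fixed-length windows and selects the top-k languages per window. This is a minimal illustration, not the app's implementation: the `score_window` function and its per-language probabilities are hypothetical stand-ins for the scores a Whisper model would produce for each window.

```python
from typing import Dict, List, Tuple


def make_windows(duration: float, chunk: float = 30.0) -> List[Tuple[float, float]]:
    """Split a timeline of `duration` seconds into (start, end) windows of `chunk` seconds."""
    windows = []
    start = 0.0
    while start < duration:
        windows.append((start, min(start + chunk, duration)))
        start += chunk
    return windows


def top_k(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (language, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def score_window(start: float, end: float) -> Dict[str, float]:
    """Hypothetical stand-in for a Whisper language-ID pass over one window."""
    return {"en": 0.91, "fr": 0.05, "de": 0.02, "es": 0.02}


# A 75-second file with chunk=30 yields windows of 30, 30, and 15 seconds;
# each window gets its own top-k language ranking.
for start, end in make_windows(75.0, chunk=30.0):
    print((start, end), top_k(score_window(start, end), k=3))
```

With the defaults (chunk=30, top=3), each 30-second window would carry the three best-scoring languages, which is what "chunk-level language ID" refers to in the description above.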

Outputs

(Note: “*” as a property value means that the property is required but can be any value.)

(Note: Not all output annotations are always generated.)