Distil Whisper Wrapper (v1.2)
About this version
- Submitter: BenLambright
- Submission Time: 2024-08-08T15:48:34+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-distil-whisper-wrapper:v1.2
Release Notes
Reverted to the Hugging Face pipeline with chunked transcription.
About this app (See raw metadata.json)
A wrapper for Distil-Whisper. Available models: distil-large-v3, distil-large-v2, distil-medium.en, distil-small.en. The default model is distil-small.en.
- App ID: http://apps.clams.ai/distil-whisper-wrapper/v1.2
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-distil-whisper-wrapper (source tree of the submitted version)
- Analyzer Version: 1.0
- Analyzer License: MIT
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
One of the following is required: [
- http://mmif.clams.ai/vocabulary/AudioDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
]
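As an illustration of the input requirement, the following sketch builds a minimal MMIF input containing a single AudioDocument. The MMIF version string, document id, MIME type, and file location are placeholders assumed for this example; none of them come from the app's metadata.

```python
import json

# Document type URI listed under Inputs; the VideoDocument URI would
# satisfy the requirement equally well.
AUDIO_DOCUMENT = "http://mmif.clams.ai/vocabulary/AudioDocument/v1"


def make_input_mmif(location, mime="audio/wav", doc_id="d1",
                    mmif_version="http://mmif.clams.ai/1.0.0"):
    """Return a minimal MMIF dict with one source document and no views.

    Every default value here is an assumption made for illustration.
    """
    return {
        "metadata": {"mmif": mmif_version},
        "documents": [
            {
                "@type": AUDIO_DOCUMENT,
                "properties": {"id": doc_id, "mime": mime, "location": location},
            }
        ],
        "views": [],
    }


if __name__ == "__main__":
    # Hypothetical file path, used only to show the resulting JSON shape.
    print(json.dumps(make_input_mmif("file:///data/example.wav"), indent=2))
```

The printed JSON could serve as a request body for the app, provided the location is readable from wherever the app runs.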
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- modelSize: optional, defaults to distil-small.en
  - Type: string
  - Multivalued: False
  - Choices: distil-large-v3, distil-large-v2, distil-medium.en, distil-small.en, small, s, medium, m, large-v2, l2, large-v3, l3
  The size of the model to use. Four models are available: distil-large-v3, distil-large-v2, distil-medium.en, and distil-small.en. An abbreviation can also be given as the parameter value: ‘small’ or ‘s’ for distil-small.en; ‘medium’ or ‘m’ for distil-medium.en; ‘large-v2’ or ‘l2’ for distil-large-v2; ‘large-v3’ or ‘l3’ for distil-large-v3. The default model is distil-small.en.
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- runningTime: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The running time of the app will be recorded in the view metadata.
- hwFetch: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata.
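The parameters above can be sketched as a small helper that resolves a modelSize abbreviation to its full model name and assembles a query string. This is a minimal sketch, not the app's own code: the alias table is reconstructed from the modelSize description, the function names are hypothetical, and the base URL `http://localhost:5000` and the idea of passing parameters as a URL query string are assumptions about how the app is deployed.

```python
from urllib.parse import urlencode

# Alias table reconstructed from the modelSize description; the app's
# internal mapping may be organized differently.
MODEL_ALIASES = {
    "small": "distil-small.en", "s": "distil-small.en",
    "medium": "distil-medium.en", "m": "distil-medium.en",
    "large-v2": "distil-large-v2", "l2": "distil-large-v2",
    "large-v3": "distil-large-v3", "l3": "distil-large-v3",
}
FULL_NAMES = set(MODEL_ALIASES.values())


def resolve_model(model_size="distil-small.en"):
    """Map a modelSize choice (full name or abbreviation) to a full model name."""
    if model_size in FULL_NAMES:
        return model_size
    if model_size in MODEL_ALIASES:
        return MODEL_ALIASES[model_size]
    raise ValueError(f"unsupported modelSize: {model_size!r}")


def build_request_url(base_url, model_size="distil-small.en", pretty=False,
                      running_time=False, hw_fetch=False):
    """Build a URL that carries the four documented parameters as a query string."""
    resolve_model(model_size)  # raises ValueError for values outside the choices
    query = urlencode({
        "modelSize": model_size,
        "pretty": str(pretty).lower(),
        "runningTime": str(running_time).lower(),
        "hwFetch": str(hw_fetch).lower(),
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(resolve_model("l3"))
    print(build_request_url("http://localhost:5000", "l3", pretty=True))
```

Validating the value before sending a request gives an early local error for anything outside the listed choices.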
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TextDocument/v1
  - @lang = “en”
  Fully serialized text content of the recognized text in the input audio/video.
- http://mmif.clams.ai/vocabulary/TimeFrame/v5
  - timeUnit = “milliseconds”
- http://mmif.clams.ai/vocabulary/Alignment/v1 (of any properties)
  Alignments between 1) TimeFrame <-> SENTENCE, 2) audio/video document <-> TextDocument
- http://vocab.lappsgrid.org/Sentence (of any properties)
  The smallest unit recognized by Distil-Whisper, normally a complete sentence.
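To show how the output types listed above relate to one another, the sketch below follows Alignment annotations from TimeFrame to Sentence and returns timed sentences. The sample view is invented for this example and is not real app output. The `source`, `target`, `start`, and `end` property names follow common MMIF usage, while the `text` property on Sentence is an assumption that should be checked against an actual response.

```python
TEXT_DOCUMENT = "http://mmif.clams.ai/vocabulary/TextDocument/v1"
TIME_FRAME = "http://mmif.clams.ai/vocabulary/TimeFrame/v5"
ALIGNMENT = "http://mmif.clams.ai/vocabulary/Alignment/v1"
SENTENCE = "http://vocab.lappsgrid.org/Sentence"

# Invented sample view; ids, times, and text are placeholders.
SAMPLE_VIEW = {
    "annotations": [
        {"@type": TEXT_DOCUMENT, "properties": {
            "id": "td_1",
            "text": {"@value": "Hello there. How are you?", "@language": "en"}}},
        {"@type": ALIGNMENT, "properties": {"id": "al_0", "source": "d1", "target": "td_1"}},
        {"@type": SENTENCE, "properties": {"id": "se_1", "text": "Hello there."}},
        {"@type": TIME_FRAME, "properties": {"id": "tf_1", "start": 0, "end": 1200}},
        {"@type": ALIGNMENT, "properties": {"id": "al_1", "source": "tf_1", "target": "se_1"}},
        {"@type": SENTENCE, "properties": {"id": "se_2", "text": "How are you?"}},
        {"@type": TIME_FRAME, "properties": {"id": "tf_2", "start": 1200, "end": 2500}},
        {"@type": ALIGNMENT, "properties": {"id": "al_2", "source": "tf_2", "target": "se_2"}},
    ]
}


def timed_sentences(view):
    """Return (start_ms, end_ms, text) tuples for each TimeFrame <-> Sentence alignment."""
    by_id = {a["properties"]["id"]: a for a in view["annotations"]}
    rows = []
    for ann in view["annotations"]:
        if ann["@type"] != ALIGNMENT:
            continue
        source = by_id.get(ann["properties"]["source"])
        target = by_id.get(ann["properties"]["target"])
        # Skip alignments whose ends are not both in this view, such as the
        # source document <-> TextDocument alignment.
        if source is None or target is None:
            continue
        if source["@type"] == TIME_FRAME and target["@type"] == SENTENCE:
            props = source["properties"]
            rows.append((props["start"], props["end"], target["properties"]["text"]))
    return rows


if __name__ == "__main__":
    for start, end, text in timed_sentences(SAMPLE_VIEW):
        print(f"{start:>6} ms to {end:>6} ms: {text}")
```

Because the TimeFrame values are in milliseconds, the resulting tuples can be converted to subtitle-style timestamps with simple integer arithmetic.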