CLAMS NFA Wrapper (v1.0)
About this version
- Submitter: keighrim
- Submission Time: 2026-06-22T19:56:15+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-nfa-wrapper:v1.0
Release Notes
First stable release
About this app (See raw metadata.json)
**Wraps the NVIDIA NeMo Forced Aligner tool to temporally align transcribed text with its audio source.**
- App ID: http://apps.clams.ai/nfa-wrapper/v1.0
- App License: Apache 2.0
- Source Repository: https://github.com/clamsproject/app-nfa-wrapper (source tree of the submitted version)
- Analyzer Version: 2.7.3
- Analyzer License: Apache 2.0
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
One of the following is required: [
- http://clams.ai/vocabulary/type/AudioDocument/v2 (required) (of any properties)
- http://clams.ai/vocabulary/type/VideoDocument/v2 (required) (of any properties)
]
- http://clams.ai/vocabulary/type/TextDocument/v2 (required) (of any properties)
  Text content transcribed from the audio input, with no existing annotations.
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- model: optional, defaults to fc_hybrid
  - Type: string
  - Multivalued: False
  - Choices: fc_hybrid, parakeet, conformer, fc_ctc
  NeMo ASR model to use. Choices: fc_hybrid, parakeet, conformer, fc_ctc. By default, the fc_hybrid model will be used.
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- runningTime: optional, defaults to true
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The running time of the app will be recorded in the view metadata.
- hwFetch: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata.
- tfSamplingMode: optional, defaults to representatives
  - Type: string
  - Multivalued: False
  - Choices: representatives, single, all
  Sampling mode for TimeFrame annotations. Has no effect when the app does not process TimeFrames. “representatives” uses all representative timepoints if present, otherwise skips the TimeFrame. “single” uses the middle representative if present, otherwise extracts an image from the midpoint of the start/end interval (midpoint is calculated by floor division of the sum of start and end). “all” uses all target timepoints if present, otherwise extracts all images from the time interval.
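As a rough illustration of the parameter list above, the sketch below stores the documented defaults and choices in a plain dictionary and fills in defaults for anything the caller omits. The names PARAM_SPECS, resolve_params, and single_mode_midpoint are hypothetical and are not part of the app or the CLAMS SDK. Boolean values are kept as the strings "true" and "false" on the assumption that they arrive as text. The last helper shows the floor-division midpoint described for the “single” sampling mode.

```python
# Hypothetical helper names; the defaults and choices are copied from the
# parameter list above. Nothing here is part of the app or the CLAMS SDK.
PARAM_SPECS = {
    "model": ("fc_hybrid", ["fc_hybrid", "parakeet", "conformer", "fc_ctc"]),
    "pretty": ("false", ["false", "true"]),
    "runningTime": ("true", ["false", "true"]),
    "hwFetch": ("false", ["false", "true"]),
    "tfSamplingMode": ("representatives", ["representatives", "single", "all"]),
}


def resolve_params(overrides=None):
    """Return a full parameter dict: documented defaults plus validated overrides."""
    resolved = {name: default for name, (default, _choices) in PARAM_SPECS.items()}
    for name, value in (overrides or {}).items():
        if name not in PARAM_SPECS:
            raise ValueError(f"unknown parameter: {name}")
        choices = PARAM_SPECS[name][1]
        if value not in choices:
            raise ValueError(f"{name} must be one of {choices}, got {value!r}")
        resolved[name] = value
    return resolved


def single_mode_midpoint(start, end):
    """Midpoint of a start/end interval, by floor division of their sum."""
    return (start + end) // 2


if __name__ == "__main__":
    print(resolve_params({"model": "parakeet", "pretty": "true"}))
    print(single_mode_midpoint(1000, 2001))
```

Validating against the listed choices before sending a request is only a convenience; the app itself remains the authority on which values it accepts.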
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://clams.ai/vocabulary/type/Token/v1 (of any properties)
  Token from original text split on whitespace. The text property stores the string value of the token. The start and end properties indicate the position of the token in the entire text. The document property identifies the source text document.
- http://clams.ai/vocabulary/type/TimeFrame/v6
  - frameType = “speech”
  - timeUnit = “milliseconds”
  TimeFrame annotation representing the source audio segment corresponding to a given transcribed token, with start and end times given in milliseconds.
- http://clams.ai/vocabulary/type/Alignment/v1 (of any properties)
  Alignment between Token and TimeFrame annotations.
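To make the relationship between the three output types concrete, here is a minimal sketch that splits a transcript on whitespace, records character offsets for each token, and pairs each token with a time span in milliseconds. It is not the app's implementation: the function names and dictionary layout are hypothetical, the end offset is assumed to be exclusive, and the timings in the usage example are invented placeholders rather than forced-aligner output.

```python
import re


def whitespace_tokens(text):
    """Split text on whitespace, keeping character offsets for each token."""
    # Assumption: `end` is exclusive, so text[start:end] equals the token text.
    return [
        {"text": m.group(), "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\S+", text)
    ]


def pair_with_frames(tokens, frames_ms):
    """Pair each token with one (start_ms, end_ms) span, in order."""
    if len(tokens) != len(frames_ms):
        raise ValueError("expected exactly one time span per token")
    return [
        {
            "token": token,
            "timeframe": {
                "frameType": "speech",
                "timeUnit": "milliseconds",
                "start": start_ms,
                "end": end_ms,
            },
        }
        for token, (start_ms, end_ms) in zip(tokens, frames_ms)
    ]


if __name__ == "__main__":
    tokens = whitespace_tokens("hello  world")
    # Invented timings, for illustration only.
    for pair in pair_with_frames(tokens, [(0, 480), (520, 1010)]):
        print(pair)
```

Each resulting pair plays the role of an Alignment: it links one Token to the TimeFrame covering the audio in which that token is spoken.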