inaSpeechSegmenter Wrapper (v2.1)
About this version
- Submitter: keighrim
- Submission Time: 2025-11-14T01:22:11+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-inaspeechsegmenter-wrapper:v2.1
Release Notes
Dependency version updates; fixes a compatibility issue with the new MMIF version.
About this app (See raw metadata.json)
inaSpeechSegmenter is a CNN-based audio segmentation toolkit. The original software can be found at https://github.com/ina-foss/inaSpeechSegmenter .
- App ID: http://apps.clams.ai/inaspeechsegmenter-wrapper/v2.1
- App License: MIT
- Source Repository: https://github.com/clamsproject/app-inaspeechsegmenter-wrapper (source tree of the submitted version)
- Analyzer Version: 0.8.0
- Analyzer License: MIT
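The prebuilt container image listed above runs as an HTTP service. As a hedged sketch (the localhost address and port 5000 below are assumptions about how the container was started, not values documented on this page), the app's metadata can be retrieved from a running instance like this:

```python
import json

import requests

# Assumed local endpoint; adjust to wherever the container is actually running,
# e.g. after `docker run -p 5000:5000 ghcr.io/clamsproject/app-inaspeechsegmenter-wrapper:v2.1`.
APP_URL = "http://localhost:5000"

# A GET request to a running CLAMS app returns its app metadata as JSON,
# i.e. the same information summarized on this page.
resp = requests.get(APP_URL)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```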
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
One of the following is required:
- http://mmif.clams.ai/vocabulary/AudioDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
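For illustration, a minimal MMIF input satisfying this requirement can be assembled as plain JSON with a single AudioDocument. This is a sketch only; the MMIF specification version string, mime type, and file location below are placeholder assumptions, not values prescribed by this app:

```python
import json

# Minimal MMIF skeleton with one AudioDocument. The "mmif" version URI, mime
# type, and location are placeholders for illustration only.
mmif_input = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.0"},
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/AudioDocument/v1",
            "properties": {
                "id": "d1",
                "mime": "audio/mpeg",
                "location": "file:///data/audio/example.mp3",
            },
        }
    ],
    "views": [],
}

with open("input.mmif", "w") as f:
    json.dump(mmif_input, f)
```

A VideoDocument input works the same way, using the VideoDocument type URI from the list above.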
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- minTFDuration: optional, defaults to 0
  - Type: integer
  - Multivalued: False
  Minimum duration of a TimeFrame, in milliseconds.
- silenceRatio: optional, defaults to 3
  - Type: integer
  - Multivalued: False
  Percentage ratio (0-100) of audio energy used to determine silence, relative to the mean energy of the input audio.
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation.
- runningTime: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The running time of the app will be recorded in the view metadata.
- hwFetch: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The hardware information (architecture, GPU, and vRAM) will be recorded in the view metadata.
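As is typical for CLAMS apps, these parameters are passed as URL query parameters on the POST request that carries the MMIF input. The following sketch assumes a locally running instance on port 5000 and an `input.mmif` file like the one built above; the endpoint and parameter values are illustrative:

```python
import requests

# Assumed local endpoint for a running instance of this app.
APP_URL = "http://localhost:5000"

params = {
    "minTFDuration": 2000,  # drop TimeFrames shorter than 2 seconds
    "silenceRatio": 3,      # default silence threshold
    "pretty": "true",       # indent the returned MMIF JSON
}

with open("input.mmif") as f:
    mmif_json = f.read()

# POST the MMIF document; the annotated MMIF is returned in the response body.
resp = requests.post(APP_URL, params=params, data=mmif_json,
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
annotated_mmif = resp.text
```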
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TimeFrame/v6
  - timeunit = “milliseconds”
  - labelset = a list of [“silence”, “speech”, “noise”, “music”]
The INA segmenter uses 5-way classification ([‘noEnergy’, ‘female’, ‘male’, ‘noise’, ‘music’]) and this wrapper remaps the labels to [‘silence’, ‘speech’, ‘noise’, ‘music’] by 1) renaming ‘noEnergy’ to ‘silence’ and 2) collapsing ‘female’ and ‘male’ into ‘speech’ (leaving an additional ‘gender’ property). Note that the TimeFrame annotations do not exhaustively cover the input audio; only the detected segments are annotated.
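To make the remapping concrete, here is a small sketch (not the wrapper's actual implementation) that applies the described label mapping to (label, start, end) tuples such as the segmenter might produce, with times in milliseconds to match the timeunit above:

```python
# Illustrative only: maps inaSpeechSegmenter's 5-way labels to this wrapper's
# 4-way labelset, keeping gender as an extra property on speech segments.
INA_TO_WRAPPER = {
    "noEnergy": "silence",
    "female": "speech",
    "male": "speech",
    "noise": "noise",
    "music": "music",
}

def remap(segments):
    """segments: iterable of (ina_label, start_ms, end_ms) tuples."""
    for ina_label, start, end in segments:
        frame = {
            "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
            "properties": {
                "label": INA_TO_WRAPPER[ina_label],
                "start": start,
                "end": end,
            },
        }
        # Gender is preserved when 'female'/'male' collapse into 'speech'.
        if ina_label in ("female", "male"):
            frame["properties"]["gender"] = ina_label
        yield frame

# Example: a female-speech segment followed by music.
for frame in remap([("female", 0, 5200), ("music", 5200, 9000)]):
    print(frame)
```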