inaSpeechSegmenter Wrapper (v2.0)
About this version
- Submitter: keighrim
- Submission Time: 2024-05-07T03:45:14+00:00
- Prebuilt Container Image: ghcr.io/clamsproject/app-inaspeechsegmenter-wrapper:v2.0
Release Notes
Major update
- TimeFrame properties are updated to match new CLAMS vocab specification
- noEnergy label from INA segmenter is stored as silence in MMIF for clarity
- added parameter silenceRatio to configure silence detection threshold
- renamed minDuration parameter to minTFDuration for clarity
- bugfixes in cli.py
About this app (See raw metadata.json)
inaSpeechSegmenter is a CNN-based audio segmentation toolkit. The original software can be found at https://github.com/ina-foss/inaSpeechSegmenter .
- App ID: http://apps.clams.ai/inaspeechsegmenter-wrapper/v2.0
- App License: MIT
- Source Repository: https://github.com/clamsproject/app-inaspeechsegmenter-wrapper (source tree of the submitted version)
- Analyzer Version: 0.7.6
- Analyzer License: MIT
Inputs
(Note: “*” as a property value means that the property is required but can be any value.)
One of the following is required: [
- http://mmif.clams.ai/vocabulary/AudioDocument/v1 (required) (of any properties)
- http://mmif.clams.ai/vocabulary/VideoDocument/v1 (required) (of any properties)
]
Configurable Parameters
(Note: Multivalued means the parameter can have one or more values.)
- minTFDuration: optional, defaults to 0
  - Type: integer
  - Multivalued: False
  minimum duration of a TimeFrame in milliseconds
- silenceRatio: optional, defaults to 3
  - Type: integer
  - Multivalued: False
  percentage (0-100) of the mean energy of the input audio, used as the threshold to determine silence
- pretty: optional, defaults to false
  - Type: boolean
  - Multivalued: False
  - Choices: false, true
  The JSON body of the HTTP response will be re-formatted with 2-space indentation
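To make the two numeric parameters concrete, here is a minimal sketch of how they could be interpreted. It is not taken from the wrapper's source: the function names, the per-frame energy list, and the (start_ms, end_ms, label) tuple layout are all hypothetical, and the assumption that a frame lasting exactly minTFDuration is kept is a guess.

```python
def silence_threshold(frame_energies, silence_ratio=3):
    """Energy cutoff, as a percentage of the mean energy of the whole input.

    Hypothetical helper: frame_energies is any non-empty list of per-frame
    energy values; silence_ratio mirrors the silenceRatio parameter (0-100).
    """
    mean_energy = sum(frame_energies) / len(frame_energies)
    return mean_energy * silence_ratio / 100


def drop_short_frames(frames, min_tf_duration=0):
    """Keep (start_ms, end_ms, label) tuples lasting at least min_tf_duration ms.

    Assumption: frames whose duration equals the minimum are kept.
    """
    return [f for f in frames if f[1] - f[0] >= min_tf_duration]
```

With the default silenceRatio of 3, audio whose energy falls below 3% of the input's mean energy would count as silence under this reading; with the default minTFDuration of 0, no frame is dropped.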
Outputs
(Note: “*” as a property value means that the property is required but can be any value.)
(Note: Not all output annotations are always generated.)
- http://mmif.clams.ai/vocabulary/TimeFrame/v5
- timeunit = “milliseconds”
- labelset = a list of [“silence”, “speech”, “noise”, “music”]
The INA segmenter uses 5-way classification ([‘noEnergy’, ‘female’, ‘male’, ‘noise’, ‘music’]) and this wrapper remaps the labels to [‘silence’, ‘speech’, ‘noise’, ‘music’] by 1) renaming noEnergy to silence and 2) collapsing female and male into speech (leaving an additional gender property). Note that the time frame annotations do not exhaustively cover the input audio; they cover only the segments.
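The label remapping described above can be sketched as a small function. This is an illustration under stated assumptions rather than the wrapper's actual code: the name remap_label and the (label, gender) return shape are hypothetical.

```python
def remap_label(ina_label):
    """Map one of the five INA labels to (label, gender).

    gender is None unless the original label was 'female' or 'male'.
    """
    if ina_label == "noEnergy":
        # 1) rename noEnergy to silence
        return "silence", None
    if ina_label in ("female", "male"):
        # 2) collapse both into speech, keeping the original value as gender
        return "speech", ina_label
    # 'noise' and 'music' pass through unchanged
    return ina_label, None
```

For example, remap_label("female") yields ("speech", "female"), while remap_label("music") yields ("music", None).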