Audio
Audio processing operations. Separate, enhance, and diarize audio files.
Extract audio stems (vocals, music, effects) from an audio file.
Separation Types
- dialogue-me: speech, me (speech vs music+effects)
- dialogue-music-effect: speech, music, effect (3-stem separation)
- speech-nonlingual: speech, nonlingual (lingual vs non-lingual)
Workflow
- Upload the audio file via POST /v1/upload
- Call this endpoint with file_id
- Poll GET /v1/jobs/{job_id} for status
- Download results from the signed URLs when the job is completed
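The polling step of the workflow can be sketched as below. Only the path GET /v1/jobs/{job_id} comes from this page. The base URL, the `status` field name, and the `failed` value are assumptions made for illustration. The fetch function is passed in so the loop can be exercised without a network connection.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def fetch_job(job_id: str, token: str) -> dict:
    """GET /v1/jobs/{job_id} and decode the JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict],
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a terminal status or the polls run out."""
    for _ in range(max_polls):
        job = fetch(job_id)
        status = job.get("status")  # assumed field name
        if status == "completed":
            return job
        if status == "failed":  # assumed failure value
            raise RuntimeError(f"job {job_id} failed: {job}")
        sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

A real call would look like `wait_for_job(job_id, lambda jid: fetch_job(jid, token))`. The bounded poll count keeps a stuck job from blocking the caller indefinitely.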
Credits
Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion
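A rough way to estimate the credits a job will reserve, based on the rate of 1 credit per second stated above. Rounding partial seconds up is an assumption, as this page does not say how fractional durations are billed. The function names are hypothetical.

```python
import math


def estimate_credits(duration_seconds: float, credits_per_second: int = 1) -> int:
    """Estimate job cost; rounding partial seconds up is an assumption."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds) * credits_per_second


def can_afford(balance: int, duration_seconds: float) -> bool:
    """True if the balance covers the credits reserved at job creation."""
    return balance >= estimate_credits(duration_seconds)
```

Checking the balance before creating a job avoids an "Insufficient credits" response for files whose duration is known up front.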
How to Authenticate
- Get your token from Timbr Dashboard
- Click the Authorize button above
- Enter only the token (e.g., TBR_abc123def456)
- Click Authorize then Close
Important
- Correct: TBR_8b3364b5a772328a
- Wrong: Bearer TBR_8b3364b5a772328a
The 'Bearer ' prefix is added automatically by Swagger UI.
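Outside Swagger UI, the client has to add the prefix itself. The sketch below assumes the API expects a standard `Authorization: Bearer <token>` header, which is inferred from the Swagger UI note rather than stated on this page. Both helper names are hypothetical.

```python
def normalize_token(raw: str) -> str:
    """Return the bare token, dropping a mistakenly included 'Bearer ' prefix."""
    token = raw.strip()
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    if not token:
        raise ValueError("token is empty")
    return token


def auth_headers(raw_token: str) -> dict:
    """Build the Authorization header with exactly one 'Bearer ' prefix."""
    return {"Authorization": f"Bearer {normalize_token(raw_token)}"}
```

Normalizing first means a token pasted with or without the prefix produces the same header, so the doubled "Bearer Bearer" mistake cannot occur.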
Request to separate audio into stems
The separation is performed by RunPod GPU workers using ASS v2 models. The file must be uploaded to GCS (Bronze layer) before separation.
File ID of the uploaded audio (from /v1/audio/upload)
Type of separation to perform (default: dialogue-me)
Available separation types matching RunPod config.yaml
Each type corresponds to a different model and produces different stems:
- dialogue-me: 2-stem (dialogue, me)
- dialogue-music-effect: 3-stem (dialogue, music, effect)
- speech-nonlingual: 2-stem (speech, nonlingual)
Output sample rate in Hz (default: 48000)
Output bit depth (default: 16)
Output audio format (default: wav)
Supported output audio formats
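A minimal sketch of assembling the separation request body from the fields described above. Only `file_id` and the three separation type values appear on this page. The other key names (`separation_type`, `sample_rate`, `bit_depth`, `output_format`) are hypothetical and should be checked against the actual schema.

```python
# Stems per separation type, as listed in the schema description.
SEPARATION_STEMS = {
    "dialogue-me": ("dialogue", "me"),
    "dialogue-music-effect": ("dialogue", "music", "effect"),
    "speech-nonlingual": ("speech", "nonlingual"),
}


def build_separation_request(
    file_id: str,
    separation_type: str = "dialogue-me",
    sample_rate: int = 48000,
    bit_depth: int = 16,
    output_format: str = "wav",
) -> dict:
    """Validate inputs and return a JSON-serializable request body."""
    if not file_id:
        raise ValueError("file_id is required")
    if separation_type not in SEPARATION_STEMS:
        allowed = ", ".join(sorted(SEPARATION_STEMS))
        raise ValueError(f"separation_type must be one of: {allowed}")
    return {
        "file_id": file_id,
        "separation_type": separation_type,  # hypothetical key name
        "sample_rate": sample_rate,          # hypothetical key name
        "bit_depth": bit_depth,              # hypothetical key name
        "output_format": output_format,      # hypothetical key name
    }
```

Looking up `SEPARATION_STEMS[separation_type]` also tells the caller how many result files to expect once the job completes.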
Job created successfully
Unauthorized - Invalid or missing Timbr token
Insufficient credits
File not found
Validation error - Invalid request body
Improve audio quality with noise reduction, EQ adjustment, and normalization.
Enhancement Options
- denoise (boolean): Reduce background noise (default: true)
- normalize (boolean): Normalize audio levels (default: true)
- eq_preset (string): EQ preset: balanced, vocal, bass_boost
- noise_reduction (float): Noise reduction strength: 0.0-1.0
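The options above can be validated client-side before a job is created, so that an out-of-range value does not cost a round trip. This is a sketch only. The option names come from this page, the defaults for `eq_preset` (balanced) and `noise_reduction` (0.5) are taken from the request schema further down, and the function name is hypothetical.

```python
EQ_PRESETS = ("balanced", "vocal", "bass_boost")


def build_enhance_request(
    file_id: str,
    denoise: bool = True,
    normalize: bool = True,
    eq_preset: str = "balanced",
    noise_reduction: float = 0.5,
) -> dict:
    """Validate enhancement options and return a request body."""
    if not file_id:
        raise ValueError("file_id is required")
    if eq_preset not in EQ_PRESETS:
        raise ValueError(f"eq_preset must be one of: {', '.join(EQ_PRESETS)}")
    if not 0.0 <= noise_reduction <= 1.0:
        raise ValueError("noise_reduction must be between 0.0 and 1.0")
    return {
        "file_id": file_id,
        "denoise": denoise,
        "normalize": normalize,
        "eq_preset": eq_preset,
        "noise_reduction": noise_reduction,
    }
```

For example, `build_enhance_request("file-123", eq_preset="vocal", noise_reduction=0.8)` returns a body with stronger noise reduction and the vocal preset.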
Workflow
- Upload the audio file via POST /v1/upload
- Call this endpoint with file_id and enhancement options
- Poll GET /v1/jobs/{job_id} for status
- Download the enhanced audio from the signed URL when the job is completed
Credits
Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion
Request to enhance audio quality
File ID to enhance
Apply noise reduction (default: true)
Noise reduction strength, 0.0-1.0 (default: 0.5)
EQ preset to apply (default: balanced)
Available EQ presets for audio enhancement
Apply dynamic range compression (default: false)
Normalize audio levels (default: true)
Job created successfully
Unauthorized - Invalid or missing Timbr token
Insufficient credits
File not found
Validation error - Invalid request body
Perform speaker diarization with optional transcription.
Features
Speaker Identification: Detect and label different speakers
Transcription: Optional speech-to-text with Whisper
Language Support: Multiple languages supported
Parameters
- file_id (string): File ID from upload
- num_speakers (integer): Expected number of speakers (optional)
- language (string): Language code (e.g., en, ko)
- transcribe_model (string): Transcription model: whisper
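One way to model these parameters on the client is a small dataclass that produces the request body. The four field names come from the list above and the `auto` language default from the request schema. The class name, and the choice to omit `num_speakers` when it is unset rather than send a null, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiarizeRequest:
    file_id: str
    num_speakers: Optional[int] = None  # None leaves speaker count to auto-detect
    language: str = "auto"
    transcribe_model: str = "whisper"

    def to_body(self) -> dict:
        """Validate the fields and return a JSON-serializable body."""
        if not self.file_id:
            raise ValueError("file_id is required")
        if self.num_speakers is not None and self.num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")
        body = {
            "file_id": self.file_id,
            "language": self.language,
            "transcribe_model": self.transcribe_model,
        }
        # Assumption: an unset speaker count is omitted, not sent as null.
        if self.num_speakers is not None:
            body["num_speakers"] = self.num_speakers
        return body
```

For example, `DiarizeRequest("file-123", num_speakers=2, language="ko").to_body()` yields a body for a two-speaker Korean recording.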
Output Formats
timbr: Standard SRT format with speaker names
sentence_split: Sentence-segmented output
Workflow
- Upload the audio file via POST /v1/upload
- Call this endpoint with file_id and options
- Poll GET /v1/jobs/{job_id} for status
- Download results from the signed URLs when the job is completed
Credits
Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion
Request for audio diarization (speaker identification + transcription)
The diarization pipeline:
- VAD: Voice Activity Detection to find speech segments
- STT: Speech-to-Text transcription
- Speaker Embedding: Extract speaker features
- Clustering: Group segments by speaker
- SRT Generation: Create subtitles with speaker labels
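To make the last pipeline step concrete, the sketch below turns speaker-labelled segments into SRT text. It illustrates the general technique and is not the service's implementation. The `Segment` type and the "Speaker: text" line layout are assumptions, while the timestamp layout (HH:MM:SS,mmm) follows the usual SRT convention.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues with a speaker prefix."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}"  # assumed label layout
        )
    return "\n\n".join(blocks) + "\n"
```

Working in integer milliseconds avoids floating-point drift when splitting the time into hours, minutes, and seconds.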
File ID of the uploaded audio (from /v1/audio/upload)
Expected number of speakers (auto-detect if None)
Language code (e.g., 'en', 'ko', 'ja') or 'auto' for detection (default: auto)
Transcription model to use (default: whisper)
Transcription model options
Voice Activity Detection type (default: ten-vad)
VAD sensitivity threshold (default: 0.5)
Minimum speech duration in seconds (default: 0.1)
Minimum silence duration in seconds (default: 0.1)
Detect speaker gender (KBO baseball use case) (default: false)
Optional pre-existing SRT file for alignment
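How the three VAD settings typically interact can be shown with a generic post-processing sketch. It assumes the detector emits one speech probability per fixed-length frame, which is common for VAD models but is not confirmed here for ten-vad, and it is not the service's code. The function and its `frame_sec` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def speech_segments(
    probs: Sequence[float],
    frame_sec: float,
    threshold: float = 0.5,
    min_speech: float = 0.1,
    min_silence: float = 0.1,
) -> List[Tuple[float, float]]:
    """Turn per-frame speech probabilities into (start, end) segments."""
    # 1. Threshold frames into raw runs of consecutive speech.
    runs = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        runs.append((start * frame_sec, len(probs) * frame_sec))

    # 2. Merge runs separated by a silence shorter than min_silence.
    merged = []
    for seg_start, seg_end in runs:
        if merged and seg_start - merged[-1][1] < min_silence:
            merged[-1] = (merged[-1][0], seg_end)
        else:
            merged.append((seg_start, seg_end))

    # 3. Drop segments shorter than min_speech.
    return [(s, e) for s, e in merged if e - s >= min_speech]
```

A lower threshold makes the detector more permissive, a larger minimum silence joins phrases split by short pauses, and a larger minimum speech duration filters out clicks and other brief sounds.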
Job created successfully
Unauthorized - Invalid or missing Timbr token
Insufficient credits
File not found
Validation error - Invalid request body