Audio

Audio processing operations. Separate, enhance, and diarize audio files.

Separate audio into stems

Extract audio stems (vocals, music, effects) from an audio file.

Separation Types

| Type | Output Stems | Description |
| --- | --- | --- |
| dialogue-me | speech, me | Speech vs music+effects |
| dialogue-music-effect | speech, music, effect | 3-stem separation |
| speech-nonlingual | speech, nonlingual | Lingual vs non-lingual |

Workflow

1. Upload the audio file via POST /v1/upload
2. Call this endpoint with file_id
3. Poll GET /v1/jobs/{job_id} for status
4. Download results from the signed URLs when the job is completed
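
The polling step of this workflow can be sketched in Python as follows. This is a minimal sketch, not an official client. The terminal status names are assumptions (the source only shows `pending` and mentions a completed state), and `wait_for_job`, `fetch_job`, and the timing defaults are hypothetical names and values chosen for illustration. The HTTP call is injected as a function so the loop does not depend on a particular HTTP library.

```python
import time
from typing import Callable, Dict

# Assumed terminal states: the docs show "pending" and mention "completed";
# "failed" is a guess at how errors are reported.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], Dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_job (a wrapper around GET /v1/jobs/{job_id}) until the job is terminal."""
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Here `fetch_job` would be a small function that performs the GET request with the Authorization header and returns the decoded JSON body.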

Credits

Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion
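
Because a job is billed at 1 credit per second of audio, the cost can be estimated before submitting, which helps avoid a 402 response. The sketch below assumes partial seconds are rounded up. The source does not state how fractional durations are billed, and the function names are hypothetical.

```python
import math


def estimate_credits(duration_seconds: float, credits_per_second: int = 1) -> int:
    """Estimate credits for a job at 1 credit per second of audio.

    Rounding partial seconds up is an assumption, not documented behavior.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds) * credits_per_second


def can_afford(balance: int, duration_seconds: float) -> bool:
    """Return True if the balance covers the estimated reservation."""
    return balance >= estimate_credits(duration_seconds)
```

Under this estimate, a 30-second file costs 30 credits.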

Authorizations

Authorization · string · Required

How to Authenticate

Get your token from Timbr Dashboard
Click the Authorize button above
Enter only the token (e.g., TBR_abc123def456)
Click Authorize then Close

Important

✅ Correct: TBR_8b3364b5a772328a
❌ Wrong: Bearer TBR_8b3364b5a772328a

The 'Bearer ' prefix is added automatically by Swagger UI.

Body

Request to separate audio into stems

Separation runs on RunPod GPU workers using ASS v2 models. The file must be uploaded to GCS (Bronze layer) before separation.

file_id · string · min: 1 · Required

File ID of the uploaded audio (from /v1/audio/upload)

separation_type · string · enum · Optional

Type of separation to perform. The available types match the RunPod config.yaml. Each type corresponds to a different model and produces different stems:

dialogue-me: 2-stem (dialogue, me)
dialogue-music-effect: 3-stem (dialogue, music, effect)
speech-nonlingual: 2-stem (speech, nonlingual)

Default: dialogue-me

Possible values: dialogue-me, dialogue-music-effect, speech-nonlingual

sample_rate · integer · min: 8000 · max: 192000 · Optional

Output sample rate in Hz

Default: 48000

bit_depth · integer · min: 8 · max: 32 · Optional

Output bit depth

Default: 16

output_format · string · enum · Optional

Output audio format

Default: wav

Possible values:

Responses

200: Job created successfully (application/json)
401: Unauthorized - Invalid or missing Timbr token
402: Insufficient credits (application/json)
404: File not found (application/json)
422: Validation error - Invalid request body
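
A client can turn these status codes into explicit errors instead of inspecting them at every call site. The sketch below is illustrative only: `TimbrApiError` and `check_response` are hypothetical names, and the messages are copied from the response list for this endpoint.

```python
class TimbrApiError(Exception):
    """Hypothetical error type carrying the HTTP status and its documented meaning."""

    def __init__(self, status_code: int, meaning: str):
        super().__init__(f"{status_code}: {meaning}")
        self.status_code = status_code
        self.meaning = meaning


# Meanings copied from the documented responses.
ERROR_MEANINGS = {
    401: "Unauthorized - Invalid or missing Timbr token",
    402: "Insufficient credits",
    404: "File not found",
    422: "Validation error - Invalid request body",
}


def check_response(status_code: int, body: dict) -> dict:
    """Return the job payload on 200, otherwise raise TimbrApiError."""
    if status_code == 200:
        return body
    meaning = ERROR_MEANINGS.get(status_code, "Unexpected response")
    raise TimbrApiError(status_code, meaning)
```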

POST /v1/audio/separate

POST /v1/audio/separate HTTP/1.1
Host: 
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 114

{
  "bit_depth": 16,
  "file_id": "file_abc123",
  "output_format": "wav",
  "sample_rate": 48000,
  "separation_type": "dialogue-me"
}

Response (200):

{
  "job_id": "sep_a1b2c3d4e5f6",
  "status": "pending",
  "estimated_time": 60,
  "cost": 30,
  "created_at": "2026-01-05T10:00:00Z"
}
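
The request body for this endpoint can be assembled and checked locally before sending. The sketch below applies the documented defaults and ranges for /v1/audio/separate. The function name is hypothetical, and `output_format` is passed through unchecked because only `wav` appears in the source.

```python
SEPARATION_TYPES = {"dialogue-me", "dialogue-music-effect", "speech-nonlingual"}


def build_separate_request(
    file_id: str,
    separation_type: str = "dialogue-me",
    sample_rate: int = 48000,
    bit_depth: int = 16,
    output_format: str = "wav",
) -> dict:
    """Validate arguments against the documented limits and return the JSON body."""
    if not file_id:
        raise ValueError("file_id must be a non-empty string")
    if separation_type not in SEPARATION_TYPES:
        raise ValueError(f"unknown separation_type: {separation_type!r}")
    if not 8000 <= sample_rate <= 192000:
        raise ValueError("sample_rate must be between 8000 and 192000 Hz")
    if not 8 <= bit_depth <= 32:
        raise ValueError("bit_depth must be between 8 and 32")
    return {
        "file_id": file_id,
        "separation_type": separation_type,
        "sample_rate": sample_rate,
        "bit_depth": bit_depth,
        "output_format": output_format,
    }
```

Calling `build_separate_request("file_abc123")` yields the same fields and values as the sample request body.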

Enhance audio quality

Improve audio quality with noise reduction, EQ adjustment, and normalization.

Enhancement Options

| Option | Type | Description |
| --- | --- | --- |
| denoise | boolean | Reduce background noise (default: true) |
| normalize | boolean | Normalize audio levels (default: true) |
| eq_preset | string | EQ preset: balanced, vocal, bass_boost |
| noise_reduction | float | Noise reduction strength: 0.0-1.0 |

Workflow

1. Upload the audio file via POST /v1/upload
2. Call this endpoint with file_id and enhancement options
3. Poll GET /v1/jobs/{job_id} for status
4. Download the enhanced audio from the signed URL when the job is completed

Credits

Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion

Authorizations

Authorization · string · Required

Body

Request to enhance audio quality

file_id · string · min: 1 · Required

File ID to enhance

denoise · boolean · Optional

Apply noise reduction

Default: true

noise_reduction · number · max: 1 · Optional

Noise reduction strength (0.0-1.0)

Default: 0.5

eq_preset · string · enum · Optional

EQ preset to apply

Default: balanced

Possible values: balanced, vocal, bass_boost

compression · boolean · Optional

Apply dynamic range compression

Default: false

normalize · boolean · Optional

Normalize audio levels

Default: true

Responses

200: Job created successfully (application/json)
401: Unauthorized - Invalid or missing Timbr token
402: Insufficient credits (application/json)
404: File not found (application/json)
422: Validation error - Invalid request body

POST /v1/audio/enhance

POST /v1/audio/enhance HTTP/1.1
Host: 
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 122

{
  "compression": false,
  "denoise": true,
  "eq_preset": "balanced",
  "file_id": "file_abc123",
  "noise_reduction": 0.5,
  "normalize": true
}

Response (200):

{
  "job_id": "enh_a1b2c3d4e5f6",
  "status": "pending",
  "estimated_time": 30,
  "cost": 15,
  "created_at": "2026-01-05T10:00:00Z"
}
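
The enhancement options can be validated locally in the same spirit. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the preset set contains only the three presets named in the options table, which may not be exhaustive.

```python
# Presets named in the Enhancement Options table; the full enum may be larger.
EQ_PRESETS = {"balanced", "vocal", "bass_boost"}


def build_enhance_request(
    file_id: str,
    denoise: bool = True,
    noise_reduction: float = 0.5,
    eq_preset: str = "balanced",
    compression: bool = False,
    normalize: bool = True,
) -> dict:
    """Apply the documented defaults and limits and return the JSON body."""
    if not file_id:
        raise ValueError("file_id must be a non-empty string")
    if not 0.0 <= noise_reduction <= 1.0:
        raise ValueError("noise_reduction must be between 0.0 and 1.0")
    if eq_preset not in EQ_PRESETS:
        raise ValueError(f"unknown eq_preset: {eq_preset!r}")
    return {
        "file_id": file_id,
        "denoise": denoise,
        "noise_reduction": noise_reduction,
        "eq_preset": eq_preset,
        "compression": compression,
        "normalize": normalize,
    }
```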

Diarize audio with speaker identification

Perform speaker diarization with optional transcription.

Features

Speaker Identification: Detect and label different speakers
Transcription: Optional speech-to-text with Whisper
Language Support: Multiple languages supported

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| file_id | string | File ID from upload |
| num_speakers | integer | Expected number of speakers (optional) |
| language | string | Language code (e.g., en, ko) |
| transcribe_model | string | Transcription model: whisper |

Output Formats

timbr: Standard SRT format with speaker names
sentence_split: Sentence-segmented output
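
Once a result in SRT form has been downloaded, it can be read with a few lines of Python. The sketch below parses the standard SRT cue layout (index line, timestamp line, text lines, blank line). How speaker names are embedded in the cue text is not specified in the source, so the text is returned unmodified. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_s: float
    end_s: float
    text: str  # raw cue text; any speaker label is left in place


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues, skipping blocks with no timestamp line."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), text))
    return cues
```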

Workflow

1. Upload the audio file via POST /v1/upload
2. Call this endpoint with file_id and options
3. Poll GET /v1/jobs/{job_id} for status
4. Download results from the signed URLs when the job is completed

Credits

Cost: 1 credit per second of audio
Billing: Credits reserved on job creation, confirmed on completion

Authorizations

Authorization · string · Required

Body

Request for audio diarization (speaker identification + transcription)

The diarization pipeline:

VAD: Voice Activity Detection to find speech segments
STT: Speech-to-Text transcription
Speaker Embedding: Extract speaker features
Clustering: Group segments by speaker
SRT Generation: Create subtitles with speaker labels

file_id · string · min: 1 · Required

File ID of the uploaded audio (from /v1/audio/upload)

num_speakers · integer or null · min: 1 · max: 20 · Optional

Expected number of speakers (auto-detected if null)

language · string · Optional

Language code (e.g., 'en', 'ko', 'ja') or 'auto' for detection

Default: auto

transcribe_model · string · enum · Optional

Transcription model to use

Default: whisper

Possible values:

vad_type · string · enum · Optional

Voice Activity Detection type

Default: ten-vad

Possible values:

vad_threshold · number · max: 1 · Optional

VAD sensitivity threshold

Default: 0.5

vad_min_speech_duration · number · max: 5 · Optional

Minimum speech duration in seconds

Default: 0.1

vad_min_silence_duration · number · max: 5 · Optional

Minimum silence duration in seconds

Default: 0.1

detect_gender · boolean · Optional

Detect speaker gender (KBO baseball use case)

Default: false

input_srt_file_id · string or null · Optional

File ID of a pre-existing SRT file to use for alignment

Responses

200: Job created successfully (application/json)
401: Unauthorized - Invalid or missing Timbr token
402: Insufficient credits (application/json)
404: File not found (application/json)
422: Validation error - Invalid request body

POST /v1/audio/diarize

POST /v1/audio/diarize HTTP/1.1
Host: 
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 130

{
  "file_id": "file_abc123",
  "language": "auto",
  "num_speakers": 2,
  "transcribe_model": "whisper",
  "vad_threshold": 0.5,
  "vad_type": "ten-vad"
}

Response (200):

{
  "job_id": "dia_a1b2c3d4e5f6",
  "status": "pending",
  "estimated_time": 300,
  "cost": 60,
  "created_at": "2026-01-05T10:00:00Z"
}
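
A diarization request can be built with the same kind of local checks. In this sketch `num_speakers=None` stands for auto-detection and serializes to JSON `null`. The lower bound of 0 on the VAD values is an assumption, since the source only gives maxima, and the function name is hypothetical.

```python
from typing import Optional


def build_diarize_request(
    file_id: str,
    num_speakers: Optional[int] = None,
    language: str = "auto",
    transcribe_model: str = "whisper",
    vad_type: str = "ten-vad",
    vad_threshold: float = 0.5,
    vad_min_speech_duration: float = 0.1,
    vad_min_silence_duration: float = 0.1,
    detect_gender: bool = False,
    input_srt_file_id: Optional[str] = None,
) -> dict:
    """Apply the documented defaults and limits and return the JSON body."""
    if not file_id:
        raise ValueError("file_id must be a non-empty string")
    if num_speakers is not None and not 1 <= num_speakers <= 20:
        raise ValueError("num_speakers must be between 1 and 20")
    # Lower bounds of 0 are assumed; only the maxima are documented.
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be at most 1")
    for name, value in (
        ("vad_min_speech_duration", vad_min_speech_duration),
        ("vad_min_silence_duration", vad_min_silence_duration),
    ):
        if not 0.0 <= value <= 5.0:
            raise ValueError(f"{name} must be at most 5 seconds")
    return {
        "file_id": file_id,
        "num_speakers": num_speakers,  # None becomes JSON null (auto-detect)
        "language": language,
        "transcribe_model": transcribe_model,
        "vad_type": vad_type,
        "vad_threshold": vad_threshold,
        "vad_min_speech_duration": vad_min_speech_duration,
        "vad_min_silence_duration": vad_min_silence_duration,
        "detect_gender": detect_gender,
        "input_srt_file_id": input_srt_file_id,
    }
```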
