SAM Audio - Audio Isolation API | Freepik API

SAM Audio integration

SAM Audio is an AI-powered audio isolation API that extracts specific sounds from audio or video files based on natural-language text descriptions. Describe what you want to isolate (vocals, speech, instruments, or sound effects) and receive a clean WAV file containing only that sound. Input can be an audio file (WAV, MP3, FLAC, OGG, M4A) or a video file (MP4, MOV, WEBM, AVI).

Key capabilities

Text-guided isolation: Describe any sound to extract (e.g., “A person speaking”, “Piano playing”, “Dog barking”)
Multi-format input: Accepts audio (WAV, MP3, FLAC, OGG, M4A) or video (MP4, MOV, WEBM, AVI) files
Video localization: Optional bounding box (x1, y1, x2, y2) to focus on a specific region of the video frame
Quality tuning: Adjust reranking_candidates (1-8) to balance quality vs. latency
Event detection: Enable predict_spans for better isolation of non-ambient sounds
WAV output: High-quality WAV audio file with the isolated sound
Async processing: Webhook notifications or polling for task completion

Use cases

Music production: Extract vocals from songs for remixes or karaoke tracks
Podcast editing: Isolate speech from background noise or music
Film post-production: Separate dialogue from ambient sounds for audio mixing
Sound design: Extract specific sound effects from video recordings
Transcription services: Clean up audio by isolating speech before transcription
Instrument isolation: Separate specific instruments from full band recordings

Isolate audio with SAM Audio

Submit an audio or video file with a text description of the sound to isolate. The service immediately returns a task ID; track completion by polling the task endpoint or by receiving a webhook notification.

POST /v1/ai/audio-isolation

Create a new audio isolation task

GET /v1/ai/audio-isolation

List all audio isolation tasks

GET /v1/ai/audio-isolation/{task-id}

Get task status and results by ID
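The create-then-poll flow can be sketched with Python's standard library. This is a minimal sketch, not an official client: the host `https://api.freepik.com`, the `x-freepik-api-key` header, the response shape `{"data": {"task_id": ..., "status": ...}}`, and the terminal status values `COMPLETED` and `FAILED` are all assumptions to verify against the API reference. The polling function takes the fetch call as an argument so it can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

# ASSUMPTIONS (not stated in this page): host, auth header name, response shape.
BASE_URL = "https://api.freepik.com/v1/ai/audio-isolation"
API_KEY_HEADER = "x-freepik-api-key"


def build_create_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/ai/audio-isolation."""
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def build_status_request(task_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/ai/audio-isolation/{task-id}."""
    return urllib.request.Request(
        f"{BASE_URL}/{task_id}",
        headers={API_KEY_HEADER: api_key},
        method="GET",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the task reaches an (assumed) terminal status."""
    for _ in range(max_attempts):
        data = fetch_status()["data"]
        if data["status"] in ("COMPLETED", "FAILED"):  # assumed status names
            return data
        sleep(interval_s)
    raise TimeoutError("task did not finish within the polling budget")


# Example usage (needs a real API key and network access):
# task = send(build_create_request(
#     {"description": "A person speaking", "audio": "https://example.com/clip.mp3"},
#     "YOUR_KEY"))
# task_id = task["data"]["task_id"]
# result = poll_until_done(lambda: send(build_status_request(task_id, "YOUR_KEY")))
```

For production traffic, the webhook option avoids the repeated status requests that polling requires.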

Parameters

Parameter	Type	Required	Default	Description
`description`	`string`	Yes	-	Text description of the sound to isolate (e.g., “A person speaking”, “Piano playing”)
`audio`	`string`	No*	-	URL or base64-encoded audio file (WAV, MP3, FLAC, OGG, M4A)
`video`	`string`	No*	-	URL or base64-encoded video file (MP4, MOV, WEBM, AVI)
`x1`	`integer`	No	`0`	Bounding box left coordinate for video localization (set x1, y1, x2, y2 all to 0 for the full frame)
`y1`	`integer`	No	`0`	Bounding box top coordinate for video localization (set x1, y1, x2, y2 all to 0 for the full frame)
`x2`	`integer`	No	`0`	Bounding box right coordinate for video localization (set x1, y1, x2, y2 all to 0 for the full frame)
`y2`	`integer`	No	`0`	Bounding box bottom coordinate for video localization (set x1, y1, x2, y2 all to 0 for the full frame)
`sample_fps`	`integer`	No	`2`	Frame sampling rate for video (1-5 FPS)
`reranking_candidates`	`integer`	No	`1`	Quality vs. latency trade-off (1-8, higher = better quality, slower)
`predict_spans`	`boolean`	No	`false`	Enable for better isolation of non-ambient, event-based sounds
`webhook_url`	`string`	No	-	URL for task completion notification

*Either audio or video must be provided, but not both.
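A client can check the documented constraints before sending a request. The sketch below assembles a request body and raises `ValueError` on inputs the table rules out. The function name is hypothetical, bounding-box fields are left at their full-frame defaults by omission, and dropping `sample_fps` for audio input (the table describes it as a video setting) is an assumption; the server remains the authority on validation.

```python
from typing import Optional


def build_isolation_payload(
    description: str,
    audio: Optional[str] = None,
    video: Optional[str] = None,
    reranking_candidates: int = 1,
    predict_spans: bool = False,
    sample_fps: int = 2,
    webhook_url: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a request body and check documented limits."""
    if not description.strip():
        raise ValueError("description is required")
    # Exactly one of audio / video must be supplied.
    if (audio is None) == (video is None):
        raise ValueError("provide exactly one of audio or video")
    if not 1 <= reranking_candidates <= 8:
        raise ValueError("reranking_candidates must be in 1-8")
    if not 1 <= sample_fps <= 5:
        raise ValueError("sample_fps must be in 1-5")

    payload = {
        "description": description,
        "reranking_candidates": reranking_candidates,
        "predict_spans": predict_spans,
    }
    if audio is not None:
        payload["audio"] = audio
    else:
        payload["video"] = video
        payload["sample_fps"] = sample_fps  # assumed to matter only for video
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload
```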

Frequently Asked Questions

What is SAM Audio and how does it work?

SAM Audio is an AI-powered audio isolation API that uses text descriptions to identify and extract specific sounds from audio or video files. You submit a file with a description of the target sound (e.g., “A person speaking”), receive a task ID immediately, then poll for results or receive a webhook notification. The output is a WAV file containing only the isolated sound.

What audio and video formats does SAM Audio support?

For audio input: WAV, MP3, FLAC, OGG, and M4A formats. For video input: MP4, MOV, WEBM, and AVI formats. Files can be provided as URLs or base64-encoded strings.
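For local files, the base64 option can be prepared by reading the file and choosing the `audio` or `video` field from its extension. This sketch assumes the API accepts a plain base64 string (no `data:` URI prefix) and that the file extension reflects the real container format; both are assumptions, and `file_to_input` is a hypothetical helper name.

```python
import base64
from pathlib import Path
from typing import Tuple

# Supported formats as listed on this page.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".avi"}


def file_to_input(path: str) -> Tuple[str, str]:
    """Return ("audio" or "video", base64 string) for a supported local file."""
    p = Path(path)
    ext = p.suffix.lower()  # case-insensitive extension check
    if ext in AUDIO_EXTS:
        field = "audio"
    elif ext in VIDEO_EXTS:
        field = "video"
    else:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return field, encoded
```

The returned pair can be merged into a request body, e.g. `field, data = file_to_input("interview.mp3")` followed by `{"description": "A person speaking", field: data}`. Base64 makes the payload roughly a third larger than the file, so a URL is likely the better choice for large videos.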

How do I write effective sound descriptions?

Be specific and descriptive. Good examples: “A person speaking”, “Piano playing in the background”, “Dog barking loudly”, “Acoustic guitar strumming”. Avoid vague descriptions like “music” or “noise”; instead, specify the type of music or sound you want to isolate.

What is the reranking_candidates parameter for?

The reranking_candidates parameter (1-8) controls the quality vs. speed trade-off. Higher values produce better isolation quality but take longer to process. Use 1 for fastest results, 8 for highest quality. Default is 1.

When should I enable predict_spans?

Enable predict_spans when isolating non-ambient, event-based sounds like speech, individual notes, or sound effects. Keep it disabled (default) for continuous ambient sounds like background music or environmental noise.

How does video localization work with bounding boxes?

For video input, you can specify a bounding box (x1, y1, x2, y2) to focus on sounds originating from a specific area of the frame. This is useful when you want to isolate audio from a particular person or object in the video. Set all values to 0 (default) to process the full frame.
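The bounding-box fields can be filled from a region tuple, with `None` or all zeros meaning the full frame as described above. This sketch assumes the coordinates are non-negative pixel positions in the source video with the origin at the top-left corner (left/top first, right/bottom second, per the parameter table); the units are not stated on this page, and `bbox_params` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def bbox_params(box: Optional[Box] = None) -> Dict[str, int]:
    """Map an (x1, y1, x2, y2) region to request fields; None means full frame."""
    if box is None or box == (0, 0, 0, 0):
        return {"x1": 0, "y1": 0, "x2": 0, "y2": 0}  # documented full-frame default
    x1, y1, x2, y2 = box
    if min(box) < 0:
        raise ValueError("coordinates must be non-negative")
    # Assumed convention: (x1, y1) is top-left and (x2, y2) is bottom-right.
    if x2 <= x1 or y2 <= y1:
        raise ValueError("expected x1 < x2 and y1 < y2")
    return {"x1": x1, "y1": y1, "x2": x2, "y2": y2}
```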

What is the output format?

SAM Audio outputs a high-quality WAV audio file containing only the isolated sound. This uncompressed format is ideal for further editing or processing in audio production workflows.
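After downloading the result, Python's standard `wave` module can report basic properties of the file before it enters an editing workflow. This sketch assumes the output uses integer PCM encoding, which is what `wave` can read; this page does not state the sample rate, bit depth, or channel count, so the code reports them instead of assuming values.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Report channel count, sample rate, sample width, and duration of a WAV."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }
```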

Best practices

Description specificity: Use detailed descriptions for better isolation accuracy
Input quality: Higher quality input audio/video produces better isolation results
Quality tuning: Start with reranking_candidates=1 for testing, increase for production
Event sounds: Enable predict_spans for speech, music notes, or sound effects
Video focus: Use bounding boxes to isolate sounds from specific video regions
Production integration: Use webhooks instead of polling for scalable applications
Error handling: Implement retry logic with exponential backoff for 503 errors
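The retry advice above can be implemented as a small wrapper around any request function that raises `urllib.error.HTTPError`. This sketch retries only on HTTP 503, doubles the delay after each failure up to a cap, and applies random jitter so that many clients do not retry in lockstep. The helper name and its defaults are illustrative, not values recommended by the API provider.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on HTTP 503 with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == max_retries:
                raise  # other errors, or retries exhausted: surface to caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter in [delay/2, delay]
    raise AssertionError("unreachable")
```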

Related APIs

Sound Effects: Generate sound effects from text descriptions
Lip Sync: Synchronize lip movements with audio
OmniHuman 1.5: Generate human animations driven by audio
