Optimizing Audio Quality with Foo DSP SoundTouch Parameters
Optimizing audio quality with Foo DSP's SoundTouch library means balancing time-stretching accuracy, pitch correctness, latency, and CPU efficiency. This article walks through the core parameters, explains how they affect sound, and provides practical tuning strategies and examples so you can get transparent results with minimal artifacts in a variety of real-world scenarios.
What is SoundTouch?
SoundTouch is an open-source audio processing library focused on time-stretching (changing playback speed without affecting pitch) and pitch-shifting (changing pitch without affecting speed). It is widely used in media players, DJ tools, and audio editing software for real-time and offline processing. Foo DSP is a set of plugins and wrappers that integrate SoundTouch into audio applications, exposing controls and parameters to end users and developers.
Key Concepts and Parameters
Understanding these core concepts will help you make informed choices when adjusting parameters.
- Algorithmic trade-offs:
  - Time-domain and frequency-domain approaches differ in the artifacts they produce and in CPU cost.
  - SoundTouch uses a time-domain, WSOLA-style overlap-add algorithm for time-stretching, combined with sample-rate transposition for pitch and rate changes, and is optimized for real-time use.
- Fundamental parameters:
  - Tempo: changes playback speed without altering pitch.
  - Pitch: shifts pitch without changing tempo (in semitones or cents).
  - Rate: changes both speed and pitch together.
  - Sequence length: controls the length of analysis frames (affects transient preservation).
  - Seek window length: controls the search window used to find the best overlap position (affects smoothness vs transient smearing).
  - Overlap length: controls how much adjacent frames overlap during crossfading (affects artifacts and smoothing).
  - Channels and sample rate: SoundTouch supports multi-channel audio and a range of sample rates. Millisecond-based parameters are converted to sample counts using the configured sample rate, so set it correctly before tuning.
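The relationship between tempo, pitch, and rate can be made concrete with a little arithmetic. The sketch below assumes equal-tempered semitones (a ratio of 2^(n/12)) and treats rate as a plain resampling factor. The function names are illustrative only and are not part of SoundTouch or Foo DSP.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift given in cents (100 cents = 1 semitone)."""
    return 2.0 ** (cents / 1200.0)


def combined_effect(tempo: float = 1.0, pitch_semitones: float = 0.0,
                    rate: float = 1.0) -> tuple[float, float]:
    """Return (speed_factor, pitch_factor) for a combination of the three controls.

    Tempo scales speed only, pitch scales pitch only, rate scales both.
    """
    speed = tempo * rate
    pitch = semitones_to_ratio(pitch_semitones) * rate
    return speed, pitch


def output_duration(input_seconds: float, tempo: float = 1.0,
                    rate: float = 1.0) -> float:
    """Duration of the processed audio for a given input duration."""
    return input_seconds / (tempo * rate)


if __name__ == "__main__":
    print(combined_effect(tempo=1.05, pitch_semitones=-2))
    print(output_duration(60.0, tempo=1.2))
```

Expressing everything as ratios makes it easy to see, for example, that a rate of 1.05 raises pitch by a little under one semitone while also shortening the audio by about 5%.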
How Each Parameter Affects Sound
- Tempo (and rate): Large tempo changes (beyond ±10–20%) increase artifacts such as phasiness and transient smearing. Small adjustments (±5%) are generally safe.
- Pitch: Pitch shifting by many semitones can introduce formant distortion and robotic timbres; applying formant correction or multi-band processing helps preserve naturalness.
- Sequence length: Short sequences (e.g., 10–30 ms) preserve fast transients better but can produce more phase issues. Longer sequences (e.g., 80–200 ms) yield smoother results but can smear percussive elements.
- Seek window length: Larger windows help find better overlap points for smoother output but can miss fine transient alignment.
- Overlap length: Higher overlap increases smoothing and reduces audible clicks but raises CPU cost and can blur fast transients.
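To see how the three lengths interact, it can help to convert them to samples and look at how far the algorithm advances per processing cycle. The following is a simplified model of an overlap-add time-stretcher, not a description of SoundTouch's exact internals; `frame_layout` and its result keys are hypothetical names chosen for this sketch.

```python
def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000.0))


def frame_layout(sample_rate: int, sequence_ms: float, seek_window_ms: float,
                 overlap_ms: float, tempo: float = 1.0) -> dict:
    """Simplified overlap-add bookkeeping for one processing cycle.

    Each cycle emits (sequence - overlap) new output samples, while the input
    position nominally advances by tempo times that amount. The seek window
    bounds how far the search may deviate from the nominal position.
    """
    seq = ms_to_samples(sequence_ms, sample_rate)
    seek = ms_to_samples(seek_window_ms, sample_rate)
    ovl = ms_to_samples(overlap_ms, sample_rate)
    if not 0 < ovl < seq:
        raise ValueError("overlap must be positive and shorter than the sequence")
    output_hop = seq - ovl
    return {
        "sequence": seq,
        "seek_window": seek,
        "overlap": ovl,
        "output_hop": output_hop,
        "input_hop": tempo * output_hop,
        "overlap_fraction": ovl / seq,
    }


if __name__ == "__main__":
    print(frame_layout(48000, sequence_ms=40, seek_window_ms=15,
                       overlap_ms=8, tempo=1.25))
```

In this model, a shorter sequence means more cycles per second of audio (more splice points, more searches), which is one way to reason about why short sequences cost more CPU and long ones smear transients.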
Practical Presets and Recommended Ranges
Below are starting points for different use cases. Tweak slightly based on the source material.
- Music (vocals, instruments), small tempo changes:
  - Sequence: 40–80 ms
  - Seek window: 10–30 ms
  - Overlap: 8–12 ms
  - Tempo change: ±0–10%
- Music, large tempo changes:
  - Sequence: 80–150 ms
  - Seek window: 30–80 ms
  - Overlap: 12–24 ms
  - Use formant correction if available
- Percussive material (drums, transient-rich):
  - Sequence: 10–30 ms
  - Seek window: 5–10 ms
  - Overlap: 4–8 ms
- Voice/speech:
  - Sequence: 30–80 ms
  - Seek window: 10–20 ms
  - Overlap: 8–16 ms
  - Pitch shifts: keep within ±3 semitones
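One way to keep these ranges handy is to store them as data and pick a starting value inside each range. The sketch below encodes the ranges listed above; the preset keys and the `bias` parameter are assumptions made for illustration, not names used by SoundTouch or Foo DSP.

```python
# Ranges in milliseconds, taken from the list above: (low, high).
PRESETS_MS = {
    "music_small_change": {"sequence": (40, 80), "seek_window": (10, 30), "overlap": (8, 12)},
    "music_large_change": {"sequence": (80, 150), "seek_window": (30, 80), "overlap": (12, 24)},
    "percussive": {"sequence": (10, 30), "seek_window": (5, 10), "overlap": (4, 8)},
    "speech": {"sequence": (30, 80), "seek_window": (10, 20), "overlap": (8, 16)},
}


def starting_point(material: str, bias: float = 0.5) -> dict:
    """Pick one value inside each range for the given material type.

    bias = 0.0 selects the low end of every range, 1.0 the high end,
    and 0.5 (the default) the midpoint.
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0 and 1")
    ranges = PRESETS_MS[material]
    return {name: lo + bias * (hi - lo) for name, (lo, hi) in ranges.items()}


if __name__ == "__main__":
    for material in PRESETS_MS:
        print(material, starting_point(material))
```

Starting from the midpoint and nudging `bias` up or down keeps every value inside its recommended range while you audition.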
Step-by-Step Tuning Workflow
- Choose goal: tempo change, pitch shift, or both.
- Start with a preset matching material type (use ranges above).
- Process a representative clip (~10–20 seconds) and listen for artifacts: flanging, transient smearing, chirping.
- If transients are smeared, reduce sequence and overlap lengths.
- If output sounds granular or phasy, increase sequence and overlap lengths.
- If pitch sounds robotic, reduce the shift amount or enable formant preservation where your host or plugin offers it.
- Monitor CPU; if high, increase sequence length and reduce overlap slightly.
- Iterate until you balance quality and performance.
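The adjustment rules in this workflow can be written down as a small helper that maps a symptom you hear to a parameter change. This is only a sketch of the heuristics above: the `Settings` class, the symptom names, and the 20% step size are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    sequence_ms: float
    seek_window_ms: float
    overlap_ms: float


def adjust(settings: Settings, symptom: str, step: float = 0.2) -> Settings:
    """Return new settings nudged in the direction the workflow suggests.

    symptom is one of:
      "smeared_transients" -> shorter sequence and overlap
      "grainy_or_phasy"    -> longer sequence and overlap
      "cpu_high"           -> longer sequence, slightly shorter overlap
    """
    if symptom == "smeared_transients":
        return replace(settings,
                       sequence_ms=settings.sequence_ms * (1 - step),
                       overlap_ms=settings.overlap_ms * (1 - step))
    if symptom == "grainy_or_phasy":
        return replace(settings,
                       sequence_ms=settings.sequence_ms * (1 + step),
                       overlap_ms=settings.overlap_ms * (1 + step))
    if symptom == "cpu_high":
        return replace(settings,
                       sequence_ms=settings.sequence_ms * (1 + step),
                       overlap_ms=settings.overlap_ms * (1 - step / 2))
    raise ValueError(f"unknown symptom: {symptom}")


if __name__ == "__main__":
    current = Settings(sequence_ms=50, seek_window_ms=15, overlap_ms=10)
    print(adjust(current, "smeared_transients"))
```

Keeping the settings immutable makes it easy to log each iteration and return to an earlier candidate if a later change sounds worse.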
Advanced Techniques
- Multi-pass processing: For large pitch shifts, apply smaller shifts across multiple passes to reduce artifacts.
- Dynamic parameter modulation: Adjust sequence and overlap based on detected transient density (shorter for attacks, longer for sustains).
- Pre-filtering: Apply a gentle low-shelf or high-shelf cut before processing to tame frequency extremes that exacerbate artifacts.
- Hybrid approaches: Combine time-domain SoundTouch with frequency-domain tools (e.g., phase vocoder) for complex material.
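Two of the ideas above reduce to simple arithmetic: splitting a large pitch shift into equal smaller passes, and choosing a sequence length from a transient-density estimate. The sketch below assumes a per-pass limit of 3 semitones and a density value already normalized to the range 0 to 1. Both are illustrative choices, and how the density is measured is left open.

```python
import math


def split_pitch_shift(total_semitones: float, max_per_pass: float = 3.0) -> list[float]:
    """Split a pitch shift into equal passes, none larger than max_per_pass."""
    if max_per_pass <= 0:
        raise ValueError("max_per_pass must be positive")
    passes = max(1, math.ceil(abs(total_semitones) / max_per_pass))
    return [total_semitones / passes] * passes


def sequence_for_density(density: float, short_ms: float = 20.0,
                         long_ms: float = 100.0) -> float:
    """Interpolate the sequence length from transient density.

    density 0.0 (sustained material) gives long_ms;
    density 1.0 (dense attacks) gives short_ms. Values outside 0..1 are clamped.
    """
    d = min(1.0, max(0.0, density))
    return long_ms + d * (short_ms - long_ms)


if __name__ == "__main__":
    print(split_pitch_shift(7))
    print(sequence_for_density(0.25))
```

Semitone shifts add across passes, so the per-pass values always sum to the requested total. Each extra pass still adds its own processing artifacts and CPU cost, so audition the result against a single-pass version.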
Example Code Snippets (Conceptual)
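Use the official SoundTouch API or Foo DSP plugin parameters. Example configuration using the SoundTouch C++ API (the processing loop is omitted, and the include path may differ on your system):
    #include <SoundTouch.h>

    // Configure an existing instance; sr is in Hz, ch is the channel count.
    void configure(soundtouch::SoundTouch& st, unsigned sr, unsigned ch) {
        st.setSampleRate(sr);
        st.setChannels(ch);
        st.setTempo(1.05);                        // +5% speed, pitch unchanged
        st.setPitchSemiTones(-2);                 // two semitones down
        st.setSetting(SETTING_SEQUENCE_MS, 50);   // sequence length
        st.setSetting(SETTING_SEEKWINDOW_MS, 15); // seek window length
        st.setSetting(SETTING_OVERLAP_MS, 10);    // overlap length
        // Then feed blocks with putSamples() and drain with receiveSamples().
    }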
Common Pitfalls and How to Avoid Them
- Over-reliance on one preset: Always audition with the actual audio.
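- Extreme parameter values: They often produce artifacts; approach the target value in small increments and audition each step.
- Ignoring sample rate effects: Settings given in milliseconds keep the same duration at any sample rate, but the corresponding sample counts (and the work per frame) grow with the sample rate. Any value expressed in samples must be rescaled proportionally.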
- CPU vs quality: Profile performance and prefer larger sequences if CPU is constrained.
Listening Tests and Evaluation
Use ABX testing to compare processed vs original. Listen on neutral monitors/headphones at moderate volume. Focus checks on transients, formants, and rhythmic stability.
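When scoring an ABX session, a quick significance check helps separate a real audible difference from lucky guessing. The sketch below computes the one-sided binomial probability of getting at least the observed number of correct answers by chance, assuming each trial is an independent 50/50 guess; the function name is illustrative.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # 12 correct out of 16 trials gives a p-value of about 0.038.
    print(round(abx_p_value(12, 16), 3))
```

A small p-value suggests the difference between the two versions is audible to that listener. A value near 0.5 or above suggests the processing is effectively transparent for that material.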
Final Notes
Optimizing SoundTouch involves trade-offs between transient fidelity, smoothness, pitch naturalness, and CPU load. Start with material-appropriate presets, iterate with short listening tests, and use advanced techniques (dynamic parameters, multi-pass) for challenging material.