Practical DSP Filter Implementation with Fixed-Point Arithmetic
Digital Signal Processing (DSP) filters are fundamental to many embedded systems, from audio devices and communications equipment to sensor signal conditioning and control systems. While floating-point implementations are convenient during development, many production embedded platforms (microcontrollers, DSP cores, FPGAs, and specialized ASICs) use fixed-point arithmetic because of lower cost, lower power consumption, and deterministic performance. This article covers practical aspects of designing, implementing, and validating DSP filters using fixed-point arithmetic, with guidance, examples, and common pitfalls.
Why fixed-point?
- Lower cost and power: Fixed-point processors are simpler and consume less power than floating-point units.
- Higher throughput: Fixed-point arithmetic can be faster on hardware without an FPU.
- Deterministic behavior: Predictable execution time and bit-true results are important in real-time systems.
However, fixed-point requires explicit handling of dynamic range, quantization, and overflow — tradeoffs that must be carefully managed.
Overview: filter types and where fixed-point matters
- FIR (Finite Impulse Response) filters:
  - Always stable.
  - Linear phase achievable.
  - Easier to implement in fixed-point because internal states are just delay lines and sums.
- IIR (Infinite Impulse Response) filters:
  - More computationally efficient (lower order for similar specs).
  - Can be unstable if coefficients or arithmetic introduce errors.
  - Sensitive to quantization; requires careful structure choice (e.g., cascade of biquads).
Fixed-point concerns affect both filter coefficients and internal arithmetic (accumulators, multipliers, scaling).
Number formats and representations
Common fixed-point formats:
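- Q-format: Qm.n denotes m integer bits and n fractional bits; in the convention used here the sign bit is counted in m. Example: Q1.15 on a 16-bit signed type uses 1 sign bit and 15 fractional bits and covers [-1, 1 - 2^-15]. The shorthand Qn (e.g., Q15) gives only the number of fractional bits. Some vendors do not count the sign bit in m, so check the convention of any library you use.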
- Unsigned integer formats may be used for purely positive signals.
- Block floating point (shared exponent across a vector) is an alternative when dynamic range varies.
Choose word lengths based on platform: 16-bit (Q15) and 32-bit (Q31) are most common. Keep in mind:
- Fractional bits set the resolution: one LSB is 2^-n for n fractional bits (about 3.05e-5 for Q15).
- Integer bits determine dynamic range and saturation limits.
Design flow: from floating-point prototype to fixed-point implementation
- Design prototype in floating-point:
  - Use MATLAB, Python (scipy.signal), or similar to get ideal coefficients and verify frequency response.
- Choose architecture and word lengths:
  - Decide Q-format (e.g., Q1.15 for 16-bit, Q1.31 for 32-bit).
  - Choose accumulator width (often wider than multiplier inputs to avoid overflow).
- Quantize coefficients:
  - Round to nearest representable fixed-point value.
  - Analyze coefficient quantization effects (frequency response deviations, stability).
- Choose filter structure:
  - FIR: direct-form or polyphase, symmetric implementations to halve multiplications.
  - IIR: prefer cascade of second-order sections (biquads) or lattice forms to limit sensitivity.
- Scale signals and stages:
  - Prevent overflow by scaling coefficients or using stage-wise scaling.
  - Use saturation arithmetic when available.
- Implement and test:
  - Unit tests vs floating-point reference.
  - Measure SNR, frequency response, and stability.
  - Use fixed-point simulation tools (MATLAB Fixed-Point Designer, PyFixedPoint, or custom integer-sim).
- Optimize for speed and memory:
  - Use hardware multiply-accumulate (MAC) instructions, circular buffers, and DMA for data movement.
- Validate on target hardware:
  - Real input signals, resource monitoring (CPU, memory), and timing checks.
Coefficient quantization: effects and mitigation
Coefficient quantization changes the magnitude and phase response and, for IIR filters, can move poles onto or outside the unit circle, which makes the filter unstable.
Mitigations:
- Use higher-bit coefficients (e.g., Q31 coefficients on a 32-bit platform).
- Use frequency-domain sensitivity analysis: compute worst-case ripple introduced by quantization.
- Re-order sections in cascaded IIR to place higher-gain sections earlier or later depending on internal scaling.
- Apply dithering (for some audio cases) and noise-shaping techniques in downstream processing.
Example: a floating-point coefficient b = 0.123456 quantized to Q1.15 becomes round(0.123456 * 2^15) / 2^15 = 4045 / 32768 ≈ 0.123444, an error of about 1.2e-5, which is under half an LSB.
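The rounding step above can be sketched in C as follows. The helper names `float_to_q15` and `q15_to_float` are hypothetical, and the sketch assumes Q1.15 stored in `int16_t` with saturation at the ends of the range; it would normally run offline or at start-up, not in the signal path.

```c
#include <stdint.h>

/* Hypothetical helper: convert a real-valued coefficient to Q1.15 using
   round-to-nearest, saturating to the int16_t range. */
static int16_t float_to_q15(double v)
{
    double scaled = v * 32768.0;                 /* multiply by 2^15 */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;      /* round half away from zero */
    if (scaled > 32767.0)  return INT16_MAX;     /* +1.0 is not representable */
    if (scaled < -32768.0) return INT16_MIN;
    return (int16_t)scaled;                      /* cast truncates toward zero */
}

/* Hypothetical helper: value represented by a Q1.15 word. */
static double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;
}
```

Comparing `q15_to_float(float_to_q15(b))` with `b` for every coefficient gives the per-coefficient quantization error that feeds the sensitivity analysis.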
Fixed-point arithmetic techniques
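- Accumulators: use accumulators wider than a single product. A 16-bit x 16-bit signed product can reach 2^30 in magnitude, so a 32-bit accumulator leaves only about one bit of headroom and is safe only while the sum of absolute coefficient values stays below 2.0; 40-bit hardware accumulators or 64-bit integers are common choices for longer sums.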
- Guard bits: reserve extra bits to prevent overflow in intermediate sums.
- Saturation vs wrap-around: prefer saturation if available; wrap-around causes unexpected artifacts.
- Rounding: use rounding-to-nearest rather than truncation to reduce bias. For accumulators, apply rounding before downshifting.
- Bit shifts: use arithmetic shifts for signed numbers; track sign-extension.
Example: a Q1.15 x Q1.15 multiply yields a Q2.30 product; to return to Q1.15, add 2^14 and shift right by 15 bits (round to nearest), then saturate, because -1 x -1 = +1 is not representable in Q1.15.
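A minimal C sketch of that multiply, with hypothetical helper names `sat16` and `q15_mul`. It assumes the compiler implements right shift of negative signed integers as an arithmetic shift, which is common but implementation-defined in C.

```c
#include <stdint.h>

/* Saturate a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Q1.15 x Q1.15 -> Q1.15 with round-to-nearest and saturation. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q2.30, magnitude <= 2^30 */
    prod += 1 << 14;                          /* half of one output LSB */
    return sat16(prod >> 15);                 /* arithmetic shift assumed */
}
```

The only input pair that needs the saturation is -1 x -1 (both operands equal to -32768), which would otherwise produce 32768.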
FIR implementation tips
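- Symmetric FIR: exploit the coefficient symmetry h[k] = h[N-k] of linear-phase filters to halve the multiplies. For a filter of order N (N+1 taps) with N odd: y[n] = sum_{k=0}^{(N-1)/2} h[k]*(x[n-k] + x[n-(N-k)]). For even N the sum stops at k = N/2 - 1 and the unpaired center term h[N/2]*x[n-N/2] is added. The pre-add of two Q1.15 samples needs 17 bits, so widen before adding.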
- Use block processing (SIMD) where available.
- Use circular buffers for delay lines to avoid memory moves.
- For decimation/interpolation filters, use polyphase structures to reduce work.
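The symmetric pre-add can be sketched as below. This is an illustrative layout, not an optimized kernel: the function name `fir_sym_q15` is hypothetical, only the first half of the coefficients is stored, the delay line is a plain array with the newest sample first, and a 64-bit accumulator is used for simplicity.

```c
#include <stdint.h>

/* One output sample of a symmetric (linear-phase) FIR with `taps` coefficients.
   h holds only the first (taps + 1) / 2 coefficients in Q1.15.
   d[i] is the input delayed by i samples (d[0] is the newest), Q1.15. */
static int16_t fir_sym_q15(const int16_t *h, const int16_t *d, int taps)
{
    int64_t acc = 0;                       /* Q30-scaled sum with guard bits */
    int half = taps / 2;
    for (int k = 0; k < half; k++) {
        /* The sum of two 16-bit samples needs 17 bits: widen first. */
        int32_t pair = (int32_t)d[k] + (int32_t)d[taps - 1 - k];
        acc += (int64_t)h[k] * pair;
    }
    if (taps & 1)                          /* odd length: unpaired center tap */
        acc += (int64_t)h[half] * d[half];
    acc = (acc + (1 << 14)) >> 15;         /* round, Q30 -> Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;  /* saturate to 16 bits */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

In a real-time loop the array `d` would typically be replaced by a circular buffer so that no samples are moved between calls.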
Scaling:
- Compute worst-case sum of absolute coefficients; ensure accumulator width handles maximum without overflow, or pre-scale coefficients so maximum sum < 1 in Q-format.
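As a rough sketch of that check, the worst-case gain of a Q1.15 FIR can be computed directly from the integer coefficients. The function names are hypothetical, and the thresholds assume Q1.15 inputs, Q30 products, and a rounding constant of 2^14 added before the final shift.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sum of |h[k]| for Q1.15 coefficients, in Q15 units
   (32768 corresponds to a worst-case gain of exactly 1.0). */
static int32_t fir_l1_norm_q15(const int16_t *h, int taps)
{
    int32_t sum = 0;
    for (int k = 0; k < taps; k++)
        sum += (h[k] < 0) ? -(int32_t)h[k] : (int32_t)h[k];
    return sum;
}

/* |acc| <= l1 * 32768 for any Q1.15 input, and l1 * 32768 + 2^14 must not
   exceed 2^31 - 1, which holds exactly when l1 <= 65535 (gain below 2.0). */
static bool acc32_is_safe(int32_t l1)
{
    return l1 <= 65535;
}

/* The rounded 16-bit output can exceed the Q1.15 range once the worst-case
   gain reaches 1.0, so saturation (or pre-scaling) is needed. */
static bool output_can_saturate(int32_t l1)
{
    return l1 > 32767;
}
```

This bound is conservative: it assumes a full-scale input whose signs line up with the coefficients, which real signals rarely do, but it is the bound that guarantees no overflow.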
IIR implementation tips
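- Use cascaded biquads (Direct Form I or Direct Form II Transposed). Direct Form I has a single summation point per section, which makes overflow easier to reason about in fixed-point; Direct Form II Transposed needs only two state variables per section, but those states must be kept at sufficient width and scaled carefully.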
- Implement per-section scaling: after each biquad apply scaling to keep internal values within range.
- Use state variable or lattice structures for high-order filters when numerical sensitivity is a concern.
- Monitor pole locations after quantization; if poles move outside unit circle, redesign or increase precision.
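For a single biquad with denominator 1 + a1*z^-1 + a2*z^-2, the pole check can be done on the quantized integers themselves using the standard stability triangle: both poles lie inside the unit circle exactly when |a2| < 1 and |a1| < 1 + a2. The sketch below assumes Q2.14 coefficients (an assumption chosen because a1 can approach ±2) and a hypothetical function name.

```c
#include <stdint.h>
#include <stdbool.h>

#define Q14_ONE (1 << 14)   /* 1.0 in Q2.14 */

/* Stability triangle test on quantized Q2.14 denominator coefficients.
   Returns true when both poles are strictly inside the unit circle. */
static bool biquad_is_stable_q14(int16_t a1, int16_t a2)
{
    int32_t abs_a1 = (a1 < 0) ? -(int32_t)a1 : (int32_t)a1;
    int32_t abs_a2 = (a2 < 0) ? -(int32_t)a2 : (int32_t)a2;
    return abs_a2 < Q14_ONE && abs_a1 < Q14_ONE + (int32_t)a2;
}
```

This only checks the linear filter defined by the quantized coefficients. It says nothing about limit cycles caused by rounding inside the feedback loop, which still need bit-true simulation.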
Example: Biquad (Direct Form II Transposed) inner operations map well to MAC instructions and can be implemented with careful scaling of coefficients and states.
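A minimal sketch of one such section follows. The struct and function names are hypothetical, the coefficients are assumed to be Q2.14 with a0 = 1 implied, and the two states are held in 64-bit integers at product scale so that they cannot overflow; a production version would narrow the states to whatever the target's MAC unit supports.

```c
#include <stdint.h>

typedef struct {
    int16_t b0, b1, b2;   /* numerator coefficients, Q2.14 */
    int16_t a1, a2;       /* denominator coefficients, Q2.14 (a0 = 1) */
    int64_t s1, s2;       /* states, kept at product scale (2^29) */
} biquad_q14;

/* Direct Form II Transposed:
     y  = b0*x + s1
     s1 = b1*x - a1*y + s2
     s2 = b2*x - a2*y
   Input and output are Q1.15. */
static int16_t biquad_step(biquad_q14 *f, int16_t x)
{
    int64_t acc = (int64_t)f->b0 * x + f->s1;
    int64_t y = (acc + (1 << 13)) >> 14;       /* round back to Q1.15 */
    if (y > INT16_MAX) y = INT16_MAX;          /* saturate the output */
    if (y < INT16_MIN) y = INT16_MIN;
    f->s1 = (int64_t)f->b1 * x - (int64_t)f->a1 * y + f->s2;
    f->s2 = (int64_t)f->b2 * x - (int64_t)f->a2 * y;
    return (int16_t)y;
}
```

Note that the feedback terms use the rounded, saturated output y, so the quantization error recirculates through the poles; this is one reason bit-true comparison against a floating-point reference matters for IIR sections.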
Dynamic range and scaling strategies
- Input scaling: ensure input range fits chosen Q-format. For sensors, apply gain staging.
- Stage scaling: distribute overall gain across stages to avoid saturating intermediate results.
- Block floating point: if dynamic range varies substantially, consider block float where each block has an exponent and mantissa in fixed-point.
- Auto-scaling: some DSP libraries/hardware offer automatic block scaling at cost of complexity.
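To make the block floating point idea concrete, here is a small sketch that normalizes a block of 16-bit samples and returns the shared exponent. The function name `bfp_normalize` is hypothetical, and the sketch scales in place; the caller is assumed to shift results right by the returned amount after processing.

```c
#include <stdint.h>
#include <stddef.h>

/* Scale a block so its largest sample uses as much of the int16_t range as
   possible. Returns the left-shift count applied (the shared exponent). */
static int bfp_normalize(int16_t *blk, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t a = (blk[i] < 0) ? -(int32_t)blk[i] : (int32_t)blk[i];
        if (a > peak) peak = a;
    }
    if (peak == 0) return 0;                 /* all-zero block: nothing to do */

    int shift = 0;
    while ((peak << (shift + 1)) <= INT16_MAX)
        shift++;

    for (size_t i = 0; i < n; i++)           /* multiply avoids shifting negatives */
        blk[i] = (int16_t)(blk[i] * (1 << shift));
    return shift;
}
```

The gain in resolution applies to quiet blocks only; a block that already contains a near-full-scale sample gets a shift of zero.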
Testing and verification
- Compare fixed-point output to floating-point reference for:
  - Impulse and step responses.
  - Frequency response (magnitude and phase).
  - SNR and THD (for audio).
  - Worst-case inputs (maximum amplitude, DC steps).
- Use bit-true simulation: simulate exact integer arithmetic and shifts to catch overflows and rounding errors.
- Hardware-in-the-loop: run on target with representative signals and measure performance and correctness.
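One simple metric for the comparison against a floating-point reference is the largest deviation expressed in output LSBs. The sketch below assumes Q1.15 output and a reference held as `double` in the range [-1, 1); the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Largest |fixed - reference| over a test run, in Q1.15 LSBs. */
static double max_error_lsb(const int16_t *y_fix, const double *y_ref, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = (double)y_fix[i] - y_ref[i] * 32768.0;
        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

A result near 0.5 LSB suggests that only the final rounding separates the two implementations; much larger values usually point to overflow, a Q-format mismatch, or truncation somewhere in the chain.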
Performance and optimization
- Use hardware MAC and DSP extensions (SIMD, packed arithmetic).
- Align data and use DMA to reduce CPU load.
- Minimize memory accesses: reuse buffers and apply loop unrolling where beneficial.
- Trade precision for speed: sometimes using Q15 vs Q31 reduces memory bandwidth and increases throughput; quantify impact on SNR.
- Consider fixed-point libraries optimized for the platform (CMSIS-DSP for Arm, vendor DSP libraries).
Practical example: Q15 FIR filter (pseudo-code)
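    /* 16-bit Q15 coefficients and 16-bit input, 32-bit accumulator */
    for (n = 0; n < N; n++) {
        int32_t acc = 0;                            /* 32-bit signed, Q30 scale */
        for (k = 0; k < M && k <= n; k++)           /* samples before x[0] count as zero */
            acc += (int32_t)coeff[k] * (int32_t)x[n - k];   /* each product is Q30 */
        acc = (acc + (1 << 14)) >> 15;              /* round and shift back to Q15 */
        if (acc > 32767) acc = 32767;               /* saturate to 16 bits */
        if (acc < -32768) acc = -32768;
        y[n] = (int16_t)acc;
    }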
Notes:
- coeff and x are Q1.15.
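- N is the number of samples and M the number of taps. The accumulator must be wide enough to hold the sum of products: 32 bits suffice only while the sum of |coeff[k]| stays below 2.0, so use a 64-bit accumulator otherwise.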
Common pitfalls
- Ignoring accumulator width and causing overflow.
- Truncating instead of rounding when shifting down (the rounding constant must be added before the shift), which introduces a DC bias.
- Using direct-form high-order IIR without sectioning, leading to instability after quantization.
- Mismatched Q-formats between stages or libraries.
- Not testing with worst-case inputs.
Quick checklist before deployment
- Prototype filter in floating-point and verify specs.
- Choose Q-format and accumulator widths.
- Quantize coefficients and simulate fixed-point behavior.
- Select numerically robust filter structure (symmetric FIR, cascade biquads, lattice).
- Add saturation and rounding as needed.
- Optimize implementation for target hardware (MAC, DMA, circular buffers).
- Validate on target with representative signals and edge cases.
Implementing DSP filters in fixed-point requires careful attention to numeric details, scaling, and structure selection. With systematic design, simulation, and testing, fixed-point filters can achieve excellent performance and accuracy on resource-constrained hardware.