Are Digital Signal Processors Going Away?
Paul Beckmann - DSP Online Conference 2025 - Duration: 40:47
There is an ongoing narrative that specialized DSP processors are being replaced by general purpose processors. This presentation examines the core architectural principles that set DSPs apart and provide speed and efficiency advantages over general purpose processors. We compare benchmarks across commercially available processors and discuss emerging trends and innovations that could shape the next generation of DSP technology.
This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.
What this presentation is about and why it matters
This talk examines whether specialized digital signal processors (DSPs) will disappear in favor of general-purpose CPUs. It compares the architectural features that make DSPs efficient for real-time signal workloads, traces historical evolution (from early TMS320 parts through SHARC and modern Hexagon/HiFi cores), and reports measured benchmarks on realistic automotive audio workloads. For engineers, the practical takeaway is not a philosophical debate but a tool-selection guide: when to use a DSP, when an application processor makes sense, and what to expect from mixed SoC designs and emerging ML acceleration in audio.
Who will benefit the most from this presentation
- Embedded audio engineers choosing between DSP cores, Cortex CPUs, and SoC components.
- DSP algorithm developers who want to understand how architecture influences performance and precision.
- System architects designing low-latency, power-constrained audio or voice products (headphones, automotive, Bluetooth SoCs).
- Students and engineers learning why certain instructions and addressing modes matter in real-time signal processing.
What you need to know
To get the most from the talk, review these concise concepts and why they matter for implementation and performance.
- FIR filter inner loop — many DSP workloads reduce to repeated multiply–accumulates over a tap delay line. The basic difference equation is:
\[ y[n] = \sum_{k=0}^{N-1} b[k]\,x[n-k] \]
Understanding how inner-loop memory accesses and arithmetic map to processor instructions is key to performance.
- Multiply–Accumulate (MAC) — a combined multiply and add in a single instruction. Hardware MACs reduce instruction count and cycle count per tap.
- Addressing modes — circular (modulo) addressing and bit‑reverse addressing remove per-iteration pointer math and are especially useful for FIRs and FFTs.
- Zero‑overhead loops — loop constructs that avoid per-iteration branch penalties. They improve deterministic timing for real‑time tasks.
- SIMD and wide vectors — Single‑Instruction Multiple‑Data operations (vector instructions) process several data elements per instruction and are essential for modern throughput improvements.
- VLIW and compiler dependence — Very-Long-Instruction-Word cores expose instruction-level parallelism to the compiler, which must schedule many operations together for performance.
- Multi-threading vs. multicore — hardware threads (Hexagon-style) can improve throughput and power by running several contexts on one physical core; multicore scales by replication but needs careful memory and synchronization handling.
- Memory bandwidth and caches — real-world performance often becomes memory-bound; cache behavior and bus contention can limit the benefit of extra compute units.
- Numerical formats — fixed-point, fractional, and floating-point register widths plus accumulator guard bits determine precision and headroom for audio fidelity.
- Real-time determinism and power — DSPs are designed for sustained, low-power, deterministic execution; CPUs excel at peak performance and ecosystem support but can be non-deterministic due to caches and OS activity.
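To make the FIR inner loop concrete, here is a minimal sketch in C. The modulo index update emulates, in software, the circular addressing a DSP provides in hardware, and each loop iteration is one multiply–accumulate. The names (`fir_state`, `fir_process`) and the tap count are illustrative, not from the talk.

```c
#include <stddef.h>

#define N_TAPS 8

typedef struct {
    float  delay[N_TAPS];  /* tap delay line holding x[n-k] */
    size_t head;           /* index of the newest sample */
} fir_state;

float fir_process(fir_state *s, const float b[N_TAPS], float x)
{
    /* Insert the newest sample; the modulo wrap is what circular
     * addressing hardware would do with zero per-iteration cost. */
    s->head = (s->head + 1) % N_TAPS;
    s->delay[s->head] = x;

    float  acc = 0.0f;
    size_t idx = s->head;
    for (size_t k = 0; k < N_TAPS; ++k) {
        acc += b[k] * s->delay[idx];        /* one MAC per tap */
        idx = (idx + N_TAPS - 1) % N_TAPS;  /* pointer wrap-around */
    }
    return acc;  /* y[n] = sum over k of b[k] * x[n-k] */
}
```

Feeding a unit impulse through this filter returns the coefficients one per call, which is a quick sanity check that the delay line and wrap-around are correct.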
Glossary
- DSP (Digital Signal Processor): A processor optimized for streaming arithmetic (MACs), special addressing modes, and deterministic real-time workloads.
- MAC (Multiply–Accumulate): An instruction that multiplies two numbers and adds the product to an accumulator in one cycle.
- Circular addressing: Hardware support that wraps pointer addresses automatically for FIFO-style buffers (common in FIRs).
- Zero-overhead loop: A loop mechanism that eliminates per-iteration branching and reduces loop overhead to improve timing predictability.
- SIMD (Single-Instruction Multiple-Data): Vector instructions that perform the same operation on multiple data elements simultaneously.
- VLIW (Very-Long-Instruction-Word): An architecture packing many operations into one wide instruction word for high instruction-level parallelism.
- HVX / Hexagon: Qualcomm’s vector extensions and DSP architecture that use hardware threads and vector units for mobile/automotive audio workloads.
- Accumulator / Guard bits: Extra bits in an accumulator to prevent overflow during repeated accumulation (important for fixed-point audio).
- Latency and determinism: Latency is the time from input to output; determinism means consistent timing — both are crucial for audio and ANC systems.
- SoC (System on Chip): A chip integrating CPU cores, DSPs, accelerators, memory controllers, and I/O—common in modern devices.
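The accumulator/guard-bits entry above can be illustrated with a short fixed-point sketch, assuming Q15 samples and coefficients (a common audio format, though the talk does not prescribe one). A 16×16 multiply yields a Q30 product; summing many such products into a 64-bit accumulator provides the headroom ("guard bits") that keeps intermediate sums from overflowing before the final saturation back to 16 bits.

```c
#include <stdint.h>

/* Dot product of two Q15 vectors with a wide accumulator.
 * Illustrative sketch; function name is not from the talk. */
int16_t q15_dot(const int16_t *x, const int16_t *b, int n)
{
    int64_t acc = 0;                   /* 64-bit accumulator = guard bits */
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * b[i];   /* Q15 * Q15 -> Q30 product */

    acc >>= 15;                        /* rescale Q30 back to Q15 */
    if (acc >  32767) acc =  32767;    /* saturate only at the end */
    if (acc < -32768) acc = -32768;
    return (int16_t)acc;
}
```

For example, 0.5 × 0.5 in Q15 (16384 × 16384) comes back as 0.25 (8192) after the rescale, with no risk of intermediate overflow even over long accumulations.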
Final note
Paul Beckmann brings decades of hands-on audio DSP experience and pairs architectural explanation with concrete benchmark data. The talk is practical: it doesn’t just describe features, it shows where those features matter in a real automotive audio load. If you work on embedded audio, low‑latency systems, or product architecture, this presentation will sharpen your understanding of trade-offs between DSP, CPU, and hybrid SoC designs—and give you a clearer view of where ML acceleration and APUs may fit in the near future.
There are more and more "multimodal" applications which leverage both audio and video information. Consider a video conferencing application where the device sits in the middle of a table. The microphones would give a rough indication of where the talker is. Then the camera would turn and face the talker and visual cues would be used to fine tune the camera angle. Many applications like this exist even in automotive.
DSP vs. FPGA vs. GPU?
FPGAs and GPUs can be good for specialized computation. Some commercial microphone arrays use over 100 microphones, and you need specialized processing to receive and process all of those signals. The beamforming computation is regular and highly parallel, so it maps well to FPGAs. (Plus the FPGAs help aggregate all of the I/O.)
It is possible to do audio processing on GPUs, but the price point makes them unsuitable for high-volume applications.
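The regularity of that computation is easy to see in a toy delay-and-sum beamformer sketch: each microphone channel is delayed so the desired direction adds coherently, then the channels are averaged. Real arrays use fractional delays and far more channels; the names, sizes, and integer-sample delays here are illustrative assumptions.

```c
#define N_MICS 4

/* Delay-and-sum beamformer for one output sample.
 * mic[m]   : per-microphone sample buffer
 * delay[m] : steering delay in whole samples (delay[m] <= n assumed)
 * n        : current sample index */
float das_beamform(const float *mic[N_MICS],
                   const int delay[N_MICS],
                   int n)
{
    float sum = 0.0f;
    for (int m = 0; m < N_MICS; ++m)
        sum += mic[m][n - delay[m]];  /* align each channel, then add */
    return sum / N_MICS;              /* average across the array */
}
```

Every channel runs the same align-and-add operation independently, which is exactly the kind of structure that parallelizes well onto FPGA fabric.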
DSP vs. FPGA? Digital comm vs. audio?

What about multimedia chips? Will it always be just audio/sound, or is there a case for visual sensations (flashing lights, video), vibrating seats, or SurroundSmell? What is UX in consumer applications?