FPGA Resource Optimizing in DSP Algorithms
Pablo Trujillo - DSP Online Conference 2023 - Duration: 31:50
This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.
What this presentation is about and why it matters
This talk walks through practical techniques to reduce FPGA resource usage when implementing digital signal processing (DSP) algorithms. The speaker uses an example eighth-order FIR filter and a Xilinx 7-series DSP slice (DSP48E1) to demonstrate how choices in algorithm structure and coding map directly to hardware cost: number of multipliers (MAC units), registers, and logic (LUTs). You will see concrete trade-offs such as using coefficient symmetry, registering the MAC, or folding computations into a single multiplier.
Why this matters: in many real systems (industrial controllers, embedded radios, audio processors, sensors) developers must fit signal-processing blocks into small, low-cost FPGAs that have a limited number of hardware multipliers. Making the right structural and coding choices can turn an impractical design into one that meets timing, cost, and power targets without changing the algorithmic behavior.
Who will benefit the most from this presentation
- FPGA engineers and developers implementing DSP (filters, PID controllers, IIR biquads).
- DSP engineers who want to understand hardware constraints and how algorithm structure affects resource use.
- Embedded system designers choosing devices and trade-offs between logic, DSP blocks, and clock frequency.
- Students learning how synthesis and place-and-route map code to real hardware and why coding style matters.
What you need to know
This presentation assumes basic familiarity with discrete-time DSP and some FPGA development concepts. The short list of useful background topics:
- MAC structure: Many DSP operations are repeated multiply-accumulate steps. A single MAC performs multiply then add — it is the inner loop of filters and many controllers.
- FIR filter math: Keep the convolution-sum form in mind: $y[n]=\sum_{k=0}^N b_k\,x[n-k]$. Implemented directly, an order-$N$ filter needs $N+1$ multiplies per output sample, one per coefficient (nine for the eighth-order example).
- FPGA primitives: Modern FPGAs include dedicated DSP slices that combine pre-adders, multipliers, accumulators, and registers. In the Xilinx 7-series, the DSP48E1 contains a 25×18-bit signed multiplier and a 48-bit accumulator, plus configurable input/output registers.
- Bit-width and quantization: Hardware multipliers have fixed operand widths. Reducing operand width (for example, from 32-bit to 16-bit fixed-point) can let you use fewer DSP blocks or fit more parallelism into a device.
- Pipelining vs latency: Adding registers shortens combinational paths and so raises the achievable clock frequency, but it increases latency (pipeline depth). The extra latency is usually acceptable when there are many clock cycles between input samples (e.g., high clock rate, low sample rate).
- Folding / time-multiplexing: You can reuse a single multiplier for multiple coefficient products by sequencing operations across several clock cycles. Folding reduces multiplier count at the cost of control logic and increased computation latency.
- Synthesis/place-and-route behavior: The tools choose how to map RTL to primitives. You influence the result through coding style (e.g., placing registers where the DSP slice already provides them) and by choosing algorithm structures that map well onto the DSP primitive.
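The folding and bit-width points above can be sketched in a few lines of Python. This is only a behavioral model, not the speaker's code: the function names, the coefficient values, and the sample window are hypothetical, and the choice of feeding samples to the 25-bit operand and coefficients to the 18-bit operand is an assumption made for illustration.

```python
def fits_signed(value, bits):
    """Return True if value fits in a two's-complement word of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def folded_fir(coeffs, window, sample_bits=25, coeff_bits=18, acc_bits=48):
    """Model a folded FIR: one shared multiplier, one multiply-accumulate per cycle.

    window[k] holds x[n-k]. The default widths mirror the 25x18 multiplier and
    48-bit accumulator of a DSP48E1; the operand assignment is an assumption.
    """
    acc = 0
    cycles = 0
    for b, x in zip(coeffs, window):
        if not fits_signed(x, sample_bits) or not fits_signed(b, coeff_bits):
            raise OverflowError("operand too wide for a single multiplier")
        acc += b * x          # the single multiplier is reused on every cycle
        if not fits_signed(acc, acc_bits):
            raise OverflowError("accumulator overflow")
        cycles += 1
    return acc, cycles


# Hypothetical eighth-order (nine-tap) filter and input window.
COEFFS = [1, -3, 7, 20, 46, 20, 7, -3, 1]
WINDOW = [5, -2, 9, 4, -7, 3, 8, -1, 6]

y, cycles = folded_fir(COEFFS, WINDOW)
print(y, cycles)
```

In this model the folded filter spends one clock cycle per coefficient, so the clock must run at least nine times faster than the sample rate. A real design would also need a sequencer and would see extra latency from any pipeline registers, neither of which is modeled here.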
Glossary
- FPGA — Field-Programmable Gate Array, a reconfigurable chip consisting of logic blocks, interconnect, RAM, and specialized blocks like DSP slices.
- DSP slice / DSP48E1 — A dedicated MAC block inside many FPGAs; contains pre-adder(s), multiplier, ALU/accumulator, and input/output registers.
- MAC — Multiply–Accumulate operation (multiply then add); the core operation of filters and many DSP algorithms.
- FIR filter — Finite Impulse Response filter implementing $y[n]=\sum_{k=0}^N b_k\,x[n-k]$ via a set of multiplies and adds.
- IIR filter — Infinite Impulse Response filter that uses feedback (poles) and often is implemented as cascaded biquad sections.
- Pipelining — Inserting registers to split long combinational paths, increasing maximum clock rate while adding latency.
- Folding (time-multiplexing) — Reusing a single hardware multiplier across multiple multiplication operations by sequencing them in time under control logic.
- Pre-adder — An adder located before the multiplier in a DSP slice; useful for symmetric coefficient exploitation (add samples then multiply).
- Bit-width / Quantization — Number of bits used to represent signals; reducing width lowers resource use but can affect numerical accuracy.
- Synthesis / Place-and-Route — Toolchain steps that translate RTL into FPGA primitives and then place and interconnect them on the chip.
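The pre-adder entry above mentions adding samples before multiplying when coefficients are symmetric. A minimal Python sketch of that idea follows. It is a behavioral illustration only: the function names and the coefficient and sample values are hypothetical and are not taken from the talk.

```python
def fir_direct(coeffs, window):
    """Direct form: one multiply per coefficient. window[k] holds x[n-k]."""
    products = [b * x for b, x in zip(coeffs, window)]
    return sum(products), len(products)


def fir_symmetric(coeffs, window):
    """Symmetric form: add mirrored samples first, then multiply once per pair."""
    n = len(coeffs)
    if any(coeffs[k] != coeffs[n - 1 - k] for k in range(n // 2)):
        raise ValueError("coefficients are not symmetric")
    acc = 0
    multiplies = 0
    for k in range(n // 2):
        pre_add = window[k] + window[n - 1 - k]  # pre-adder stage
        acc += coeffs[k] * pre_add               # one multiply serves two taps
        multiplies += 1
    if n % 2:                                    # odd tap count: centre tap stands alone
        mid = n // 2
        acc += coeffs[mid] * window[mid]
        multiplies += 1
    return acc, multiplies


# Hypothetical eighth-order (nine-tap) symmetric filter and input window.
COEFFS = [1, -3, 7, 20, 46, 20, 7, -3, 1]
WINDOW = [5, -2, 9, 4, -7, 3, 8, -1, 6]

y_direct, mults_direct = fir_direct(COEFFS, WINDOW)
y_sym, mults_sym = fir_symmetric(COEFFS, WINDOW)
print(y_direct, mults_direct)
print(y_sym, mults_sym)
```

Both forms give the same output, but the symmetric form uses five multiplies instead of nine for this nine-tap example. Note that the pre-add grows the sample by one bit, which has to be budgeted against the multiplier's input width in a real design.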
Final note
Pablo Trujillo’s talk provides a clear, hands-on walk-through of how small design changes (coding style, choice of filter structure, and explicit use of DSP slice features) alter the hardware cost and timing of DSP implementations. The examples (symmetric FIR, registered MAC, pipelined chain inside DSP slices, and folding) are practical patterns you can apply immediately. If you build DSP on FPGAs, this presentation is a compact, practical guide to thinking like the device: match your algorithmic diagram to what the silicon provides, and you’ll save multipliers, meet timing, and reduce board cost. Enjoy the talk; it’s a useful blend of theory-to-hardware lessons and real implementation results.
Pablo,
Very informative with the comparison of FPGA resource usage and DSP algorithm implementation. The flow of your presentation was quite helpful in showing me the implementation options and the results. I'll definitely be applying this information with my DSP algorithm design on FPGAs. I enjoyed watching your presentation and I'm going to be reading your blog for additional information. Thank you.
May I add one piece of constructive criticism: your heavy accent made some of the words difficult to understand. I was able to figure out most of them after a while. If you slowed down on some of those words and/or provided a transcript of your presentation that I could quickly reference, I would have been able to follow you better. Your comments are quite valuable, and I didn't want to miss what you were conveying.
Hi Brad,
many thanks for your comments. Yes, my accent is far from native English. Maybe I should add more words to my slides to make it easier for attendees to follow me. Noted for next time!
Regarding the blog, there you will find everything from the design of the DSP algorithm in Matlab or Python to the implementation in Verilog. Also, you can follow me on X (Twitter), where I share new posts as well as posts that I wrote some time ago.
Thanks for attending!
Dear Pablo, very interesting and comprehensive talk, I enjoyed it very much. I need to check out your website. Thank you for this talk.
Hi Thomas,
thanks for your comment!

Some nice re-compositions and why you might want to do them. Many thanks for an interesting in-depth presentation of some things we kinda knew, but had forgotten. You might want to point to the elements in the presentation that you accent heavily, e.g., point to a register for "resistors". Anyway, most interesting.