Deep Learning Inference in Embedded Systems

Jeff Brower - DSP Online Conference 2021 - Duration: 31:58

The path to developing, testing, and deploying deep learning inference in embedded system products is marked by obstacles you must avoid. To do it successfully -- and to keep your investors and managers from losing faith -- here are some crucial, high-level guidelines.

This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.

What this presentation is about and why it matters

This talk gives a practical, experience-driven roadmap for getting deep-learning inference to run reliably on embedded systems. Jeff Brower distills lessons learned from years of embedded and AI work into a high-level development pattern and a set of realities you must accept if you want a product that meets accuracy and performance targets. For engineers who work in signal processing, communications, audio, or vision on constrained hardware, the talk explains why the usual desktop/cloud workflows are insufficient and why a three-stage approach (design, simulation, target) drastically reduces wasted time and costly mistakes.

Who will benefit the most from this presentation

  • Embedded systems engineers who need to integrate neural-network inference into SoCs, FPGAs, or small ARM platforms.
  • Signal-processing engineers transitioning to machine learning for tasks like speech recognition, speaker ID, or real-time vision.
  • Project leads and managers who must estimate schedules, resource needs, and the iterative nature of training and debugging.
  • Algorithms and ML engineers who want to understand the constraints and practical trade-offs of running models in the field rather than only in the cloud.

What you need to know

The talk assumes familiarity with basic signal-processing concepts and a working sense of neural networks. Below are the specific ideas and terms you should understand before watching, which will help you follow the trade-offs and engineering choices Jeff emphasizes.

  • Training vs. Inference: Training happens on powerful x86+GPU machines and produces weights for a model. Inference is the runtime use of that trained model on your embedded device; it has different constraints.
  • Precision and Quantization: Models trained in 32-bit float often must be converted to lower precision (16-bit, 8-bit, or fixed point) to fit embedded memory and compute limits. Quantization affects accuracy and may require retraining or calibration; see the quantization sketch after this list.
  • SWaP constraints: Size, Weight, and Power limit what processors and accelerators you can use. Those constraints directly affect achievable real-time performance for DSP and ML workloads.
  • Code-base alignment: Keep the same code base across simulation and target as much as possible. Use conditional compilation or runtime flags to avoid introducing platform-specific bugs early on; the skeleton after this list shows one way to structure this.
  • Three-level workflow: Design server (x86 training and initial model tuning), Simulation server (in-house box that mimics the target), Target (embedded device). Iteration among these levels is expected and frequent.
  • Debug strategy: Verify I/O and preprocessing independently of inference first. Use lots of probe points to validate data passed into the model before enabling the inference block, as in the skeleton after this list.
  • Retraining and CI/CD: New data, papers, or edge cases will force retraining. Treat model updates as part of continuous integration and deployment.
  • Open-source ecosystems: Forums and community help assume x86+GPU setups. Frame questions in those terms to get useful answers, then adapt solutions to your embedded target.
  • Vendor tools caution: Avoid relying on closed or vendor-specific translation tools early in development; they can become debugging black boxes. Use vendor compilers and debuggers only at the final target integration step.
  • No heavy DSP-tool reliance: Traditional DSP vendor workflows and tools are often not suitable for modern deep-learning inference; treat ML as a distinct engineering discipline with its own toolchain.
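
To make the quantization point above concrete, here is a minimal sketch of post-training affine quantization from FP32 to 8-bit, the same arithmetic that tools such as TensorFlow Lite or vendor SDKs apply for you. The function names and the simple min/max calibration are illustrative assumptions, not code from the talk.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Affine (asymmetric) quantization sketch: map FP32 values onto uint8 codes
   using a scale and zero-point derived from the observed value range. The
   names and the min/max calibration below are illustrative only. */

typedef struct {
    float   scale;      /* real-valued step between adjacent integer codes */
    int32_t zero_point; /* integer code that represents the real value 0.0 */
} qparams_t;

static qparams_t calibrate(const float *x, size_t n)
{
    float lo = 0.0f, hi = 0.0f;               /* keep 0.0 representable */
    for (size_t i = 0; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    qparams_t q;
    q.scale = (hi - lo) / 255.0f;
    if (q.scale == 0.0f) q.scale = 1.0f;      /* avoid divide-by-zero */
    q.zero_point = (int32_t)lroundf(-lo / q.scale);
    return q;
}

static uint8_t quantize(float x, qparams_t q)
{
    long v = lroundf(x / q.scale) + q.zero_point;
    if (v < 0)   v = 0;                       /* clamp to uint8 range */
    if (v > 255) v = 255;
    return (uint8_t)v;
}

static float dequantize(uint8_t v, qparams_t q)
{
    return ((int32_t)v - q.zero_point) * q.scale;
}
```

Comparing dequantize(quantize(x, q), q) against the original FP32 values over a representative data set is a quick way to estimate how much accuracy the conversion costs before deciding whether calibration or retraining is needed.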

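To make the code-base-alignment and debug-strategy points concrete, below is a minimal skeleton of the kind of harness the talk implies: one source tree, a conditionally compiled capture layer, a probe point that validates the preprocessed buffer, and a flag that keeps the inference block disabled until the probes look right. All macro names, function names, and hooks here are assumptions for illustration, not code from the presentation.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical hooks assumed to exist elsewhere in the project. */
size_t adc_read(float *buf, size_t n);          /* target ADC/codec driver  */
size_t read_test_vector(float *buf, size_t n);  /* x86 simulation stub      */
void   preprocess(float *buf, size_t n);        /* identical on every stage */
void   run_inference(const float *in, size_t n, float *out);

/* One source tree: only the capture layer changes between stages. */
static size_t capture_frame(float *buf, size_t n)
{
#ifdef TARGET_BUILD
    return adc_read(buf, n);          /* built with -DTARGET_BUILD on the device */
#else
    return read_test_vector(buf, n);  /* default build on the simulation server  */
#endif
}

/* Probe point: sanity-check the buffer handed to the model. */
static bool probe_check(const char *tag, const float *x, size_t n)
{
    if (n == 0) return false;
    float lo = x[0], hi = x[0];
    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i]) || isinf(x[i])) {
            fprintf(stderr, "[%s] non-finite sample at index %zu\n", tag, i);
            return false;
        }
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    fprintf(stderr, "[%s] n=%zu min=%g max=%g\n", tag, n, lo, hi);
    return true;
}

/* Keep enable_inference false until I/O and preprocessing have been
   validated on their own, per the debug strategy above. */
void process_frame(float *frame, size_t n, float *out, bool enable_inference)
{
    size_t got = capture_frame(frame, n);
    preprocess(frame, got);
    if (!probe_check("pre-inference", frame, got))
        return;
    if (enable_inference)
        run_inference(frame, got, out);
}
```

The same file builds unchanged on the simulation server and, with -DTARGET_BUILD, on the embedded device, so a bug that shows up only on the target is immediately narrowed down to the capture layer or the hardware rather than the shared preprocessing and inference code.
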
Glossary

  • Inference — Running a trained neural network to produce outputs from new input signals in real time.
  • Training — Process of optimizing a model's weights on labeled data, typically done on GPU-equipped x86 servers.
  • Quantization — Reducing numerical precision of weights/activations to save memory and compute (e.g., FP32 to INT8).
  • SWaP — Size, Weight, and Power constraints that drive embedded hardware choices.
  • SoC — System-on-Chip; an integrated device that may include CPU cores, accelerators, and memory for embedded inference.
  • Simulation server — An in-house x86 machine configured to closely match the target hardware for early integration testing.
  • Design server — The environment (often cloud or x86 workstation) used for model development and training.
  • JTAG — A hardware debugging interface used at the target stage for low-level troubleshooting.
  • CI/CD — Continuous Integration/Continuous Deployment; practice of automatically retraining, testing, and releasing models as data evolves.
  • Code-base parity — Keeping the same source tree across stages to ensure bugs are reproducible and not introduced by platform drift.

Final note

Jeff Brower's talk is a concise, no-nonsense guide that pairs embedded-systems discipline with modern deep-learning realities. It does a good job of reframing common ML optimism into practical engineering steps you can adopt immediately: isolate problems, iterate between three stages, and keep retraining in your project plan. If you need to ship reliable, accurate inference on constrained hardware, this presentation will save you time and help you avoid the common landmines Jeff calls out.
