
Signal Re-construction with AI

Jayakumar Singaram - DSP Online Conference 2025 - Duration: 51:35

Discovery: A New Approach for Denoising and Reconstruction

Key Insight

By combining attention mechanisms, positional encoding, and CNN-based reconstruction, we have developed a new way to denoise and reconstruct time-domain audio signals.

What Makes It Special?

Unlike traditional denoising methods (like spectral subtraction or Wiener filtering), our approach:

  • Preserves temporal and harmonic structures (thanks to attention capturing inter-frame relations).
  • Uses frame-level processing with overlap-add for smooth stitching.
  • Leverages CNNs for local pattern learning and noise suppression.

This hybrid approach combines strengths from deep learning (attention, CNN) with classical signal processing techniques (framing, overlap-add).
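
The classical half of this hybrid (framing and overlap-add) can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the 50% overlap, the periodic Hann synthesis window, the hop size, and the names `frame_signal`, `overlap_add`, `denoise_frame`, and `process` are all assumptions made for the example, and `denoise_frame` is an identity placeholder standing in for the learned per-frame model.

```python
import math

def hann_periodic(n):
    # Periodic Hann window: copies shifted by n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def frame_signal(signal, hop):
    # Frames of length 2*hop with 50% overlap. One hop of zeros is added in
    # front and at least one behind, so every real sample lies in two frames.
    tail = hop + (-len(signal)) % hop
    padded = [0.0] * hop + list(signal) + [0.0] * tail
    n_frames = len(padded) // hop - 1
    return [padded[k * hop : k * hop + 2 * hop] for k in range(n_frames)]

def overlap_add(frames, hop, out_len):
    # Window each processed frame, add it at its original offset, then trim
    # the padding that frame_signal introduced.
    window = hann_periodic(2 * hop)
    out = [0.0] * ((len(frames) + 1) * hop)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += window[i] * value
    return out[hop : hop + out_len]

def denoise_frame(frame):
    # Hypothetical placeholder for the per-frame model (identity here).
    return list(frame)

def process(signal, hop=64):
    frames = frame_signal(signal, hop)
    cleaned = [denoise_frame(f) for f in frames]
    return overlap_add(cleaned, hop, len(signal))

if __name__ == "__main__":
    # A tone near middle C (261.63 Hz) at an assumed 8 kHz sampling rate.
    tone = [math.sin(2 * math.pi * 261.63 * n / 8000.0) for n in range(1000)]
    restored = process(tone)
    print(max(abs(a - b) for a, b in zip(tone, restored)))
```

With the identity placeholder, the output matches the input up to floating-point rounding, which is a useful sanity check before swapping in a real model: any remaining error then comes from the model, not from the stitching.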

Why This is Important

Traditional denoising methods often distort phase or introduce musical noise.

  • Our approach keeps the signal-to-noise ratio (SNR) high while preserving natural tonal quality.
  • This is particularly useful for processing musical notes (such as middle C), speech enhancement, or even bio-signal reconstruction.

This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.

What this presentation is about and why it matters

This talk explains a hybrid approach to denoising and signal reconstruction that combines attention-based learning (transformer-style models), positional encoding, and CNN-based local reconstruction, integrated with classical framing and overlap-add. For engineers who work with audio, multi‑sensor streams, biomedical signals or any time‑domain data, this matters because it promises better preservation of temporal and harmonic structure than many traditional denoising techniques (e.g., spectral subtraction or Wiener filtering), while avoiding common artifacts such as phase distortion or musical noise.

Who will benefit the most from this presentation

  • Signal processing engineers who want to understand how deep learning ideas (attention + CNN) can be married to classical framing and overlap-add for practical reconstruction tasks.
  • Researchers and practitioners working on speech enhancement, music restoration, multi‑sensor fusion, or bio‑signal denoising who need methods that preserve pitch and phase structure.
  • Machine learning engineers who want a DSP‑aware perspective on tokens, positional encoding, and long‑range attention for time signals.
  • Students of DSP who want to see how concepts from information theory and statistical mechanics (entropy, Boltzmann) motivate modern learning-based reconstruction.

What you need to know

The talk mixes physical intuition with practical model design. Before watching, brushing up on these topics will make the material easier to follow:

  • Framing and overlap-add: classic short-time processing where a signal is split into overlapping frames, processed and stitched back by overlap‑add to produce a smooth output.
  • Convolutional Neural Networks (CNNs) for local pattern learning: CNNs are used here as the per-frame reconstruction engine that suppresses noise while preserving local structure.
  • Self-attention / Transformers: attention captures long-range (inter-frame) relations that fixed bases cannot; attention weights act like data‑dependent kernels relating distant frames.
  • Positional encoding: how temporal position is injected into tokenized frames so attention knows relative timing—important when sampling rates or sensor clocks vary.
  • Energy-based models and entropy: the talk ties ideas from Boltzmann machines and Shannon entropy to modern learning, interpreting learning as shaping an energy landscape where low energy = high probability. The relation typically used is $p(x)\propto e^{-E(x)/T}$, where $E(x)$ is the energy of configuration $x$ and $T$ is a temperature parameter; this helps justify probabilistic reconstruction.
  • Inverse problems: reconstruction is an inverse problem—recover hidden (microstate) signals from noisy, incomplete observations (macrostate). Understand basic regularization and the role of learned priors.
  • Metrics: mean squared error (MSE) is used, but perceptual losses and SNR remain important because fixed-basis metrics can saturate and miss perceptual fidelity.
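
To make the attention and positional-encoding items above concrete, here is a minimal sketch of self-attention across frames. It rests on several assumptions not confirmed by the talk: each frame is treated directly as a token, the positional encoding is the standard sinusoidal scheme, and the query, key, and value projections are left as the identity (a trained model would learn them). The function names are illustrative only.

```python
import math

def sinusoidal_encoding(position, dim):
    # Standard sinusoidal encoding: sine on even indices, cosine on odd ones.
    enc = []
    for i in range(dim):
        freq = 1.0 / (10000.0 ** ((2 * (i // 2)) / dim))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def add_positions(frames):
    # Inject the frame index into each token by element-wise addition.
    dim = len(frames[0])
    return [
        [x + p for x, p in zip(frame, sinusoidal_encoding(t, dim))]
        for t, frame in enumerate(frames)
    ]

def softmax(scores):
    # Numerically stable softmax; subtracting the peak avoids overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Scaled dot-product attention with identity Q/K/V (a simplification).
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights

if __name__ == "__main__":
    frames = [[0.1 * (t + d) for d in range(8)] for t in range(4)]
    mixed, weights = self_attention(add_positions(frames))
    print([round(w, 3) for w in weights[0]])
```

Each row of `weights` is a probability distribution over all frames, which is the sense in which attention acts as a data-dependent kernel. The softmax also has the same form as the Boltzmann relation above, with the negative similarity score playing the role of energy and the scale factor playing the role of temperature.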

Glossary

  • Entropy — a measure of uncertainty or multiplicity of hidden configurations that can produce observed data; used here to connect physics and information theory to learning.
  • Energy-based model — a model that assigns an energy E(x) to configurations; inference selects low‑energy (high probability) states.
  • Boltzmann machine / RBM — a historical energy‑based neural model that samples hidden states; used as conceptual background for learned priors.
  • Self-attention (Transformer) — a mechanism that computes weighted interactions between tokens (frames), enabling long-range context modeling in time signals.
  • Positional encoding — a scheme to inject time/location into tokens so attention can reason about order and relative distance.
  • CNN reconstruction — convolutional layers applied per-frame or locally to learn filters that remove noise while retaining structure.
  • Overlap-add — deterministic method to stitch frame-wise outputs into a continuous signal without discontinuities.
  • Tokenization (for signals) — splitting a continuous signal into tokens (learned or fixed) that the attention model consumes.
  • Inverse problem — recovering latent signal states (microstates) from observed noisy measurements (macrostates).
  • SNR (Signal-to-Noise Ratio) — a standard fidelity metric; higher SNR indicates cleaner reconstruction, but perceptual quality also matters.
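
For reference, MSE and SNR can be computed against a clean reference signal as in the sketch below. The function names are illustrative, and the SNR definition used here (reference power divided by residual power, in decibels) is the common textbook form; the talk may measure it differently.

```python
import math

def mse(reference, estimate):
    # Mean squared error between the clean reference and the reconstruction.
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def snr_db(reference, estimate):
    # SNR in dB: reference power over the power of the residual error.
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf   # perfect reconstruction
    if signal_power == 0.0:
        return -math.inf  # silent reference, nonzero error
    return 10.0 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    restored = [1.1, -0.9, 1.1, -0.9]
    print(mse(clean, restored), snr_db(clean, restored))
```

Both numbers depend only on sample-wise error, so two reconstructions with the same SNR can still sound different; this is why perceptual quality is evaluated separately.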

Final notes — why watch

This presentation is compelling because it connects a rich conceptual lineage (Clausius, Boltzmann, Shannon, Hinton) with concrete design decisions for real‑world signal reconstruction. The speaker clearly frames reconstruction as an inference problem and explains why attention + CNN + overlap‑add is a principled hybrid: attention for global relations, positional encoding for temporal context, and CNNs for local denoising. Expect practical insights about token choices, positional encoding under varying sampling rates, and design tradeoffs (frame size, overlap, perceptual losses). If you work on audio, multi‑sensor fusion, or any time‑domain restoration task, you’ll likely come away with ideas you can prototype quickly and questions worth exploring further.
