Bending Sound to Suit the Musician
David McClain - DSP Online Conference 2023 - Duration: 44:28
This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.
What this presentation is about and why it matters
David McClain explains how to use DSP to make music sound "right" for musicians and listeners with sensorineural hearing impairment. Standard hearing aids are optimized for speech and often destroy musical timbre: instruments lose their character because the ear’s spectral and nonlinear behavior is not being compensated correctly. This talk shows practical, physics-informed ways to pre-warp audio — by dividing sound into cochlea-like bands and applying level-dependent (nonlinear) gain — so a hearing-impaired listener hears musical timbres closer to the intended sound instead of flattened or distorted versions.
For engineers this matters because it brings together perceptual models (Bark/critical bands, equal-loudness contours), practical DSP (FFT-based filter banks, overlap-add convolution, multiband compressors), and system constraints (latency, safety limits, audiometry errors) into a working real-time product. It is an applied case study of perceptual signal processing where math, physiology, and engineering choices directly affect musical quality.
Who will benefit the most from this presentation
- DSP engineers and audio plugin developers building assistive audio or music-processing tools.
- Audio researchers interested in perceptual loudness models, multiband compression, and cochlear-inspired processing.
- Acousticians and systems engineers who must balance latency, CPU, and perceptual fidelity for live music.
- Musicians, audio techs, and audiology-interested developers who want practical heuristics for tuning playback for impaired listeners.
What you need to know
The talk assumes familiarity with basic digital audio and a few perceptual and DSP concepts. Key background:
- Critical bands & Bark scale: The cochlea groups frequencies into overlapping bands whose width grows with frequency. Use Bark-based channels (roughly 24 across the audible range) rather than uniform FFT bins for perceptual operations (a Hz-to-Bark sketch follows this list).
- Equal-loudness contours (ISO 226), phons and sones: Phons index the equal-loudness contours; sones are a linearized perceptual loudness unit. At 1 kHz, 40 phons = 1 sone, and each +10 phons roughly doubles loudness in sones. The practical relation used in the talk is $\text{sones} = 2^{(\text{phon} - 40)/10}$.
- Recruitment and nonlinear loudness: Sensorineural loss raises thresholds and compresses the comfortable range above them (recruitment). Everyday hearing follows an approximate cube-root compression: perceived loudness grows roughly as $P^{1/3}$ of input power (see the loudness sketch after this list).
- Multiband nonlinear compression: Compensation must be both frequency selective and level dependent: per-Bark-band compressors raise sounds above the impaired threshold just enough to match the normal-ear percept. A linear EQ or a single wideband compressor is insufficient (a gain-curve sketch follows this list).
- FFT-based processing and overlap-add: Real-time implementation is efficient with block FFT multiply / inverse-FFT and overlap-add. Block size and update rate trade frequency resolution against latency (e.g., 256- or 512-sample blocks with a few milliseconds of lookahead); an overlap-add sketch follows this list.
- Power conservation when mapping Bark filters to FFT bins: When estimating band power from FFT bins, preserve energy: weight by squared magnitude and correct for the equivalent rectangular bandwidth (ERB) to get accurate band power (a band-power sketch follows this list).
- Practical constraints: Limit maximum gain for safety; reduce gain immediately on impulses but raise it only slowly on fades; and allow user tuning, because audiometry measurements are noisy (±5 dB is typical) and over- or under-correction is audible (a gain-smoothing sketch follows this list).
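First, the Bark mapping from the critical-bands point. The talk does not prescribe a particular formula; the sketch below uses Traunmüller's well-known approximation together with the classic Zwicker band edges, so all names here are illustrative choices rather than the talk's:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# The ~24 critical bands span roughly 20 Hz to 15.5 kHz (Zwicker edges):
edges_hz = np.array([20, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                     1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                     4400, 5300, 6400, 7700, 9500, 12000, 15500])

# Band numbers come out roughly 0..24; the approximation drifts a little
# at the extreme low and high ends of the range.
print(np.round(hz_to_bark(edges_hz), 1))
```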
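Next, the phon/sone relation and the cube-root law, which are really one fact in two guises: +10 phons doubles loudness in sones, and at 1 kHz +10 phons is +10 dB, i.e. 10x the power, and $10^{0.301} \approx 2$, so loudness grows roughly like $P^{0.3}$, the cube-root compression. A small sketch (function names are mine, not the talk's):

```python
from math import log2

def phon_to_sone(phon):
    """40 phons = 1 sone; each +10 phons doubles loudness."""
    return 2.0 ** ((phon - 40.0) / 10.0)

def sone_to_phon(sone):
    return 40.0 + 10.0 * log2(sone)

# Consistency with cube-root compression: +10 phons is +10 dB at 1 kHz,
# i.e. 10x the power, and 10**0.301 ~= 2, so sones ~ (P / P_40)**0.301.
for phon in (40, 50, 60, 70):
    print(phon, "phon ->", phon_to_sone(phon), "sone")   # 1, 2, 4, 8
```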
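The per-band gain curve behind the multiband-compression point is sketched below. This is not McClain's actual rule, only the generic shape such a curve takes: quiet sounds get (up to) the full threshold correction, and the gain tapers toward 0 dB at high levels, where recruitment means the impaired ear's loudness has caught up with the normal ear. All names and numbers are illustrative:

```python
def band_gain_db(level_db, threshold_elevation_db, quiet_db=30.0, loud_db=100.0):
    """Illustrative level-dependent gain for one Bark band.

    Quiet sounds get the full threshold correction; the gain tapers
    linearly (in dB) to zero by loud_db, mimicking recruitment: loud
    sounds are perceived near-normally and must be left alone.
    """
    if level_db <= quiet_db:
        return threshold_elevation_db        # full correction
    if level_db >= loud_db:
        return 0.0                           # loud sounds untouched
    frac = (loud_db - level_db) / (loud_db - quiet_db)
    return threshold_elevation_db * frac
```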
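The overlap-add point is standard enough to show concretely. Below is a minimal NumPy sketch of FFT block convolution; a real-time version would stream blocks and carry an overlap tail between calls, which is omitted here:

```python
import numpy as np

def overlap_add_fir(x, h, block=512):
    """Block convolution of signal x with FIR h via FFT overlap-add."""
    # FFT size: next power of two holding a full linear-convolution segment
    n_fft = 1 << int(np.ceil(np.log2(block + len(h) - 1)))
    H = np.fft.rfft(h, n_fft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
        yseg = yseg[:len(seg) + len(h) - 1]     # valid part of this block
        y[start:start + len(yseg)] += yseg      # overlap-add the tails
    return y
```

Correctness is easy to check against `np.convolve(x, h)`.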
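For the power-conservation point, here is a hedged sketch of band-power estimation that sums squared bin magnitudes inside hard band edges, so that by Parseval's theorem the band powers add up to the frame's total power. The ERB correction the talk mentions and window normalization are omitted, and all names are illustrative:

```python
import numpy as np

def band_powers(frame, fs, edges_hz):
    """Per-band power from one frame (even frame length assumed).

    Sums |X[k]|^2 over the rfft bins inside each band; by Parseval's
    theorem the band powers then sum to the frame's mean power.
    """
    n = len(frame)
    X = np.fft.rfft(frame)
    p = (np.abs(X) ** 2) / n**2    # per-bin power
    p[1:-1] *= 2.0                 # fold in negative freqs (not DC/Nyquist)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return np.array([p[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges_hz[:-1], edges_hz[1:])])
```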
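Finally, the gain dynamics from the practical-constraints point: a tiny sketch of asymmetric smoothing in which gain cuts are instantaneous (safety on impulses) while gain increases are rate-limited (slow, safe rise on fades). Parameter values are illustrative, not the talk's:

```python
def smooth_gain(target_db, current_db, dt, release_db_per_s=20.0):
    """Asymmetric per-band gain smoothing: cut instantly, raise slowly.

    Gain reductions (e.g., on an impulse) are applied immediately for
    safety; gain increases (e.g., on a fade) are rate-limited so quiet
    passages come up audibly but without pumping or overshoot.
    """
    if target_db < current_db:
        return target_db                    # immediate reduction
    step = release_db_per_s * dt            # rate-limited increase
    return min(target_db, current_db + step)
```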
Glossary
- Bark scale — A psychoacoustic frequency scale that maps roughly to cochlear place; used to define critical bands.
- Critical band — Frequency region where masking occurs; the unit for perceptual loudness computations.
- ISO 226 (equal-loudness) — Standard loudness contours (phons) across frequency and SPL.
- Phon — Loudness level relative to 1 kHz equal-loudness contour; used with sones for perceptual scaling.
- Sone — Perceptual loudness unit where doubling sones ≈ +10 phons.
- Recruitment — Rapid perceived loudness growth above elevated threshold in sensorineural loss.
- ERB (Equivalent Rectangular Bandwidth) — A way to describe auditory filter bandwidth for power calculations.
- Overlap-add FFT — An efficient block convolution method for real-time FIR filtering implemented via FFTs.
- Cube-root compression — Empirical perceptual compression where perceived loudness grows like the cube root of physical power in everyday ranges.
- Gain dynamics — Attack/release rules applied to per-band gain; here immediate reduction for safety and controlled release/increase for audibility.
Why you should watch
McClain’s talk is a rare mix of perceptual insight, pragmatic DSP, and production-ready choices. He connects physiological models (basilar place, recruitment) to concrete algorithmic choices (Bark-channel power estimation, FFT overlap-add, latency targets, and safety limits). If you work on any audio system that interacts with listeners rather than just meters — music players, plugins, assistive audio — you’ll find directly applicable ideas and sensible trade-offs. The demo and engineering anecdotes make the material practical and credible: this is engineering informed by listening and by lived experience.
Watch to learn a clear, implementable approach for preserving musical timbre for impaired listeners and to see how perceptual models and DSP engineering fit together in a real product.
Hi John,
Thanks for chiming in. Crescendo itself is not written in Lisp. I used Lisp to help write the high-performance C/C++ code that is Crescendo. Just the same, in the early days only a dedicated DSP chip had the horsepower to run the Crescendo engine. Today, measured on my M1 iMac, it takes less than 2.5% of CPU capacity.
The main quest for Crescendo has been accurate restoration of the harmonic content of musical sound, so that oboes continue to sound more like they should, instead of being transformed into Miles Davis muted jazz trumpets. And to that end, Crescendo performs astonishingly well.
But Crescendo does not have many of the features of hearing aids - those solve a different problem: helping to comprehend human speech. And so hearing aids take great liberties, often producing intentional harmonic distortion as well as eroding droning background sounds. Both of those are harmful to music, but often help in discerning the spoken word. A case in point is the limited-bandwidth, high-pass nature of telephone conversations. Without the bass it is often easier to comprehend speech. But that would be disastrous for music.
Just the same, I use Crescendo for myself 24/7, and with it I can hear my own sibilant, fricative, and glottal sounds that I cannot hear without aid. And I am using it right now during the conference to help me hear everyone. But Crescendo's main aim is accurate musical restoration, not speech.
Cheers,
- DM
I should add that today's hearing-assistance technology is mostly focused on fixed-width frequency bins - WOLA filter banks, which are close cousins of FFTs. Those fixed-width bandpass filters are not well matched to our hearing. So when you see them brag of 7 bands or 13 bands, those are 7 or 13 FFT bins, poorly matched to our hearing, and typically covering only 500 Hz to 5 kHz.

Thank you for your presentation, most interesting. I do lots of work for the Govt too, and would have enjoyed working on muscle cars. At the turn-on sessions of cochlear implants, recipients often get to hear high frequencies like never before. However, they complain of robotic-sounding speech, like that of a ring modulator. The audiologist assures them that their brain will get used to the stimulation, recognizing voices and sounds, and that their life will improve. Their world becomes much more noisy, confusing, and exhausting, but at least they are not missing out. They often remove the external "audio processor" for a break and relax using a prior, more limited hearing aid. In auditoriums, the mush can be even worse. I wonder if tools like Crescendo might ease that effort, packaged on the necessarily limited processors, and even if written in LISP?