GPU-Powered Real-Time Audio Source Separation
Alexander Talashov - DSP Online Conference 2025

Audio source separation has long been an active area of research because of its broad range of real-world applications. It allows music producers and sound engineers to repurpose archival recordings, even those captured under suboptimal conditions. In the film industry, it enables the restoration of classic movies, particularly those with mono audio, and their enhancement with modern formats such as spatial audio. It also plays a crucial role in assistive technologies, helping people with hearing impairments communicate more effectively in noisy environments.
Leveraging GPUs to accelerate source separation addresses two key objectives: (1) increasing throughput to process multiple audio files in parallel, which streamlines workflows for music and film professionals; and (2) reducing latency, which is vital for real-time communication systems. The latter is especially relevant in high-demand sectors such as industrial operations and transportation, where speed and precision are essential.
In this workshop, we will provide an overview of the GPU AUDIO Platform: its architecture, core features, and how it enables the development of accelerated audio modules. We will then explore how to integrate state-of-the-art, open-source, neural-network-based source separation models with the GPU AUDIO Platform. Attendees will learn how to build a custom software stack, integrate with virtually any GPU-powered environment, and achieve millisecond-level latencies while running audio compute alongside other GPU workloads.
Participants will receive access to the workshop codebase and development environment, enabling them to replicate the setup on their own machines and continue experimenting with real-time, low-latency audio processing. By the end of the session, attendees will have a solid foundation in GPU-accelerated digital signal processing for modern audio applications.