Live Q&A - Signal Processing Formulations of Sequence Models
Julius O. Smith III - Recording Soon Available - DSP Online Conference 2024
Hi Robert,
Thanks for your kind words!
Transcribing Clapton's guitar playing and such fits within the classic problem of "automatic transcription", and specifically "polyphonic F0 estimation". Nowadays, neural methods are probably best to pursue first, followed by comparison to, and conversion to, more classical methods. In this case, I'd say the "ideal answer" is a maximum-likelihood estimate of the playing parameters for each guitar string. Ideally you take advantage of all constraints, such as "there is only one left hand", and estimate left-hand position along with everything else.

For neural starters, I would probably build on the Audio Spectrogram Transformer (AST), which is based on the Vision Transformer (ViT). Its output could be, e.g., the stopped fret for each string, and whether or not that string is sounding. For a 25-fret neck, adding open and muted states, that gives 27 * 6 = 162 output logits (a rough sketch follows below). For solos you also want to follow bending, of course, and that could be modeled as a second "original fret" output, or whatever.

All that said, what I try to do as a guitar player is find a YouTube video showing his hands, and reverse engineer from there by ear.
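If it helps, here is a very rough sketch of what that 162-logit output head might look like (the module name, embedding size, pooling, and toy input below are placeholder assumptions on my part, not a tested recipe):

import torch
import torch.nn as nn

N_STRINGS = 6
N_STATES = 27          # frets 1-25 plus "open" and "muted"
EMBED_DIM = 768        # typical ViT/AST embedding size (assumed here)

class FretStateHead(nn.Module):
    """Maps a pooled AST-style embedding to 6 x 27 = 162 per-string logits."""
    def __init__(self, embed_dim=EMBED_DIM):
        super().__init__()
        self.proj = nn.Linear(embed_dim, N_STRINGS * N_STATES)

    def forward(self, pooled_embedding):
        logits = self.proj(pooled_embedding)          # (batch, 162)
        return logits.view(-1, N_STRINGS, N_STATES)   # (batch, 6, 27)

# Toy usage with a random stand-in for the encoder's pooled output:
head = FretStateHead()
fake_embedding = torch.randn(4, EMBED_DIM)            # batch of 4 clips
per_string_logits = head(fake_embedding)
print(per_string_logits.argmax(dim=-1).shape)         # torch.Size([4, 6])

Training would presumably use a per-string cross-entropy loss, and bending could get a second "original fret" head along the same lines.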
Hello again, and thank you for the reply and information! I'll be looking further into your recommendations, which confirm my assumption that the problem has received a good deal of interest and research from fellow audio lovers and musicians of the world. A brief look into the 'polyphonic F0 estimation' approach also confirms my conclusion that it's not an easy problem to solve (if it can be solved at all). From Wikipedia: "There have been many attempts at multiple (aka polyphonic) F0 estimation and melody extraction, a related area" and "Since F0 tracking of all sources in a complex audio mixture can be very hard, we are restricting the problem to 3 cases...". Maybe it will be solved by future DSP and music heads :) Being able to input any music composition, be it Cream, or Bach on classical guitar (a much easier prospect), and get an immediate transcription, is just too enticing.
I've used YouTube live performances too :), not to mention some of the detailed instructional videos, which pretty much show what to play and where! Versus the old days, when the main option was infinite repeat of a record or cassette (record playback at 16 speed was useful for some of the more tricky solos :) )
Best Regards, and thanks again,
Robert
I see there is an October 2023 review article on music transcription:
https://www.mdpi.com/2076-3417/13/21/11882
I would also look up all citations to that paper (and to other good papers you find) on Google Scholar to get fully up to date.
Cheers,
Julius
An excellent paper, thanks! Right at the heart of the matter. Section 4.3 was particularly relevant to my findings: "Musical sounds often consist of a fundamental frequency and its harmonics. The presence of strong harmonics can lead to ambiguity in pitch estimation, as the algorithm may detect multiple potential fundamental frequencies that align with different harmonics."
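To make that concrete for myself, here is a tiny back-of-the-envelope check (the note choices and the 1% tolerance are just my illustration, using standard equal-tempered frequencies):

# How many harmonics of the open low E (E2) line up with other candidate fundamentals?
E2 = 82.41                                   # Hz
harmonics = [E2 * k for k in range(1, 9)]    # first 8 harmonics

candidates = {"E2": 82.41, "E3": 164.81, "B3": 246.94, "E4": 329.63}

for name, f0 in candidates.items():
    ratios = [h / f0 for h in harmonics]
    # A harmonic "fits" the candidate if it sits within ~1% of an integer multiple of f0.
    fits = [r for r in ratios if round(r) >= 1 and abs(r - round(r)) < 0.01]
    print(f"{name} ({f0:.2f} Hz): {len(fits)} of {len(harmonics)} E2 harmonics fit")

So even a single clean low E already offers a peak-picker several plausible fundamentals (E3, B3, E4 all account for part of the harmonic series), and a full chord only compounds that.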
I've been checking out some of the transcription tools cited, including AnthemScore, which was relatively easy to use and interpret. And even though it didn't exactly replicate polyphonic scores, it did get pretty close with a relatively clean Bach Fugue on classical guitar.
Fascinating!
Best Regards,
Robert
Hi Dr. Smith,
During the Q&A discussion you had mentioned something about an interesting YouTube video titled "Make More" (or something like that). I made a note to ask you about it, as it sounded interesting. Could you post a link here?
Sure: Andrej Karpathy's "micrograd" and "makemore" tutorials on YouTube:
https://www.youtube.com/@AndrejKarpathy
I especially recommend the series "Neural Networks: Zero to Hero"
Super, that was it. Thank you!
Hello Dr Smith, Thanks for the discussion yesterday. I meant to ask a question during it, but the opportunity wasn't right, so I'll ask here.

My entire career in DSP and embedded software has been driven by a love of music, and a desire to use technology to understand it better. Coming out of grad school, I had an idea of being able to decompose music into the associated chord progressions and notation, using DSP methods, primarily the FFT and frequency analysis. I'm a lifelong guitar player (electric mainly, Gibson SG), and can usually figure out note-by-note solos, but had trouble determining the chord progression of a song, and even more coveted, the position on the fretboard where each chord was struck, using which strings. So as an example, using Cream's White Room that you had mentioned in the presentation yesterday, I could figure out the awesome wah solo, but I really wanted to know how Eric Clapton was fingering the rhythm section, since by ear, it was hard to know.

I spent a long time personally pursuing that, as a DSP audio hobbyist (and learning a good deal about embedded DSP in the process). But it ended up as a glorified spectrum/audio analyzer, which took in the real-time audio, and could zoom-filter and display various frequency bands, etc. It occasionally determined some rudimentary chords, but only under the most ideal conditions, so it didn't come close to my original idea. One primary issue I encountered was that the interplay of the fundamentals and harmonics of a chord made it almost impossible to know which peaks were the notes and which were their harmonics. In my pursuit of this technology, I often ended up on your webpages, thanks.
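To give a flavor of the kind of frequency-domain analysis I was attempting, here is a minimal numpy sketch (the synthesized chord, window, and frequency range are only illustrative, not my actual analyzer): fold the magnitude spectrum into 12 pitch classes and look at the strongest ones.

import numpy as np

fs = 44100                                   # sample rate, Hz
t = np.arange(int(0.5 * fs)) / fs            # half a second of signal

# Crude stand-in for a strummed A minor triad (A2, C3, E3), three harmonics per note.
note_freqs = [110.00, 130.81, 164.81]
x = sum(np.sin(2 * np.pi * f * k * t) / k
        for f in note_freqs for k in (1, 2, 3))

# Magnitude spectrum of the windowed signal.
X = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), 1 / fs)

# Fold spectral energy into 12 pitch classes (a simple pitch-class profile).
names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
profile = np.zeros(12)
band = (freqs > 60) & (freqs < 2000)
pitch_class = (np.round(12 * np.log2(freqs[band] / 440.0)).astype(int) + 9) % 12
for pc, mag in zip(pitch_class, X[band]):
    profile[pc] += mag

# The triad's pitch classes dominate, but the notes' harmonics also deposit energy
# on G and B, which is exactly the fundamental-vs-harmonic confusion described above.
for i in np.argsort(profile)[::-1][:5]:
    print(f"{names[i]}: {profile[i]:.1f}")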
My question is whether you are aware that this problem has been solved, or whether it is a search for the Holy Grail, inherently unsolvable due to the physics and data characteristics involved. I'm aware that having a separate sampling channel/amplifier on each string would make it easier, but I was more interested in taking any composition, recorded without such special setups.
Thanks again for any review of my question, and for your contributions to an area of lifelong interest to me (DSP audio!)
Best Regards,
Robert Wolfe