Just want to call out the resources listed at the bottom of the Resonate website:
- The Oscillators app demonstrates real-time linear, log and Mel scale spectrograms, as well as derived audio features such as chromagrams and MFCCs https://alexandrefrancois.org/Oscillators/
- The Resonate Youtube playlist features video captures of real-time demonstrations. https://www.youtube.com/playlist?list=PLVcB_ABiKC_cbemxXUUJX...
- The open source Oscillators Swift package contains reference implementations in Swift and C++. https://github.com/alexandrefrancois/Oscillators
- The open source python module noFFT provides python and C++ implementations of Resonate functions and Jupyter notebooks illustrating their use in offline settings. https://github.com/alexandrefrancois/noFFT
You can view this result as the convolution of the signal with an exponentially decaying sine and cosine.
That is, `y(t') = integral e^{k t} x(t' - t) dt`, where k is complex with negative real part.
If you discretize that using simple integration, with t' = i dt and t = j dt, you get

`y_{i+1} = e^{k dt} y_i + dt x_i`
If we then scale this by some constant, z_i = (A / dt) y_i, we can write this as

`z_{i+1} = e^{k dt} z_i + A x_i`

Here `e^{k dt}` plays a similar role to (1 - alpha) and A is similar to P alpha - the difference being that P changes over time, while A is constant.

We can write `z_i = e^{w dt i} r_i`, where w is the imaginary part of k, which gives

`r_{i+1} = e^{(k - w) dt} r_i + p_i x_i`

where `p_i = e^{-w dt (i+1)} A = e^{-w dt} p_{i-1}`

Since k - w is just the (negative) real part of k, the state r_i undergoes a plain exponential decay while the input is multiplied by a phasor that rotates by e^{-w dt} each sample - which is exactly the result from the Resonate web page.

The neat thing about recognising this as a convolution integral is that we can use window shapes other than exponential decay - we can implement a box filter using only two states, or a triangular filter (this is a bit trickier and takes more states). While they're tricky to derive, they tend to run really quickly.
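To make the recursion concrete, here is a minimal per-sample sketch in Python of a single resonator in the r_i / p_i form above (real exponential decay on the state, input multiplied by a rotating phasor). The function name and the decay/gain values are illustrative only - this is not the Resonate or noFFT reference implementation.

```python
import numpy as np

def resonator_magnitudes(x, freq, sr, decay=50.0, gain=0.01):
    """Single complex resonator in the r/p form above: real exponential decay on
    the state, input multiplied by a rotating phasor. Parameter values are
    illustrative only -- this is not the Resonate/noFFT reference code."""
    dt = 1.0 / sr
    fade = np.exp(-decay * dt)               # e^{(k - w) dt}: real decay, decay = -Re(k)
    rot = np.exp(-2j * np.pi * freq * dt)    # e^{-w dt}: per-sample phasor rotation
    p = gain + 0j                            # p_i = e^{-w dt (i+1)} A, rotated in place
    r = 0j
    out = np.empty(len(x))
    for i, xi in enumerate(x):
        p *= rot                             # p_i = e^{-w dt} p_{i-1}
        r = fade * r + p * xi                # r_{i+1} = e^{(k - w) dt} r_i + p_i x_i
        out[i] = abs(r)                      # |z_i| = |r_i| since |e^{w dt i}| = 1
    return out

# A resonator tuned to a 440 Hz tone settles to a steady magnitude;
# one tuned off-frequency stays comparatively small.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
print(resonator_magnitudes(tone, 440.0, sr)[-1])
print(resonator_magnitudes(tone, 1000.0, sr)[-1])
```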
This formulation is close to that of the Sliding Windowed Infinite Fourier Transform (SWIFT), of which I became aware only yesterday.
For me the main motivation for developing Resonate was interactive systems: very simple, no buffering, no windowing... Also, no need to compute all the FFT bins, so in that sense more efficient!
I might be mistaken, but I don't see how this is novel. As far as I know, this has been a proven DSP technique for ages, although it is usually only applied when a small number of distinct frequencies needs to be detected - for example DTMF.
When the number of frequencies/bins grows, it is computationally much cheaper to use the well known FFT algorithm instead, at the price of needing to handle input data by blocks instead of "streaming".
The difference from the FFT is that this is a multiresolution technique, like the constant-Q transform. And, unlike the CQT (which is noncausal), it provides a better match to the actual behavior of our ears (by being causal). It's also "fast" in the sense of the FFT (which the CQT is not).
There exist the multiresolution FFT and other forms of the FFT based around sliding-window/SFFT techniques. The CQT can also be implemented extremely quickly, using FFTs and kernels or other methods, as in the librosa library (dubbed pseudo-CQT).
I'm also not sure how this is causal? It has a weighted time window (biasing the more recent sound), which is fairly novel, but I wouldn't call that causal.
This is not to say I don't think this is cool - it certainly looks better than existing techniques like synchrosqueezing for pushing the limit of the Heisenberg uncertainty principle (technically, given ideal conditions, synchrosqueezing can outperform the principle, but only for a specific subset of signals).
For some reason the value of Pi given in the C++ code is wrong!
It's given in the source as 3.14159274101257324219 when the right value to the same number of digits is 3.14159265358979323846. Very weird. I noticed when I went to look at the C++ to see how this algorithm was actually implemented.
https://github.com/alexandrefrancois/noFFT/blob/main/src/Res... line 31.
Seems since it's a float it's only 32 bits, and the representation of both 3.14159274101257324219 and 3.14159265358979323846 is the same in IEEE-754: 0x40490fdb
though I agree that it is odd to see, and not sure I see a reason why they wouldn't use 3.14159265358979323846
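A quick way to verify the bit-pattern claim is to round both decimal literals to 32-bit floats and compare. A small Python check (illustrative only):

```python
import struct

for s in ("3.14159274101257324219", "3.14159265358979323846"):
    # Round the decimal literal to an IEEE-754 single and read back the bits
    bits, = struct.unpack("<I", struct.pack("<f", float(s)))
    print(s, "->", hex(bits))  # both round to 0x40490fdb
```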
Yeah, it’s as if they wrote a program to calculate pi in a float and saved the output. Very strange choice given how many places the value of pi can be found.
Indeed... I honestly don't remember where or how I sourced the value, and why I did not use the "correct" one - I will correct in the next release of the package. Thanks for pointing it out!
You got off easy compared to this dude https://en.wikipedia.org/wiki/William_Shanks
At least I might have introduced a bit of creative noise in some AI coding models :-P
That is a very 'childhood exposure to 8 digit calculators' thing to notice.
Childhood exposure to pi generation algorithms; the correct version above was from memory.
Close enough! The wrong 7 jumped out at me instantly although I didn't remember more than a few after.
Nice! I've used a homegrown CQT-based visualizer for a while for audio analysis. It's far superior to the STFT-based view you get from e.g. Audacity, since it is multiresolution, which is a better match to how we actually experience sound. I have for a while wanted to switch my tool to a gammatone-filter-based method [1] but I didn't know how to make it efficient.
Actually I wonder if this technique can be adapted to use gammatone filters specifically, rather than simple bandpass filters.
[1] https://en.wikipedia.org/wiki/Gammatone_filter
If you already have the implementation for the CQT, wouldn't you just be able to replace the Morlet wavelet used in the CQT with the gammatone wavelet without much of an efficiency hit? I'm just learning about the gammatone filter, and it sounds interesting since it apparently better models human hearing.
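One way the adaptation could work: cascading n identical complex one-pole resonators of the kind discussed here gives an impulse response whose envelope is roughly t^(n-1) e^{-at}, i.e. a gammatone shape (this is the idea behind all-pole gammatone approximations). A hypothetical Python sketch, not part of the Resonate/noFFT packages; the bandwidth and gain choices are made up:

```python
import numpy as np

def gammatone_like(x, freq, sr, bw=60.0, order=4):
    """Approximate a gammatone band filter by cascading `order` identical
    complex one-pole resonators (impulse-response envelope ~ t^(order-1) e^{-2 pi bw t}).
    Hypothetical sketch -- not the Resonate/noFFT reference code."""
    dt = 1.0 / sr
    pole = np.exp((-2 * np.pi * bw + 2j * np.pi * freq) * dt)  # shared complex pole
    states = np.zeros(order, dtype=complex)
    out = np.empty(len(x))
    for i, xi in enumerate(x):
        drive = xi
        for s in range(order):
            states[s] = pole * states[s] + (1 - abs(pole)) * drive  # one-pole stage
            drive = states[s]                                       # feed the next stage
        out[i] = abs(drive)   # envelope of the band-limited analytic signal
    return out

# Example: envelope of a 440 Hz gammatone-like band driven by a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
print(gammatone_like(np.sin(2 * np.pi * 440 * t), 440.0, sr)[-1])
```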
Thanks for your contribution! Reminds me of Helmholtz resonators.
I wrote this cross-disciplinary paper about resonance a few years ago. You may find it useful or at least interesting.
https://www.frontiersin.org/journals/neurorobotics/articles/...
Interesting - thanks for sharing!
This is very much like doing a Fourier Transform without using recursion and the butterflies to reduce the computation. It would be even closer to that if a "moving average" of the right length was used instead of an IIR low-pass filter. This is something I've considered superior for decades but it does take a lot more computation. I guess we're there now ;-)
It only requires more computation if you really need to compute the full FFT with all the bins, in which case the FFT is more efficient... With this approach you only compute the bins you really need, without having to pre-filter your signal, or performing additional computations on the FFT result. Some sliding window FFT methods compute frequency bands independently, but they do require buffering and I really wanted to avoid that.
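To illustrate that trade-off: a single-bin moving-average ("boxcar") analysis can also be updated in O(1) per sample, but it has to remember the last N samples so it can subtract the one leaving the window - that is the buffering mentioned above. A hypothetical sketch (the window length is made up):

```python
import numpy as np
from collections import deque

def boxcar_bin(x, freq, sr, window=0.05):
    """Single-bin sliding-window (boxcar) analysis: demodulate by e^{-j w t} and
    keep a running sum over the last n samples. O(1) update per sample, but it
    needs an n-sample buffer, unlike the exponentially decaying resonator."""
    n = max(1, int(window * sr))
    w = 2 * np.pi * freq
    buf = deque([0j] * n, maxlen=n)           # last n demodulated samples
    s = 0j
    out = np.empty(len(x))
    for i, xi in enumerate(x):
        term = xi * np.exp(-1j * w * i / sr)  # demodulated sample
        s += term - buf[0]                    # add the new term, drop the oldest
        buf.append(term)
        out[i] = abs(s) / n                   # boxcar-windowed bin magnitude
    return out

# Example: 440 Hz bin magnitude for a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
print(boxcar_bin(np.sin(2 * np.pi * 440 * t), 440.0, sr)[-1])
```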
Curious if there is available math to show the gain scale properties of this technique across the spectrum -- in other words, its frequency response. The system doesn't appear to be LTI, so I don't believe we can utilize the Z-transform to do this. Phase response would be important as well.
The Sliding Windowed Infinite Fourier Transform (SWIFT) has very similar math, and they provide some analysis in the paper. I use a different heuristic for alpha so I am not sure the analysis transfers directly. In my upcoming paper I have some numerical experiments and graphs that show resonator response across the range.
Actually, digging into SWIFT a bit more, the formulas differ by more than just the heuristic for alpha (unless I missed something), so the analysis in the SWIFT paper does not apply directly (or maybe not at all).
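Absent a closed form, one crude numerical way to estimate a resonator's magnitude response is to drive it with unit sinusoids across a range of frequencies and record the settled output magnitude. A hypothetical sketch assuming the EMA-of-demodulated-signal style update discussed above (the alpha value and test frequencies are made up):

```python
import numpy as np

def settled_magnitude(f_in, f_res, sr=16000, alpha=0.002, seconds=1.0):
    """Drive one resonator (EMA of the phasor-demodulated input) with a unit
    sinusoid at f_in and return the settled magnitude. Illustrative only."""
    n = int(seconds * sr)
    t = np.arange(n) / sr
    x = np.sin(2 * np.pi * f_in * t)
    z = 0j
    for xi, ti in zip(x, t):
        # z <- (1 - alpha) z + alpha * x * e^{-j 2 pi f_res t}
        z = (1 - alpha) * z + alpha * xi * np.exp(-2j * np.pi * f_res * ti)
    return abs(z)

# Numerically sweep the response of a 440 Hz resonator
for f in (220, 430, 440, 450, 880):
    print(f, settled_magnitude(f, 440.0))
```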
Nice! Can any signals/AI folks comment on whether using this would improve vocoder outputs? The visuals look much higher res, which makes me think a vocoder using them would have more nuance. But, I'm a hobbyist.
Can this process estimate the phase of the input signal in a given frequency bucket similar to the DFT?
Yes - the sample app has a demo of single-resonator (so frequency-bin equivalent) frequency estimation/tracking based on phase shift, and also Doppler velocity computation (the code for these is in the Swift package, equations in the upcoming paper...). This video is from an older version of the demo app (less efficient implementation but same principle): https://www.youtube.com/watch?v=iQCPDJ8L_ao
Cool, thanks! I'm currently building a eurorack module where I need to estimate the frequency and phase of a sequence of input gate signals, and an issue I've run into is the delay inherent in the STFT algorithm. This seems like it might work better!
Awesome - the code (Swift or C++) in the Oscillators package is probably the best place to look for implementation details: https://github.com/alexandrefrancois/Oscillators
Ping me if you have any questions.