Constant-Q transform

In mathematics and signal processing, the constant-Q transform transforms a data series to the frequency domain. It is related to the Fourier transform[1] and very closely related to the complex Morlet wavelet transform.[2]

The transform can be thought of as a series of logarithmically spaced filters, with the k-th filter having a spectral width some multiple of the previous filter's width:

where δfk is the bandwidth of the k-th filter, fmin is the central frequency of the lowest filter, and n is the number of filters per octave.

Calculation

The short-time Fourier transform of x[n] for a frame shifted to sample m is calculated as follows:

Given a data series sampled at fs = 1/T, T being the sampling period of our data, for each frequency bin we can define the following:

This is shown below to be the integer number of cycles processed at a center frequency fk. As such, this somewhat defines the time complexity of the transform.
Since S/fk is the number of samples processed per cycle at frequency fk, Q is the number of integer cycles processed at this central frequency.

The equivalent transform kernel can be found by using the following substitutions:

After these modifications, we are left with

Fast calculation using FFT

The direct calculation of the constant-Q transform is slow when compared against the fast Fourier transform (FFT). However, the FFT can itself be employed, in conjunction with the use of a kernel, to perform the equivalent calculation but much faster.[3]

Comparison with the Fourier transform

In general, the transform is well suited to musical data, and this can be seen in some of its advantages compared to the fast Fourier transform. As the output of the transform is effectively amplitude/phase against log frequency, fewer frequency bins are required to cover a given range effectively, and this proves useful where frequencies span several octaves. As the range of human hearing covers approximately ten octaves from 20 Hz to around 20 kHz, this reduction in output data is significant.

The transform exhibits a reduction in frequency resolution with higher frequency bins, which is desirable for auditory applications. The transform mirrors the human auditory system, whereby at lower-frequencies spectral resolution is better, whereas temporal resolution improves at higher frequencies. At the bottom of the piano scale (about 30 Hz), a difference of 1 semitone is a difference of approximately 1.5 Hz, whereas at the top of the musical scale (about 5 kHz), a difference of 1 semitone is a difference of approximately 200 Hz.[4] So for musical data the exponential frequency resolution of constant-Q transform is ideal.

In addition, the harmonics of musical notes form a pattern characteristic of the timbre of the instrument in this transform. Assuming the same relative strengths of each harmonic, as the fundamental frequency changes, the relative position of these harmonics remains constant. This can make identification of instruments much easier.

Relative to the Fourier transform, implementation of this transform is more tricky. This is due to the varying number of samples used in the calculation of each frequency bin, which also affects the length of any windowing function implemented.

Also note that because the frequency scale is logarithmic, there is no true zero-frequency / DC term present, perhaps limiting possible utility of the transform.

References

  1. Judith C. Brown, Calculation of a constant Q spectral transform, J. Acoust. Soc. Am., 89(1):425–434, 1991.
  2. Continuous Wavelet Transform "When the mother wavelet can be interpreted as a windowed sinusoid (such as the Morlet wavelet), the wavelet transform can be interpreted as a constant-Q Fourier transform. Before the theory of wavelets, constant-Q Fourier transforms (such as obtained from a classic third-octave filter bank) were not easy to invert, because the basis signals were not orthogonal."
  3. Judith C. Brown and Miller S. Puckette, An efficient algorithm for the calculation of a constant Q transform, J. Acoust. Soc. Am., 92(5):2698–2701, 1992.
  4. http://newt.phys.unsw.edu.au/jw/graphics/notes.GIF
This article is issued from Wikipedia - version of the 9/14/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.