Haskell FFT 4: A Simple Application

November 21, 2013

[The code from this article is available as a Gist.]

Before we go on to look at how to deal with arbitrary input vector lengths (i.e. not just powers of two), let’s try a simple application of the powers-of-two FFT from the last article.

We’re going to do some simple frquency-domain filtering of audio data. We’ll start with reading WAV files. We can use the Haskell WAVE package to do this – it provides functions to read and write WAV files and has data structures for representing the WAV file header information and samples.

We’ll use a sort of sliding window FFT to process the samples from each channel in the WAV file. Here’s how this works:

Audio windowing

Audio windowing

We split the audio samples up into fixed-length “windows” (each ww samples long) with adjacent windows overlapping by oo samples. We select ww to be a power of 2 so that we can use our power-of-two FFT algorithm from the last section. We calculate the FFT of each window, and apply a filter to the spectral components (just by multiplying each spectral component by a frequency-dependent number). Then we do an inverse FFT, cut off the overlap regions and reassemble the modified windows to give a filtered signal.

I’m not going to talk much about the details of audio filtering here. Suffice it to say that what we’re doing isn’t all that clever and is mostly just for a little demonstration – we’ll definitely be able to change the sound of our audio, but this isn’t a particularly good approach to denoising speech. There are far better approaches to that, and I may talk about them at some point in the future.

Anyway, let’s look at some code:

First we define a very simple data structure for holding the information about a filter and we define some filters. The filtering function is defined as a vector of complex numbers (which must be the same length as the window size we specify), and we apply it just by multiplying into the spectra element-by-element. One thing we’re not bothering with here is power equalisation – if we use a filtering function that radically alters the overall magnitudes of the audio samples, we’ll end up with something that sounds much louder or softer than the original. It’s better to stick with filtering functions ff for which |f(x)|2dx=1\int_{-\infty}^\infty |f(x)|^2 \, dx = 1. We’ll gloss over this detail. (We’re also not going to talk about windowing functions or anything else filtering-related. This is an example where we’re going to take almost the dumbest approach possible just to have a play.

Then we have a simple driver (Listing~) to read our WAV file, apply the filter (using the runFilter function) and write the result back to a new WAV file:

The only slight annoyance here is flipping back and forth between the WAV files “frame”-based view, which bundles all the samples at one time point for all channels together, and a channel-based view, where we get all the samples for a channel as a single vector that we can manipulate. We also deal with making sure that the signal we pass to the filter fits the window and overlap sizes we’re using.

Finally, we have the audio filtering pipeline itself:

We define a handy sequencing operator which is just normal function composition with the arguments reversed (oddly enough, this doesn’t seem to have a standard name in Haskell), so we can just string operations together in something that looks like a signal processing pipeline. We slice up the data into windows, Fourier transform, apply the filtering function, inverse Fourier transform, cut off the window overlaps, concatenate everything back together and convert back to real values and we’re done.

This is all wrapped up into a little standalone program that we can run from the command line giving input and output WAV file names, along with an index into the filters list to select a filter to apply. So, what can we do with this? I recorded a few seconds of speech using no care at all and the crappiest microphone I could find, producing a WAV file with lots of noise. Let’s apply a series of different filters to it and see what we get. In all cases, we take a window size of 1024 samples (about 0.02 s at a sample rate of 44,100 Hz). We use a window-to-window overlap of either zero or 256 samples (i.e. a quarter of the window length at each end is overlapped with the adjacent windows), and we either use a “hard” filter which cuts out the middle half of the spectral elements (corresponding to zeroing out spectral components for frequencies above about 11&thinksp;kHz) or a “smooth” cosine-shaped filter (which is one at zero frequency and falls to zero at the Nyquist frequency).

The following links go to WAV files:

  • Original signal – very noisy, sounds like it might have been recorded on one of Thomas Edison’s old wax cylinder rigs.
  • No overlap, hard filter – slightly less noisy, but it sounds like I’m in a bucket and you can hear a “clicking” artefact from the lack of window overlap.
  • Overlap, hard filter – less noise, no clicking, still a little bit “buckety”.
  • No overlap, smooth filter – same clicking artefact as in the “hard” filter case with no overlap.
  • Overlap, soft filter – less noise, no clicking, not very buckety: definitely the best result.

There’s obviously relatively little merit to this in terms of making the sound sound nicer, but it shows how easy it is to fiddle with this kind of audio processing – the “filtering” function we’ve been using is a complex vector, so we could do all sorts of phase shifting things with it as well as modifying the amplitudes of the spectral components.

Anyway, I thought a post with no equations in it at all might be a nice break. In the next article, there will be quite a lot of equations and a fair bit head-scratching as we extend the “toy algebra” system to help understand the mixed-radix FFT for arbitrary vector lengths.