=== Arduino FFT Library ===

==== About the Arduino FFT Library ====

The Arduino FFT library is a fast implementation of the standard FFT algorithm.  It can give you up to 256 frequency bins at 16b depth, with a minimum update time of ~7ms at that size.  It is adjustable from 16 to 256 bins, and has several output methods to suit your needs.  It can be set to 16b linear, 8b linear, 8b logarithmic, or 8b octave output.  All of these modes are detailed in the read_me file (inside the FFT library folder), including their relative speed and memory characteristics.

===== Speed characteristics =====

||Func|| run  || reorder || window || lin || lin8 || log ||
|| N || (ms) || (us) || (us) || (us) || (us)* || (us) ||
|| 256 || 6.32 || 412 || 608 || 588 || 470 || 608 ||
|| 128 || 2.59 || 193 || 304 || 286 || 234 || 290 ||
|| 64 || 1.02 || 97 || 152 || 145 || 114 || 144 ||
|| 32 || 0.37 || 41 || 76 || 80 || 59 || 74 ||
|| 16 || 0.12 || 21 || 37 || 46 || 30 || 39 ||

* Note: the lin8 values are approximate, as they vary a small amount with the SCALE factor.  See the #define section of the read_me for more details.
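
As a rough worked example of reading the table above, the processing time per frame is the sum of the calls you actually make.  The sketch below is plain C with the table values copied in.  The `fft_timing` struct, `frame_time_us()` and the `OUT_*` names are made up for illustration and are not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Timings in microseconds, copied from the speed table
   (the "run" column is converted from ms to us). */
typedef struct {
    uint16_t n, run, reorder, window, lin, lin8, log_out;
} fft_timing;

static const fft_timing TIMINGS[] = {
    {256, 6320, 412, 608, 588, 470, 608},
    {128, 2590, 193, 304, 286, 234, 290},
    { 64, 1020,  97, 152, 145, 114, 144},
    { 32,  370,  41,  76,  80,  59,  74},
    { 16,  120,  21,  37,  46,  30,  39},
};

/* Hypothetical names for the three output columns of the table. */
typedef enum { OUT_LIN, OUT_LIN8, OUT_LOG } out_mode;

/* Sum reorder + run + the chosen output step, plus window if used.
   Returns 0 if n is not one of the sizes in the table. */
uint32_t frame_time_us(unsigned n, int use_window, out_mode mode)
{
    for (size_t i = 0; i < sizeof TIMINGS / sizeof TIMINGS[0]; i++) {
        const fft_timing *t = &TIMINGS[i];
        if (t->n != n)
            continue;
        uint32_t total = (uint32_t)t->reorder + t->run;
        if (use_window)
            total += t->window;
        switch (mode) {
        case OUT_LIN:  total += t->lin;     break;
        case OUT_LIN8: total += t->lin8;    break;
        case OUT_LOG:  total += t->log_out; break;
        }
        return total;
    }
    return 0;
}
```

With these numbers, `frame_time_us(256, 0, OUT_LIN8)` comes to 7202 us, and adding the window with log output, `frame_time_us(256, 1, OUT_LOG)`, comes to 7948 us.  This only counts the functions listed in the table; the time spent collecting samples is extra.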

===== Memory characteristics =====

|| Func || run || reorder || window || lin || lin8 || log ||
|| N || S/F(B) || F(B) || F(B) || S/F(B) || S/F(B) || S/F(B) ||
|| 256 || 1k/952 || 120 || 512 || 256/768 || 128/640 || 128/256 ||
|| 128 || 512/448 || 56 || 256 || 128/768 || 64/640 || 64/256 ||
|| 64 || 256/200 || 28 || 128 || 64/768 || 32/640 || 32/256 ||
|| 32 || 128/80 || 12 || 64 || 32/768 || 16/640 || 16/256 ||
|| 16 || 64/24 || 6 || 32 || 16/768 || 8/640 || 8/256 ||

S = SRAM, F = Flash

==== Files ====

Libraries: Arduino1.0 or Arduino-0022 (should work with both)

 * [[attachment:ArduinoFFT.zip|Arduino FFT library]]

==== Installing Libraries ====

The above files need to be placed in the ''libraries'' folder inside of your Arduino sketch directory.  After you unzip ArduinoFFT.zip, take the FFT folder and place it in your ''libraries'' folder, restart Arduino and load one of the example programs to test out the library.

If you are not certain where the ''libraries'' folder is located on your computer, try the following:

===== PC =====
Open up the Arduino software and go to '''Sketch -> Add File...'''.  The window that pops up shows your sketch folder.  This is usually ''C:\Documents and Settings\<your user name>\My Documents\Arduino''.  If you see a ''libraries'' folder, put the FFT folder in there.  If you don't already have one, create the ''libraries'' folder in that directory.

===== Mac =====
Open up your Arduino sketchbook folder.  This is typically ''/Users/<your user name>/Documents/Arduino'', or ''/Users/<your user name>/Documents/Maple'' if you are using Maple.  If there is not already a folder named ''libraries'', create one and place the unzipped FFT folder within it.

==== Implementation Details ====

For those of you who want to look under the hood, let me give you a guided tour.  The speed improvements in this particular implementation come from 2 things.  First, in any FFT, you must multiply the input variables by fixed cosine and sine constants.  This is what consumes the most time on the ATmega, as 16b x 16b multiplies take around 18 clock cycles.  On the other hand, 16b + 16b adds only take 2 clock cycles.  So, it's better to add than it is to multiply.  As it turns out, a lot of those sine and cosine constants used in the FFT are just 0 or 1, so you don't have to multiply, and can just add.  For example, in a 256 point FFT, there are 1024 complex multiplies to be done, of which 382 can be skipped because their constants are either 0 or 1.  That's more than a third of them!
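
To make the idea concrete, here is a minimal sketch in plain C.  It is not the library's code.  It assumes a radix-2 butterfly on 16b fixed point values, Q15 sine/cosine constants, and a divide by 2 on every output to prevent overflow.  The `cplx16` type and the function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16;

/* General case: t = b * w, with w = (wr, wi) in Q15.
   This needs four 16b x 16b multiplies. */
void butterfly_general(cplx16 *a, cplx16 *b, int16_t wr, int16_t wi)
{
    int32_t tr = ((int32_t)b->re * wr - (int32_t)b->im * wi) / 32768;
    int32_t ti = ((int32_t)b->re * wi + (int32_t)b->im * wr) / 32768;
    int32_t ar = a->re, ai = a->im;
    a->re = (int16_t)((ar + tr) / 2);
    a->im = (int16_t)((ai + ti) / 2);
    b->re = (int16_t)((ar - tr) / 2);
    b->im = (int16_t)((ai - ti) / 2);
}

/* Easy case 1: cos = 1, sin = 0, so t = b.  Adds only. */
void butterfly_w_one(cplx16 *a, cplx16 *b)
{
    int32_t ar = a->re, ai = a->im, tr = b->re, ti = b->im;
    a->re = (int16_t)((ar + tr) / 2);
    a->im = (int16_t)((ai + ti) / 2);
    b->re = (int16_t)((ar - tr) / 2);
    b->im = (int16_t)((ai - ti) / 2);
}

/* Easy case 2: cos = 0, sin = -1, so t = b * (-j) = (b.im, -b.re).
   Again adds only, with the real and imaginary parts swapped. */
void butterfly_w_minus_j(cplx16 *a, cplx16 *b)
{
    int32_t ar = a->re, ai = a->im, tr = b->im, ti = -(int32_t)b->re;
    a->re = (int16_t)((ar + tr) / 2);
    a->im = (int16_t)((ai + ti) / 2);
    b->re = (int16_t)((ar - tr) / 2);
    b->im = (int16_t)((ai - ti) / 2);
}
```

Note that Q15 cannot represent exactly 1.0, so calling the general version with `wr = 32767` gives slightly different results than the add-only version.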

The ArduinoFFT checks for those 0 or 1 conditions, and simply does adds instead.  As it turns out, those easy constants occur at regular intervals, and can be easily checked for.  The benefits of this sort of approach are limited for larger FFTs.  The total savings are (1.5*N - 2) multiplies for an N sized FFT, whereas the total number of multiplies is (N/2)*log2(N).  This gives a savings ratio of roughly 3/log2(N), which drops as N increases.
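
The (1.5*N - 2) figure can be checked by simply counting.  The sketch below assumes a standard radix-2 FFT, where each stage works on groups of `span` points and the constant for position `k` in a group has the angle -2*pi*k/span.  The easy constants are then at k = 0 and k = span/4.  The function names are made up for this example.

```c
/* Count the butterflies whose sine/cosine constants are 0 or 1
   in a radix-2 FFT of size n (n must be a power of two). */
unsigned count_trivial_butterflies(unsigned n)
{
    unsigned count = 0;
    for (unsigned span = 2; span <= n; span *= 2) {  /* one pass per stage */
        unsigned groups = n / span;                  /* groups in this stage */
        for (unsigned k = 0; k < span / 2; k++) {
            /* angle 0 (k == 0) or -90 degrees (k == span/4) */
            int trivial = (k == 0) || (span >= 4 && k == span / 4);
            if (trivial)
                count += groups;
        }
    }
    return count;
}

/* Total butterflies: (n/2) * log2(n). */
unsigned count_all_butterflies(unsigned n)
{
    unsigned stages = 0;
    for (unsigned m = n; m > 1; m /= 2)
        stages++;
    return (n / 2) * stages;
}
```

For N = 256 this counts 382 easy butterflies out of 1024, a ratio of about 0.37, which matches 3/log2(256) = 0.375.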

The second set of time savings in this implementation comes from using lookup tables to calculate the square roots of the magnitudes.  The difficulty in this method is that the input mapping to the lookup table is much, much larger than the actual contents of the lookup table itself.  So, to not waste memory space, a compression of the input values must be done.  For example, taking the square root of a 16b value has 64k input values which must map down to 256 output values.  To have an answer hard coded into memory space for all of those inputs is impossible on the Arduino (and a waste in general).  So instead, I used a linear interpolation of the input space, with different slopes for different sections.  For the 8b linear output, this can be done with no loss of precision with either 3 or 4 linear sections.  This means that the input value can be checked for which section it lies in, and the square root fetched, in around 12 clock cycles.  This is much less than the roughly 150 clock cycles that a standard square root library would require.
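
Here is a rough sketch of the sectioned lookup idea in plain C.  The section boundaries, step sizes and table size are invented for illustration and are not the library's actual values, and the table is filled at startup instead of being stored in flash.  With these made-up sections the result is within 1 count of the true integer square root; this sketch does not reproduce the lossless tables described above.

```c
#include <stdint.h>

#define SQRT8_TABLE_SIZE 556   /* 64 + 60 + 240 + 192 entries */

static uint8_t sqrt8_table[SQRT8_TABLE_SIZE];

/* Exact integer square root, used only to fill and check the table. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t r = 0, bit = 1UL << 30;
    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* Find the section, then drop low bits to get the table index. */
uint16_t sqrt8_index(uint16_t x)
{
    if (x < 64)    return x;                                     /* step 1   */
    if (x < 1024)  return (uint16_t)(64  + ((x - 64)    >> 4));  /* step 16  */
    if (x < 16384) return (uint16_t)(124 + ((x - 1024)  >> 6));  /* step 64  */
    return (uint16_t)(364 + ((x - 16384) >> 8));                 /* step 256 */
}

/* Fill each entry with the square root of the middle of its input range. */
void sqrt8_init(void)
{
    uint32_t x = 0;
    while (x <= 0xFFFF) {
        uint32_t step = (x < 64) ? 1 : (x < 1024) ? 16 : (x < 16384) ? 64 : 256;
        sqrt8_table[sqrt8_index((uint16_t)x)] = (uint8_t)isqrt32(x + step / 2);
        x += step;
    }
}

/* The fast path: a few compares, a shift and one table read. */
uint8_t sqrt8_lookup(uint16_t x)
{
    return sqrt8_table[sqrt8_index(x)];
}

/* Sweep all 64k inputs and report the largest error in counts. */
unsigned sqrt8_max_error(void)
{
    unsigned worst = 0;
    for (uint32_t x = 0; x <= 0xFFFF; x++) {
        int d = (int)sqrt8_lookup((uint16_t)x) - (int)isqrt32(x);
        if (d < 0)
            d = -d;
        if ((unsigned)d > worst)
            worst = (unsigned)d;
    }
    return worst;
}
```

Call `sqrt8_init()` once before using `sqrt8_lookup()`.  The 64k possible inputs are squeezed into 556 table entries, and `sqrt8_max_error()` can be run on a PC to check any other choice of sections.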

The 32b input version is slightly more difficult, as the output mapping space is now 16b (64k), and the linear mapping technique cannot compress it any more than that.  In this case, I implement a hybrid approach where the input value is converted to a floating point value with 16b of precision plus 8b of exponent.  This can be done very quickly in base 2, and then the above 16b square root lookup table method can be used.  If the input compression is done in steps of x4, the output value can be reconstructed by shifting it back up in steps of x2.  Basically, the exponent is forced to be an even value upon creation, so the square root can return an integer value.
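
The even exponent trick can be sketched in a few lines of plain C.  This is only an illustration: `sqrt16()` here is an exact integer square root standing in for the 16b lookup table, the result is truncated instead of rounded, and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the 16b lookup table: exact integer square root (0..255). */
uint16_t sqrt16(uint16_t x)
{
    uint32_t v = x, r = 0, bit = 1UL << 14;
    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= r + bit) {
            v -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)r;
}

/* Approximate square root of a 32b value. */
uint16_t sqrt32_approx(uint32_t x)
{
    uint8_t shift = 0;        /* half of the (even) exponent */
    while (x > 0xFFFF) {
        x >>= 2;              /* compress the input in steps of x4 */
        shift++;
    }
    /* take the root of the 16b part, then expand in steps of x2 */
    return (uint16_t)(sqrt16((uint16_t)x) << shift);
}
```

For example, `sqrt32_approx(1000000)` shifts the input down twice to 62500, takes the root (250), and shifts it back up to 1000.  At the top of the range, `sqrt32_approx(0xFFFFFFFF)` returns 65280 where the true value is 65535, which is about 0.4% low.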

This 32b version is not as precise as a true square root library, but it only takes around 40 clock cycles, compared to 500 for a true square root.  This lookup table version is only accurate in the first 8b of the return value, but for the purposes of this FFT, that is good enough.  The total bit depth of the FFT is not much past 12b since it is implemented in fixed point (each value must be divided by 2 before adding to prevent overflow, which gives an eventual divide by 256 for a 256 point FFT).  The relative accuracy is a function of output value size.  For a return value of 8b, it is as close as you can get.  For a 9b value, its lsb might be wrong.  For a 10b value, 2 lsbs might be wrong, and so on.  So the worst case scenario is a 16b return value where you get +/-0.5% accuracy.

==== References ====

If you are interested in learning more about the FFT, here are some good resources that I used in writing my code.

 * [[http://elm-chan.org/docs/avrlib/avrfft.zip|ELM-ChaN FFT library:]] A very good implementation that is more portable and can handle imaginary inputs, larger FFT sizes, and a slightly more accurate output.  However, it is slower, and not quite as Arduino friendly.  Be sure to check out the rest of the stuff on the [[http://elm-chan.org/cc_e.html|ELM-ChaN site]] as well, there is tons of great info.

 * [[http://www.alwayslearn.com/dft%20and%20fft%20tutorial/DFTandFFT_BasicIdea.html|FFT tutorial from alwayslearn.com:]] This site is great for breaking down how the FFT works into an easily understood format.  Highly recommended if you want to understand the butterfly operations.

 * [[http://www.katjaas.nl/home/home.html|Katja's homepage on sinusoids:]] This site is amazing. A must see!  Take the tour, buy the t-shirt!  It goes through and explains all sorts of crazy math things in a very fun and excruciatingly in-depth fashion.  I laughed, I cried, I learned a lot.  Thanks.

 * [[http://en.wikipedia.org/wiki/Window_function|Wikipedia article on window functions:]] A relatively good explanation of what window functions do, and why you need them.  I mostly used it for the nice graph of the relative attenuation of various window functions, which is good for picking which one to use.