Home > Image Processing > 6. Fourier Transform Model of Image Formation

6. Fourier Transform Model of Image Formation

Discrete Fast Fourier Transform

Fourier analysis has long been used in optics as an effective means of predicting what occurs when light passes through various apertures, slits, and curved mirrors.  Now, it has gone further to be the foundational concept behind advanced techniques in image processing.  Fourier transformation creates a representation of an image in the frequency domain, from dimension X to dimension 1/X [1].  The FT is one technique used to make more compact representation of images (e.g. JPEG compression) by rounding its components to lower arithmetic precisions and removing weak signals.  The inverse of this FT provides an approximated reconstruction of the original image [2].

The Fourier transform of a 2-dimensional image f(x,y) is given by [3]:

which in discrete can be expressed as:

The Scilab tool contains a built-in function for calculating Fourier Transforms for 1- and 2-dimensional arrays, fft() and fft2().  For images, it is best to use the FFT function with images of dimension in power of 2 (e.g. 128×128).  Calling this function returns the result with the quadrants of the image interchanged.  The fftshift() function rearranges these blocks to their original positions (see Figure 1).

Figure 1. A. original quadrants. B. orientation of quadrants after FFT.

Noting that the FFT is a set of complex numbers, we view the intensities by taking absolute values using the abs() function.

Lens as Fourier Transformer

We can see this basic concept by simulating an image created by light entering a lens. For instance, we create a binary image of a circle – centered, colored white, and embedded on a black background.

I = imread('circle.bmp');
Igray = im2gray(I);

Alternatively, we can create the shape within Scilab and also create a gray representation of it (Figure 2A).

x = [-64:1:64];
[X,Y] = meshgrid(x);
r = sqrt(X.^2 + Y.^2);
circle = zeros(size(X, 1), size(X, 2));
circle(find (r <= 32)) = 1;
imshow(circle, []);
Igray = im2gray(circle);

We implement FFT on the image and then display the intensity of the result.

FIgray = fft2(Igray);   // FIgray is complex

The brackets passed to the imshow() function adjusts the image such that the maximum value is set to white (or 1) and the minimum is set to black (or 0).  Figure 2B shows the result of our transformation, however with quadrants still interchanged.

Figure 2. A. a binary image of a circle; B. the FFT of the circle; C. its shifted FFT; D. its inverse FFT

We apply fftshift() to have the quadrants back to their proper placements. This result is displayed in Figure 2C.

imshow(fftshift(abs(FIgray))), []);

Now, if we want to do an inverse of the FFT we have performed, we do this by applying another FFT on our previous result.


This approximately reconstructs our original circle (Figure 2D). However, though the resulting image appears similar to our input, it is an inverted version of the original.
We can verify this if we perform the same methods on a text character, as in Figure 3.

Figure 3. A. binary image containing character “A”; B. the FFT of the image; C. its shifted FFT; D. its inverse FFT


Convolution is a function that solves for the amount of overlap of one function g as it is shifted over the function f.  For 2-dimensional functions, convolution is expressed by [4]:

This equation basically describes the relationship between three signals: the input, the impulse response and the output.

Simulation of an imaging device

Photographs are usually defined by the transfer function (transform of the impulse response) of the camera, which is limited by finite radius of the lens. If minimal light rays are able to enter through the camera due to the lens, the image is reconstructed with lesser resolution.
To simulate this, we create an image composed of the text “VIP” covering about 50% of its total area, and a circle to serve as a lens aperture.

We take the FFT of the “VIP” image and multiply it with the fftshift of the circular aperture. We do not anymore perform FFT on the circle since it already is part of the Fourier plane. Our Scilab code will read as follows:

r = imread('circle.bmp');
a = imread('vip.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fftshift(rgray);
Fa = fft2(agray);

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(IRA);

imshow(FImage, [ ]);

We can investigate on the properties (and integrity) of this method by observing the resulting image if allowed to pass through various aperture sizes.

Figure 4. The convolution formed from the text “VIP” through various aperture sizes. Top row: circular apertures of various diameters (32, 16, 8, 4 pixels). Bottom row: the resulting images after convolution with the different aperture sizes. NOTE: the “VIP” text was originally written upright.

The results in Figure 4 show that the size of the aperture importantly matters to produce a well-resolved image of the object and to create a better representation of it by allowing more light to enter through the lense.


Correlation is the function for measuring the degree of similarity between two functions [1].  For 2-dimensional objects, it is expressed as:

High resulting values mean high correlation, which occur when functions (or images) are greatly identical.  This is why correlation is widely used in image processing for pattern recognition.

Template matching / Pattern recognition

We test the method of correlation by creating an image containing a short text and correlate it with another image containing a letter frequently occurring in it.  We perform correlation by multiplying the FT of the single-letter image and the conjugated FT of the text image.  We then obtain the result by solving for the inverse FT of their product.  The Scilab implementation reads:

r = imread('a.bmp');
a = imread('spain.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fft2(rgray);
Fa = conj(fft2(agray));

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(fftshift(IRA));
imshow(FImage, [ ]);

We obtain a resulting image representing areas on which the images are of high correlation (Figure 5C). However, the values are broadly scattered due to the large difference in the character size. We repeat the procedure but this time correlating with a character size more identical to the ones in the text.

Figure 5. A. a sample short text; B. our pattern, which is a particular letter repeating in the text; C. correlation of the images; D. pattern, now a single-letter image more identical to the ones in the short text; E. correlation of the images with more defined peaks.

Notice that the more identical the images, the higher the corresponding values are in the resultant matrix (Figure 5E).  We find the highest peaks (or the brightest pixels) on the specific words where the letter “A” is present (i.e. “rain”, “spain”, “mainly”, and “plain”).

Edge detection using the convolution integral

Edge detection in images can be a way of template matching.  For instance, we create a pattern within a 3×3 matrix such that the total of the values equal to zero.  We then convolve this with a sample image for edge detection via the imcorrcoef() function.

pattern = [-1 -1 -1; 2 2 2; -1 -1 -1];  // horizontal pattern

a = imread('vip.bmp');
agray = im2gray(a);

img = imcorrcoef(agray, pattern);

We also test other templates such as vertical, diagonal and spot.

pattern = [-1 2 -1; -1 2 -1; -1 2 -1];       // vertical
//pattern = [-1 -1 2; -1 2 -1; 2 -1 -1];     // diagonal right
//pattern = [-1 -1 -1; -1 8 -1; -1 -1 -1];   // spot pattern
Figure 6. Template matching as method for edge detection. Top row: 3×3 patterns (horizontal, vertical, diagonal and spot); Bottom row: the edges detected via correlation from an image with text “VIP”.

From our results in Figure 6, we notice that the edges more pronounced in the resulting image are those similar to the template with which the text is correlated with.  The letters “I” and “P” are the ones least detected with edges when related with the horizontal and diagonal templates.  The best detection, however, was obtained by the spot template which, in a sense, does not outline a specific pattern and thus was able to trace approximately all the boundaries of the text image.

Special thanks to Gladys R. for sharing her thoughts and immediate realizations, and Dr. Soriano for the supplementary insights.  I would rate myself 10 in this activity for comprehending the ideas and producing the desired outputs.
[1] Soriano, 2010. Fourier Transform Model of Image Formation. Applied Physics 186.
[2] Wikipedia, 2010. Fourier Analysis.
[3] Wang, R., 2007. Two-Dimensional Fourier Transform.
[4] Weisstein, E., 2010. Convolution. From MathWorld–A Wolfram Web Resource.
[5] Peters, R. II, 2008. Image processing: Spatial convolution. Vanderbilt University School of Engineering.

  1. JC
    February 21, 2013 at 3:17 am

    Awesome article ! Really explained the entire stuff in a very lucid and pratical manner

  1. January 25, 2011 at 11:09 pm
  2. November 15, 2017 at 6:26 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: