12. Color camera processing

September 23, 2010 Leave a comment

A digital image is made of red, green and blue components overlaid at different levels. Each pixel is defined by a summation of the product of the spectral power distribution of the incident light source S(λ), the surface reflectance ρ(λ) and the spectral sensitivity of the camera η(λ).  If a camera has spectral sensitivity η(λ), η(λ), η(λ) for its RGB channels, the color of the pixel is defined as:

whereis the balancing constant that is equal to the inverse of the camera output for a white object ρ(λ).  The equations tell us why some pictures captured by cameras do not appear satisfactory.

White Balancing

Digital cameras have settings called WHITE BALANCE or BW, which allows us to select appropriate constants for specific conditions when capturing images.  Illumination conditions would depend whether it is sunny, cloudy or under the light of a bulbs.  Some cameras include an AWB or Automatic White Balance.  Learning how to set these in the camera allows obtaining an image wherein the white color is properly represented in the image and the other colors are also rendered properly.

Automatic White Balancing Algorithms

There are two popularly known algorithms for automatic white balancing.  The equations for RGB tell us the raw output of the camera divided by the camera output for a white object.  The White Patch Algorithm uses a white object and utilizes its RGB values as divider. The Gray World Algorithm, on the other hand, assumes that the average color of the world is gray.  It obtains balancing constants by taking the average RGB values of the image.

For instance we assemble a group of colored objects (with major hues represented) with a white background.  We choose a specific light source (here, fluorescent lamp) to capture them on image.

We locate a pixel belonging to a white object and with it set the hue values for white (Rw, Gw, Bw).

im = imread('A1. daylight.png');

R = im(:, :, 1);
G = im(:, :, 2);
B = im(:, :, 3);

//white object at (108, 213)
R_white = R(108, 213);
G_white = G(108, 213);
B_white = B(108, 213);

We implement the White Patch Algorithm by dividing all the RGB values of the pixels with their respective white balancing constants.

R_balanced = R / R_white;
G_balanced = G / G_white;
B_balanced = B / B_white;

To make sure that there are no pixels exceeding 1.0, we apply clipping on them to 1.0.

R_balanced(find(R_balanced > 1)) = 1;
G_balanced(find(G_balanced > 1)) = 1;
B_balanced(find(B_balanced > 1)) = 1;

im(:, :, 1) = R_balanced;
im(:, :, 2) = G_balanced;
im(:, :, 3) = B_balanced;

Same process will be performed for Gray World Algorithm, except that the white balancing constants for RGB should be the average value for the RGB planes.

R_white = mean(R);
G_white = mean(G);
B_white = mean(B);

The resulting images from these algorithms are shown in Figure 1 below.

White Balance White Patch Gray World
Figure 1. Captured images at different white balance camera settings and resulting images after applying the White Patch and Gray World aAlgorithms.

From the images above, we can see that raw images captured at daylight and shade settings produce good approximations of the actual fluorescent setup.  However, the daylight image has little influences of blue, as well as tungsten which almost saturates the entire picture. The images corrected show fair results after applying the two algorithms.  The White Patch algorithm produced a good correction of the color values in a moderate way.  The output from the Gray World algorithm, however, put in too much brightness on the image, which we can suppose have resulted from several clipped pixels along the image.

We repeat these steps but now apply them to objects with similar hues and only put a small white object as reference.  Figure 2 shows the results after applying the two algorithms.

White Balance White Patch Gray World
Figure 2. Captured images (of objects with similar hue) at different white balance camera settings and resulting images after applying the White Patch and Gray World Algorithms.

From the images above we can see the difference in accuracies of the resulting images upon performing the white balancing algorithms when the image is not dominated by white colors.  We can see that the images processed with the Gray World algorithm was altered significantly due to the lack of white background.  The White Patch algorithm, however, maintains good rendering of the colors and maintains a good representation of the objects.

From these outputs, the more favorable method to use (and recommend) is the White Patch algorithm for the reason that it is more accurate in representation and more safe for pictures under various settings.

For this segment, I would rate myself 10 for being able to capture and process the images as prescribed by the algorithms.

Credits: Dennis D., Alex C., and Tin C. (for lending me her camera).

[1] Soriano, 2010. Color camera processing. Applied Physics 186.
[2] The Scilab Reference Manual. 2010.


10. Binary Operations

August 31, 2010 2 comments

Important applications, such as biomedical imaging, need the performance of accurate area detection and measurement, especially in disease examinations (e.g. malaria parasite detection, cancer cell detection). Oftentimes, noisy background obstructs the region of interest (ROI), which causes false interpretations from the image data.  Thresholding an image to binary is not able to make up for these errors due to the illegitimate features still present in the grayscaled specimen.

Methods in morphological image processing offer operators that are able to transform images and clean them from unwanted artifacts.  A good combination of the erosion and dilation techniques accomplish more sophisticated morphological operations that achieve image cleaning.  The morphological opening operator is an erosion followed by a dilation, using the same structuring element for both operations.  It reduces the size of areas that are of smaller size compared to the strel.  The closing operator, on the other hand, performs a reverse of this – it fills in small holes along areas similar to the strel.  A procedure doing opening followed by closing makes an intelligent workhorse for morphological noise removal.

For instance we have a specimen image similar to a microscope photo-capture of human cells (Figure 1).

Figure 1. An image of scattered punched paper digitized using a flatbed scanner. Suppose these circles are cells under the microscope. Courtesy of Soriano, M. [1].

We cut the big picture into sub-images of dimension 256 x 256 with slight overlap.  We perform binarization with a threshold that best captures the important parts of the image with minimal noise in the background.

Figure 2. Binarized sub-images of the cell image at optimal threshold.

For each of these images, we apply the opening and closing operations, using a structuring element shaped as a circle with a size larger than the background noise and about similar to the cells in the image.  This can then be implemented in Scilab as follows:

im = erode(im, se);    // opening, se being the cirular strel
im = dilate(im, se);
im = dilate(im, se);   // closing
im = erode(im, se);

The image, after this noise reduction method, and recombining the sub-images, produces a much excellent capture of the cells – our regions of interest (ROI) (Figure 3).

Figure 3. Cleaned image of the cell circles after undergoing morphological opening and closing operations.

In order to be able to calculate for the areas, we use the function bwlabel to distinguish connected areas in the image.

[L, n] = bwlabel(img);
area = [];
count = 1;

// scan regions for connected areas
for i = 1:n
   area(count) = length(find(L==i));
   count = count + 1;

Curiosity would raise a question on how to determine the area of the cells considering there are overlapping cells in the image. The best solution to this is to look into the histogram of the areas detected and find the range at which we are certain that overlaps are not involved.

scf(0); histplot(length(area), area);

This gives us a plot that tells us that most of the detected areas (which in our special case, are the unconnected ones) fall under the range 450-600 pixels (Figure 4A).  We can zoom into this range by creating a separate histogram for it (Figure 4B).

cells = find(area<600 & area>450);
scf(1); histplot(length(cells), area(cells));
Figure 4. Histogram of areas detected in the cleaned image. A. for all areas  B. for areas with no overlapping cells.

We can now calculate for the average area of the cells in this range, implemented by:

all_cells = area(cells);
average_area = sum(all_cells)/length(cells) //area
stdev_area = stdev(area(cells))             //error

Our calculations will result to an area A = 536.28 + 23.17.  To verify the integrity of this value, we attempt to measure the average area of the cells in a sub-image at which we are sure of no overlaps (Figure 5).

Figure 5. 10th sub-image of the cleaned binarized cells.

This gives us an area value A = 533.60 + 25.58, which is of no significant difference to our actual measurement.

Circles with cancer
Now that we know the area of the cells, we can now use these methods for cancer cell detection.  Suppose we have an image of cells now with the presence of enlarged blobs to portray abnormal cancer cells (Figure 6).

Figure 6. Another set of punched papers now with some “cells” bigger than the rest. Courtesy of Soriano, M. [1].

The technique here is to make an intelligent use of the morphological operations such that we can isolated the enlarged cells.  We instead use a structuring element with area greater than our previously measured area A.  We use a circular strel with radius R > sqrt[(μarea + σ) / π].  Oftentimes, R would need a larger value in order to eliminate clustered cells.  This will result to a binary image with the cancer cells isolated (Figure 7).

Figure 7. Isolated “cancer” cells from the set of punched papers.

Current intelligent systems already make more sophisticated applications of image isolation, wherein an image of a target property is captured separately based on a user-specified property identifier.  Nonetheless, it is the morphological operations that make up the fundamentals of these complex machines.

Credits to Dr. Soriano for the valuable tips and discussions.  I would rate myself 10 for doing an awesome job and enjoying the learning at the same time.

[1] Soriano, 2010. Binary operations. Applied Physics 186.
[2] Mathworks, 2010. Morphology fundamentals: Dilation and erosion. Image processing toolbox.

Now that we know the area of the cells, we can now use this for cancer cell detection

9. Morphological Operations

August 8, 2010 3 comments

Morphological operations, such as erosion and dilation, transform images such that each output pixel is defined by the application of a rule on the input pixel and its neighbors.  The following table (adapted from Mathworks) summarizes the rules for these basic operations:

Operation Rule
Erosion The value of the output pixel is the minimum value of all the pixels in the input pixel’s neighborhood. In a binary image, if any of the pixels is set to 0, the output pixel is set to 0.
Dilation The value of the output pixel is the maximum value of all the pixels in the input pixel’s neighborhood. In a binary image, if any of the pixels is set to the value 1, the output pixel is set to 1.

The neighborhood of a certain pixel is defined by the structuring element used in the operation.  The function applies the rule to the input pixel and its neighbors based on the center point of the structuring element, and then assigns a value to the corresponding output pixel.  Figure 1 illustrates dilation with a 3-pixel horizontal line.

Figure 1. Morphological dilation operation with a horizontal line.  Courtesy of Mathworks.

Morphological erosion and dilation operations follow the Set Theory.  The erosion of A by structuring element B is expressed as:

This results to the reduction of the image by the shape of B.

On the other hand, the dilation of A by structuring element B is denoted by:

This results to the expansion or elongation of A in the shape of B.


In the succeeding sections, we will see the actual effect of these morphological operations on images with certain shapes when they are operated with particular strels (shortcut for structuring elements).  We have the following initial shapes:

  1. a 5×5 square
  2. a triangle (base = 4 boxes, height = 3 boxes)
  3. a 10×10 square hollow (2 boxes thick)
  4. a plus sign (one box thick, 5 boxes along each line)

The structuring elements with which we will apply the morphological rules with will be the following:

  1. 2×2 ones
  2. 2×1 ones
  3. 1×2 ones
  4. a cross (3 pixels long, one pixel thick)
  5. a diagonal line (2 pixels long, i.e. [0 1; 1, 0])

We apply the operations using Scilab’s erode() and dilate() functions, which are implemented through the following:

E = erode(Img, [SE, center])

where Img is the MxN binary image to be eroded, SE the arbitrary structuring element represented as a binary array, and center the origin of the structuring element.  Similar implementation is done with the dilate() function.

Note that the center of each structuring element should be assigned to a pixel of value 1 or “ON”.  The center for the cross strel will be pixel [2, 2].  For strels with even-numbered dimensions such as first three items listed, the center can be assigned to pixel [1, 1], while diagonal strel can have the pixel [2, 1] as center.


The transformations resulting after erosion are shown in Figure 2, where the first row displays the initial shapes and the leftmost column displays the structuring elements.

Figure 2. Morphological erosion of basic shapes by certain structuring elements.

On binary images, erosion reduces the figures by the shape of the structuring element.  On grayscale images, the operation reduces the brightness of objects.  On certain practical applications, it is able to remove small irregularities from the image by subtracting objects that are of smaller radius compared to the structuring element.


The results for morphological dilation is shown below (Figure 3).

Figure 3. Morphological dilation of basic shapes by certain structuring elements.

The dilation of binary images connects sections that are parted by spaces smaller compared to the structuring element.  On grayscale images, this function increases the brightness of objects.  Practical applications of this is usually to increase the size of objects and remove holes and broken areas.

Other operations such as skel and thin also impose different rules in processing images.

Using thin performs thinning of the image by means of border deletion using the Zhang-Suen technique.  The function gives an output similar to Figure 4, which was implemented simply by:

out = thin(img)
Figure 4. Morphological thinning of a stick figure.

The function skel, on the other hand, does a skeletonization of the image using the Medial Axis Transform.  Some optional parameters for processing include side [interior (default), exterior or both] and algorithm [fast euclidean (default) or exact euclidean].  It can be performed with the following script:

[skl,dt,lbl] = skel(img, 'interior', 'fast euclidean'])

Below are the figures resulting from varying the side parameter.

Figure 5. Morphological skeletonization of a stick figure using fast euclidean algorithm. Left to right: interior side, exterior side, both sides.

The fast and exact euclidean algorithm make little difference in their results.  It is only that the exact method runs relatively slower than the other.

These basic morphological operations are often used for further image processing and/or extracting information from the image.  An article following this post (soon) aims to make useful application of these methods.

I would rate myself 11 for this activity for completing the tasks and investigating for other morphological functions.

[1] Soriano, 2010. Morphological operations. Applied Physics 186.
[2] Mathworks, 2010. Morphology fundamentals: Dilation and erosion. Image processing toolbox.
[3] IDL Astronomy User’s Library, 2007. Eroding and dilating image objects. NASA Goddard Space Flight Center.
[4] Wikipedia, 2010. Mathematical morphology.
[5] SIP, 2010. Scilab Image Processing Toolbox. Sourceforge Reference.

Categories: Image Processing Tags: ,

8b. Enhancement in the Frequency Domain

July 29, 2010 8 comments

Improving the quality of an image can often be done by manipulating the image’s frequency aspect.  Here we demonstrate basic applications of eliminating unwanted signals and patterns by means of the Fourier Transform.

Fingerprints: Ridge Enhancement
Clarity of patterns are very important in fingerprint recognition, especially in forensic science.  In cases when fingerprint data are unsatisfactorily represented, a filtering technique can be used to improve its ridges.

Figure 1. An example of an unprocessed fingerprint image.

An enhancement of the elevations and edges can be achieved by creating a filtering mask on the Fourier Transform of the image that allows only the frequencies of the fingerprint to pass through (Figure 2).

Figure 2. Creation of a filter mask for ridge enhancement. A. original FT of the fingerprint; B. Gaussian masks; C. filtered FT of the image.

The FT of the fingerprint is better viewed in the logarithmic scale, using Scilab’s log() function:

imshow(log(fftshift(abs(fingerprint_gray))), []);

The masking is performed by doing a bitwise multiplication on the FT of the fingerprint and the FT of the filter, and then calculating for product’s inverse Fourier Transform.

fingerprint = gray_imread('fingerprint.png');
filter = gray_imread('fft_fingerprint_filter.png');
newI = ifft(fftshift(filter).*fft2(fingerprint));
imshow(abs(newI), []);

The resulting image will be a fingerprint pattern composed of better defined ridges, with clearer elevations and improved contrast (Figure 6).  We particularly used a smooth mask (similar to a 2D Gaussian) to filter the FFT, instead a hard-edged mask (e.g. a circle), in order to avoid the effect of diffraction patterns (an airy disk, in the case of the circle).  This is part of considering the Fourier Transform of the circle, if we recall the properties of the 2D Fourier Transform.

Figure 3. Fingerprint ridge enhancement. A. a grayscale of the original fingerprint; B. an enhanced image of the fingerprint after filtering.

Note that filtering does not at all times need to be done using Gaussian masks.  The filter to be applied would depend on the initial Fourier Transform and is created by carefully identifying which of the frequencies belong to the desired parts of the image.

Lunar Landing Scanned Pictures: Line Removal
Pictures taken from space are sometimes developed on-board the spacecraft and then sent to Earth in their digitized form.  An example is the following image captured by the Lunar and Planetary Institute (LPI) during the Apollo 11 Mission.  The noticeable vertical and horizontal lines are said to have resulted from combining image “framelets” to form the composite image.

Figure 4. A composte image of the surface of the moon. Courtesy of USRA.

We will expect the frequencies of these lines to appear on the FT of the image along the x- and y-axis of the Fourier plane.  We can eliminate these unwanted patterns by creating a mask against these peaks.

Figure 5. Creation of filter mask for line removal. A. original FT of the image; B. small Gaussian masks on peaks along the frequency axes; C. filtered FT of the image.

Note that this time we used black shaded patterns on a white canvas.  This is to cancel to zero the occurrence of the lines and allowing only the frequencies of the desired parts of the image to pass through.  The result is shown in Figure 6.

Figure 6. Line removal technique in the frequency domain. A. a grayscale of the original lunar landing photo; B. an enhanced image after filtering.

Notice that the presence of the lines on the photo are now minimized (if not totally removed) in the resulting image.  The quality of the output image can be further improved by greater accuracies in filtering.

Canvas Weave Modeling and Removal
The original beauty of a painting or artwork can sometimes be obstructed by patterns from the surface of the canvas.  An example is the following image of the detail of an oil painting obtained from the UP Vargas Museum Collection [1].

Figure 7. Detail of an oil painting. Courtesy of the UP Vargas Museum.

The underlying weave patterns are characterized by diagonally-oriented lines, as shown in its Fourier Transform in Figure 8A.  We remove these lines by blocking their frequency peaks with our mask filter.

Figure 8. Creation of filter mask for weave removal. A. original FT of the image; B. small Gaussian masks on diagonal peaks; C. filtered FT of the image.

This process will result to the reduction of weave patterns from the canvas.  Better results can achieved by creating the mask such that it more accurately blocks the unwanted artifacts.

Figure 9. Weave pattern removal technique in the frequency domain. A. a grayscale of the original oil painting photo; B. an enhanced image after filtering.

The filters used for this case are made large enough in order to securely mask against the frequency peaks of the weave.  The ideal filter would most closely resemble smaller masks with highly accurate placements over the Fourier plane.  If we invert the color value of this mask filter and get its Fourier Transform, we will see the pattern of the canvas weave that we are trying to remove.

Figure 10. The weave pattern in the frequency domain. A. Color invert of the mask filter used; B. Fourier Transform of the filter.

These are only some of the basic applications of enhancing images in the Fourier plane.  Utilization of these methods does not limit to the examples presented above.  Fourier enhancement may also be used in to achieve other goals such as image sharpening, which is achieved by bandpass filtering process.

For this set of activities (8A and 8B), I would rate my 10 for the accomplishment.  I sincerely enjoyed this activity. 🙂
[1] Soriano, 2010. Enhancement in the frequency domain. Applied Physics 186.
[2] Lie, 2010. Image enhancement in the frequency domain. CCU, Taiwan.


8a. Fourier Transforms and Basic Image Patterns

July 29, 2010 Leave a comment

Oftentimes we see repetitive patterns on images we capture in photography or encounter in image processing.  Whether these are unexpected by the viewer or not, their origins are better understood by the concept of Fourier Transform.  In this section, we will develop a deeper understanding on the concept of Convolution Theorem.  We note that:

(1) the FT of the convolution of two functions f and g is equal to the product of their individual Fourier Transforms;

(2) the convolution of a function f and a Dirac delta function results in the replication of the function on the location of the function.

Convolution Theorem

From the basic properties of 2D Fourier Transform, we learned how two dots equidistant to the origin produce an FT output composed of bright and dark fringes (Figure 1).

Figure 1. Fourier Transform of a pair of dots.  Left: original pattern of the two dots; Right: FT of the pattern.

Now if we modify this pattern into circles of some radius, we get an FT pattern composed of concentric circles where the middle circle (maximum) becomes smaller in radius and number of concentric ones increase as the apertures are enlarged (Figure 2).  These is referred to as the Airy pattern, which model is used to describe diffraction patterns in optics.

Figure 2. Fourier Transform of two circles with various radii. Top row (left to right): pairs of circles with radius 0.02 to 0.06 at 0.01 increments; Bottom row: corresponding FT of the patterns.

If we replace the circles instead with squares, we obtain an FT pattern where the maximum frequency follows the shape of a square (Figure 3).

Figure 3. Fourier Transform of two squares with various widths. Top row (left to right): pairs of squares with width 0.02 to 0.06 at 0.01 increments; Bottom row: corresponding FT of the patterns.

Next we try to replace these with Gaussians, expressed by:

where μ0 is the mean and σ2 is the variance.  At increasing values for variance, we get FT patterns similar to the ones in Figure 4.  We notice that the width of the FT pattern decreases as the width of the spatial Gaussian is increased.

Figure 4. Fourier Transform of Gaussians with different variance. Top row (left to right): pairs of 2D gaussians at σ = 0.01 to 0.05 at 0.01 increments; Bottom row: corresponding FT of the patterns.

This is performed with the following Scilab implementation:

nx = 128; ny = 128; // defines the number of elements along x and y

x = linspace(-1, 1, nx); //defines the range
y = linspace(-1, 1, ny);

[X,Y] = ndgrid(x, y); // creates two 2-D arrays of x and y coordinates

A = zeros(nx, ny);
var = (0.05)^2;

r = sqrt((X.^2 + (Y - 0.3).^2)/var);
A1 = exp(-r);
r = sqrt((X.^2 + (Y + 0.3).^2)/var);
A2 = exp(-r);

A = A1 + A2;

scf(0); imshow(A, []);
scf(1); imshow(fftshift(abs(fft2(A))), []);

The Dirac delta function can be approximately produced on a 2D image by putting a value 1 on a matrix composed of zeros.  We create 10 Dirac delta’s placed on random locations of a 200×200 binary matrix and convolve it with 3×3 patterns (Figure 5).  We us Scilab’s imconv() function to perform this.

A = zeros(nx, ny);

while sum(A) < 10
   A(round(rand()*200)+1, round(rand()*200)+1) = 1;

pattern = [-1 -1 -1; 2 2 2; -1 -1 -1];  // horizontal

img = imconv(agray, pattern);
Figure 5. Convolution with the Dirac delta function. A. 10 randomly placed Dirac delta functions; B. horizontal pattern; C. convolution of the Dirac with the horizontal pattern; D. diagonal pattern; E. convolution of the Dirac with the diagonal pattern.

We observe on the resulting Fourier Transforms that a replication of the patterns were created on the locations of the Dirac deltas.  Let’s create a modification of this where the Dirac delta functions are placed on equally spaced points along the x- and y-axis.  The Scilab implementation will read as follows:

A = zeros(nx, ny);

interval = 2

for i = 1:interval:nx+1
  for j = 1:interval:ny+1
    A(i, j) = 1;

FIgray = fft2(A);
imshow(fftshift(abs(FIgray)), []);

At increasing values of spacing between the 1’s we get the following FT results (Figure 5).  Notice that the frequencies of the Dirac deltas shown in the Fourier Transform appear closer to each other as the spacing between the 1’s are increased.

Figure 6. Fourier Transform of equally spaced Dirac delta functions. Top row (left to right): Dirac delta peaks with spacing 2 to 10px at 2px interval; Bottom row : corresponding FT of the patterns.

The basic knowledge on the important concept behind these FT patterns allows us to utilize this as an intelligent technique in image processing.  A succeeding article on enhancement in the frequency domain describes these basic applications.

[1] Soriano, 2010. Enhancement in the Frequency domain.  Applied Physics 186.
[2] Weisstein, 2010. Convolution. MathWorld–A Wolfram Web Resource.

7. Properties of the 2D Fourier Transform

July 20, 2010 4 comments

In the previous blog post we observed how the Fourier Transform helps us predict the result if light passes through a certain aperture.  For further familiarization, here are more examples of FFTs obtained from various 2D patterns:

Figure 1. Fourier Transform of different aperture shapes. Top row (left to right): square, annulus (donut), square annulus, vertical slits, and two dots. Bottom row: the respective FFTs of the patterns.

Anamorphic property of the Fourier Transform

Another property of the Fourier Transform is that it enables us to find the frequencies of sinusoidal waves and project them on the Fourier space.  We can perform this in Scilab through the following implementation:

nx = 100; ny = 100;
x = linspace(-1, 1, nx);
y = linspace(-1, 1, ny);

[X, Y] = ndgrid(x, y);

f = 2;                 // frequency
z = sin(2*%pi*f*X);

scf(0); imshow(z, []); // display in graphics window 0

FIgray = fft2(z);

scf(1); imshow(abs(fftshift(FIgray)), []);

We observe from the results in Figure 2, that the FFT locates the frequencies of the sinusoids and plots them along the axis along which the sinusoid propagates.  We can see from these (and the following examples) that 2D Fourier Transforms are always symmetrical.

Figure 2. Fourier Transform of 2D sinusoids. Top row (left to right): sinusoids of frequencies f = 2, f = 4, and f = 8. Bottom row: respective FFTs of the sinusoids.

Extending this concept, we can try adding bias to these sinusoids with the implementation:

z = sin(2*%pi*f*X) + 1;

This produces a new peak in the Fourier Transform located on the origin of the FFT image, which displays that it is calculated to be at frequency 0.

Figure 3. Fourier Transform of the sinusoidal images with added constant bias.

To test with non-constant bias, we perform this on sinusoids of different frequencies added together. The implementation will be the following:

f1 = 2;
f2 = 8;
z = sin(2*%pi*f1*X) + sin(2*%pi*f2*X);

We will find that the resulting Fourier Transform displays sets of peaks that tell of the frequencies of both sinusoids that were superimposed.

Figure 4. Fourier Transform of superimposed sinusoids. Top row (left to right) F1 = 2 and F2 = 4, F1 = 2 and F2 = 8; F1 = 4 and F2 = 8; Bottom row: respective FFTs of the sinusoid combinations

Rotating these sinusoids can prove that the Fourier Transform results to peaks along the direction of propagation of the waves.  We use the following Scilab execution:

theta = 30;
z = sin(2*%pi*f*(Y*sin(theta) + X*cos(theta)));

Figure 5 displays the result for 2D sinusoids rotated at angles 30, 45 and 60.

Figure 5. Fourier Transform of rotated sinusoids. Top row (left to right): sinusoids tilted at angles 30, 45 and 60. Bottom row: respective Fourier Transforms of the sinusoids.

Operations between sinusoids can yield different results in the Fourier place.  We try to perform FFT on superimposed sinusoids (traveling perpendicular to each other, X and Y) upon addition, matrix multiplication and bitwise multiplication.  We carry these out in Scilab with the following instructions:

z_sum = sin(2*%pi*f1*X) + sin(2*%pi*f2*Y);
z_prod = sin(2*%pi*f1*X) * sin(2*%pi*f2*Y);
z_bitw = sin(2*%pi*f1*X). * sin(2*%pi*f2*Y);

In Figure 6, we can find that the Fourier peaks appear in right angles to each other just us how the original sinusoids propagate.  Note that these set of images were not normalized unlike the previous images shown.  This is in order to clearly display the differences of results from the matrix and bitwise multiplications, which somehow display visibly similar output image if normalized.

Figure 6. Fourier Transform of sinusoids traveling along the X and Y direction combined under different operations. Top row (left to right): sinusoids combined after performing addition, matrix multiplication and bitwise multiplication.  Bottom row: respective FFTs of the patterns.

The Fourier Transform of a superimposition of different rotated sinusoids depends on the frequencies and angles of the waves and the matrix operations applied on them.  Figure 7 displays FFT of the following combinations: a) addition of 2 perpendicular sinusoids and a sinusoid of angle 45; b.) bitwise multiplication of 2 perpendicular sinusoids and a sinusoid of angle 45; and c.) matrix multiplication of sinusoids of angles 30, 45 and 60.

Figure 7. Fourier Transform of several rotated sinusoids produced out of different operations. Top row (left to right): addition, bitwise multiplication and matrix multiplication. Bottom row: respective FFTs of the sinusoid combinations.

We observe that the addition of several rotated sinusoids (of similar frequency) results to a set of peaks creating a circle on the Fourier plane.

From the our examples we witnessed that the Fourier Transform is able to find underlying patterns in images and matrices, in general.  If the image contains no recurring pattern, the FFT will speak the same.  Mental prediction of Fourier output is remarkable but requires diligent practice.  Good thing that modern technology now allows us to carry this out much easier with the latest tools in digital image processing.

For this section, I would rate myself 11 for the job well done. 🙂


[1] Soriano, 2010. Properties of the 2D Fourier Transform. Applied Physics 186.
[2] Jepson, 2005. 2D Fourier Transforms. Introduction to Visual Computing. University of Toronto.
[3] Quantum Scientific Imaging, 2008. Interpret Fourier Transforms (FFT).

6. Fourier Transform Model of Image Formation

July 15, 2010 3 comments

Discrete Fast Fourier Transform

Fourier analysis has long been used in optics as an effective means of predicting what occurs when light passes through various apertures, slits, and curved mirrors.  Now, it has gone further to be the foundational concept behind advanced techniques in image processing.  Fourier transformation creates a representation of an image in the frequency domain, from dimension X to dimension 1/X [1].  The FT is one technique used to make more compact representation of images (e.g. JPEG compression) by rounding its components to lower arithmetic precisions and removing weak signals.  The inverse of this FT provides an approximated reconstruction of the original image [2].

The Fourier transform of a 2-dimensional image f(x,y) is given by [3]:

which in discrete can be expressed as:

The Scilab tool contains a built-in function for calculating Fourier Transforms for 1- and 2-dimensional arrays, fft() and fft2().  For images, it is best to use the FFT function with images of dimension in power of 2 (e.g. 128×128).  Calling this function returns the result with the quadrants of the image interchanged.  The fftshift() function rearranges these blocks to their original positions (see Figure 1).

Figure 1. A. original quadrants. B. orientation of quadrants after FFT.

Noting that the FFT is a set of complex numbers, we view the intensities by taking absolute values using the abs() function.

Lens as Fourier Transformer

We can see this basic concept by simulating an image created by light entering a lens. For instance, we create a binary image of a circle – centered, colored white, and embedded on a black background.

I = imread('circle.bmp');
Igray = im2gray(I);

Alternatively, we can create the shape within Scilab and also create a gray representation of it (Figure 2A).

x = [-64:1:64];
[X,Y] = meshgrid(x);
r = sqrt(X.^2 + Y.^2);
circle = zeros(size(X, 1), size(X, 2));
circle(find (r <= 32)) = 1;
imshow(circle, []);
Igray = im2gray(circle);

We implement FFT on the image and then display the intensity of the result.

FIgray = fft2(Igray);   // FIgray is complex

The brackets passed to the imshow() function adjusts the image such that the maximum value is set to white (or 1) and the minimum is set to black (or 0).  Figure 2B shows the result of our transformation, however with quadrants still interchanged.

Figure 2. A. a binary image of a circle; B. the FFT of the circle; C. its shifted FFT; D. its inverse FFT

We apply fftshift() to have the quadrants back to their proper placements. This result is displayed in Figure 2C.

imshow(fftshift(abs(FIgray))), []);

Now, if we want to do an inverse of the FFT we have performed, we do this by applying another FFT on our previous result.


This approximately reconstructs our original circle (Figure 2D). However, though the resulting image appears similar to our input, it is an inverted version of the original.
We can verify this if we perform the same methods on a text character, as in Figure 3.

Figure 3. A. binary image containing character “A”; B. the FFT of the image; C. its shifted FFT; D. its inverse FFT


Convolution is a function that solves for the amount of overlap of one function g as it is shifted over the function f.  For 2-dimensional functions, convolution is expressed by [4]:

This equation basically describes the relationship between three signals: the input, the impulse response and the output.

Simulation of an imaging device

Photographs are usually defined by the transfer function (transform of the impulse response) of the camera, which is limited by finite radius of the lens. If minimal light rays are able to enter through the camera due to the lens, the image is reconstructed with lesser resolution.
To simulate this, we create an image composed of the text “VIP” covering about 50% of its total area, and a circle to serve as a lens aperture.

We take the FFT of the “VIP” image and multiply it with the fftshift of the circular aperture. We do not anymore perform FFT on the circle since it already is part of the Fourier plane. Our Scilab code will read as follows:

r = imread('circle.bmp');
a = imread('vip.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fftshift(rgray);
Fa = fft2(agray);

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(IRA);

imshow(FImage, [ ]);

We can investigate on the properties (and integrity) of this method by observing the resulting image if allowed to pass through various aperture sizes.

Figure 4. The convolution formed from the text “VIP” through various aperture sizes. Top row: circular apertures of various diameters (32, 16, 8, 4 pixels). Bottom row: the resulting images after convolution with the different aperture sizes. NOTE: the “VIP” text was originally written upright.

The results in Figure 4 show that the size of the aperture importantly matters to produce a well-resolved image of the object and to create a better representation of it by allowing more light to enter through the lense.


Correlation is the function for measuring the degree of similarity between two functions [1].  For 2-dimensional objects, it is expressed as:

High resulting values mean high correlation, which occur when functions (or images) are greatly identical.  This is why correlation is widely used in image processing for pattern recognition.

Template matching / Pattern recognition

We test the method of correlation by creating an image containing a short text and correlate it with another image containing a letter frequently occurring in it.  We perform correlation by multiplying the FT of the single-letter image and the conjugated FT of the text image.  We then obtain the result by solving for the inverse FT of their product.  The Scilab implementation reads:

r = imread('a.bmp');
a = imread('spain.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fft2(rgray);
Fa = conj(fft2(agray));

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(fftshift(IRA));
imshow(FImage, [ ]);

We obtain a resulting image representing areas on which the images are of high correlation (Figure 5C). However, the values are broadly scattered due to the large difference in the character size. We repeat the procedure but this time correlating with a character size more identical to the ones in the text.

Figure 5. A. a sample short text; B. our pattern, which is a particular letter repeating in the text; C. correlation of the images; D. pattern, now a single-letter image more identical to the ones in the short text; E. correlation of the images with more defined peaks.

Notice that the more identical the images, the higher the corresponding values are in the resultant matrix (Figure 5E).  We find the highest peaks (or the brightest pixels) on the specific words where the letter “A” is present (i.e. “rain”, “spain”, “mainly”, and “plain”).

Edge detection using the convolution integral

Edge detection in images can be a way of template matching.  For instance, we create a pattern within a 3×3 matrix such that the total of the values equal to zero.  We then convolve this with a sample image for edge detection via the imcorrcoef() function.

pattern = [-1 -1 -1; 2 2 2; -1 -1 -1];  // horizontal pattern

a = imread('vip.bmp');
agray = im2gray(a);

img = imcorrcoef(agray, pattern);

We also test other templates such as vertical, diagonal and spot.

pattern = [-1 2 -1; -1 2 -1; -1 2 -1];       // vertical
//pattern = [-1 -1 2; -1 2 -1; 2 -1 -1];     // diagonal right
//pattern = [-1 -1 -1; -1 8 -1; -1 -1 -1];   // spot pattern
Figure 6. Template matching as method for edge detection. Top row: 3×3 patterns (horizontal, vertical, diagonal and spot); Bottom row: the edges detected via correlation from an image with text “VIP”.

From our results in Figure 6, we notice that the edges more pronounced in the resulting image are those similar to the template with which the text is correlated with.  The letters “I” and “P” are the ones least detected with edges when related with the horizontal and diagonal templates.  The best detection, however, was obtained by the spot template which, in a sense, does not outline a specific pattern and thus was able to trace approximately all the boundaries of the text image.

Special thanks to Gladys R. for sharing her thoughts and immediate realizations, and Dr. Soriano for the supplementary insights.  I would rate myself 10 in this activity for comprehending the ideas and producing the desired outputs.
[1] Soriano, 2010. Fourier Transform Model of Image Formation. Applied Physics 186.
[2] Wikipedia, 2010. Fourier Analysis.
[3] Wang, R., 2007. Two-Dimensional Fourier Transform.
[4] Weisstein, E., 2010. Convolution. From MathWorld–A Wolfram Web Resource.
[5] Peters, R. II, 2008. Image processing: Spatial convolution. Vanderbilt University School of Engineering.