Posts Tagged ‘edge detection’

Ink caricature via edge detection and morphology

February 25, 2011 4 comments
I’m gonna teach you
how to turn …



into THIS →

In this post, I’ll be showing an algorithm for generating an ink caricature out of a face photograph. Using a series of basic image processing techniques we will extract the features of the human face and then reconstruct them into a black-on-white sketch with details similar to brush (or pen) touches. This is useful for those who wish to avoid the work of manually producing such artworks. In addition, this set of methods can serve as initial groundwork for more intelligent and advanced algorithms in creating digital caricatures.

The overall algorithm can be summarized as follows:

Firstly, we choose a good threshold value to produce a satisfactory binary conversion of the raw image.  The parameter value should be sufficient to capture and preserve the important features of the face (i.e. eyes, brows, facial linings, teeth, etc.).

Applying edge detection using a spot filter would obtain the important linings of these facial aspects.

grayscaled image
binary equivalent
detected edges

Notice however that though we were able to capture the details of the face, due to the limitations of the photo (dark hair against dark background), we were not able to capture the details of the hair.  This is because the dark areas were not acquired/obtained by the chosen threshold value.  This can be resolved by applying value inversion, which changes the brightness of the pixels, but not the color.  We use the built-in method in Scilab, rgb2hsv(), to accomplish this and then revert it back its RGB representation.

img = rgb2hsv(im)
// img (:, :, 1) → HUE
// img (:, :, 2) → SATURATION
// img (:, :, 3) → VALUE

img_value = img (:,:,3);
img (:,:,3) = 1 – img_value;
im = hsv2rgb(img);

After applying the same methods performed above, the following pictures show the captured details of the photo.

grayscaled image
binary equivalent
detected edges

Next, we combine the edges detected for the face and the hair and obtain the overall edges for the face photo.


We remove the pits and holes by applying a simple morphology using a structuring element in a shape of a circle. Note that for a 5×5 matrix this would appear more like a diamond (like the one below), but it is the best shape to approximate the touch of a inkbrush or pen.

After morphology, we invert the processed image into its black-on-white equivalent and finally obtain our ink caricature.

structuring element
resulting image after morphology
inverted value (our ink caricature)

This set of methods also show to be effective for various facial expressions.

Raw images courtesy of DailyMail.

The limitations however are that (1) the methods sometimes need human supervision in choosing threshold values, so to make sure all the valuable features of the face are captured (aesthetic reasons, i.e. eyelashes, dimples, etc.).  (2) The effectiveness would depend on value contrasts, thus the need for value inversions. (3) The caricature detail would depend on the brush/pen size preferred and would greatly affect its accuracy.

[1] Soriano, 2010. Binary operations. Applied Physics 186.
[2] Soriano, 2010. Fourier Transform model of image formation. Applied Physics 186.
[3] Wikipedia, 2010. Caricature.
[4] Portrait Workshop, 2010. Types of Caricatures.

6. Fourier Transform Model of Image Formation

July 15, 2010 2 comments

Discrete Fast Fourier Transform

Fourier analysis has long been used in optics as an effective means of predicting what occurs when light passes through various apertures, slits, and curved mirrors.  Now, it has gone further to be the foundational concept behind advanced techniques in image processing.  Fourier transformation creates a representation of an image in the frequency domain, from dimension X to dimension 1/X [1].  The FT is one technique used to make more compact representation of images (e.g. JPEG compression) by rounding its components to lower arithmetic precisions and removing weak signals.  The inverse of this FT provides an approximated reconstruction of the original image [2].

The Fourier transform of a 2-dimensional image f(x,y) is given by [3]:

which in discrete can be expressed as:

The Scilab tool contains a built-in function for calculating Fourier Transforms for 1- and 2-dimensional arrays, fft() and fft2().  For images, it is best to use the FFT function with images of dimension in power of 2 (e.g. 128×128).  Calling this function returns the result with the quadrants of the image interchanged.  The fftshift() function rearranges these blocks to their original positions (see Figure 1).

Figure 1. A. original quadrants. B. orientation of quadrants after FFT.

Noting that the FFT is a set of complex numbers, we view the intensities by taking absolute values using the abs() function.

Lens as Fourier Transformer

We can see this basic concept by simulating an image created by light entering a lens. For instance, we create a binary image of a circle – centered, colored white, and embedded on a black background.

I = imread('circle.bmp');
Igray = im2gray(I);

Alternatively, we can create the shape within Scilab and also create a gray representation of it (Figure 2A).

x = [-64:1:64];
[X,Y] = meshgrid(x);
r = sqrt(X.^2 + Y.^2);
circle = zeros(size(X, 1), size(X, 2));
circle(find (r <= 32)) = 1;
imshow(circle, []);
Igray = im2gray(circle);

We implement FFT on the image and then display the intensity of the result.

FIgray = fft2(Igray);   // FIgray is complex

The brackets passed to the imshow() function adjusts the image such that the maximum value is set to white (or 1) and the minimum is set to black (or 0).  Figure 2B shows the result of our transformation, however with quadrants still interchanged.

Figure 2. A. a binary image of a circle; B. the FFT of the circle; C. its shifted FFT; D. its inverse FFT

We apply fftshift() to have the quadrants back to their proper placements. This result is displayed in Figure 2C.

imshow(fftshift(abs(FIgray))), []);

Now, if we want to do an inverse of the FFT we have performed, we do this by applying another FFT on our previous result.


This approximately reconstructs our original circle (Figure 2D). However, though the resulting image appears similar to our input, it is an inverted version of the original.
We can verify this if we perform the same methods on a text character, as in Figure 3.

Figure 3. A. binary image containing character “A”; B. the FFT of the image; C. its shifted FFT; D. its inverse FFT


Convolution is a function that solves for the amount of overlap of one function g as it is shifted over the function f.  For 2-dimensional functions, convolution is expressed by [4]:

This equation basically describes the relationship between three signals: the input, the impulse response and the output.

Simulation of an imaging device

Photographs are usually defined by the transfer function (transform of the impulse response) of the camera, which is limited by finite radius of the lens. If minimal light rays are able to enter through the camera due to the lens, the image is reconstructed with lesser resolution.
To simulate this, we create an image composed of the text “VIP” covering about 50% of its total area, and a circle to serve as a lens aperture.

We take the FFT of the “VIP” image and multiply it with the fftshift of the circular aperture. We do not anymore perform FFT on the circle since it already is part of the Fourier plane. Our Scilab code will read as follows:

r = imread('circle.bmp');
a = imread('vip.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fftshift(rgray);
Fa = fft2(agray);

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(IRA);

imshow(FImage, [ ]);

We can investigate on the properties (and integrity) of this method by observing the resulting image if allowed to pass through various aperture sizes.

Figure 4. The convolution formed from the text “VIP” through various aperture sizes. Top row: circular apertures of various diameters (32, 16, 8, 4 pixels). Bottom row: the resulting images after convolution with the different aperture sizes. NOTE: the “VIP” text was originally written upright.

The results in Figure 4 show that the size of the aperture importantly matters to produce a well-resolved image of the object and to create a better representation of it by allowing more light to enter through the lense.


Correlation is the function for measuring the degree of similarity between two functions [1].  For 2-dimensional objects, it is expressed as:

High resulting values mean high correlation, which occur when functions (or images) are greatly identical.  This is why correlation is widely used in image processing for pattern recognition.

Template matching / Pattern recognition

We test the method of correlation by creating an image containing a short text and correlate it with another image containing a letter frequently occurring in it.  We perform correlation by multiplying the FT of the single-letter image and the conjugated FT of the text image.  We then obtain the result by solving for the inverse FT of their product.  The Scilab implementation reads:

r = imread('a.bmp');
a = imread('spain.bmp');

rgray = im2gray(r);
agray = im2gray(a);

Fr = fft2(rgray);
Fa = conj(fft2(agray));

FRA = Fr.*(Fa);
IRA = fft2(FRA); // inverse FFT
FImage = abs(fftshift(IRA));
imshow(FImage, [ ]);

We obtain a resulting image representing areas on which the images are of high correlation (Figure 5C). However, the values are broadly scattered due to the large difference in the character size. We repeat the procedure but this time correlating with a character size more identical to the ones in the text.

Figure 5. A. a sample short text; B. our pattern, which is a particular letter repeating in the text; C. correlation of the images; D. pattern, now a single-letter image more identical to the ones in the short text; E. correlation of the images with more defined peaks.

Notice that the more identical the images, the higher the corresponding values are in the resultant matrix (Figure 5E).  We find the highest peaks (or the brightest pixels) on the specific words where the letter “A” is present (i.e. “rain”, “spain”, “mainly”, and “plain”).

Edge detection using the convolution integral

Edge detection in images can be a way of template matching.  For instance, we create a pattern within a 3×3 matrix such that the total of the values equal to zero.  We then convolve this with a sample image for edge detection via the imcorrcoef() function.

pattern = [-1 -1 -1; 2 2 2; -1 -1 -1];  // horizontal pattern

a = imread('vip.bmp');
agray = im2gray(a);

img = imcorrcoef(agray, pattern);

We also test other templates such as vertical, diagonal and spot.

pattern = [-1 2 -1; -1 2 -1; -1 2 -1];       // vertical
//pattern = [-1 -1 2; -1 2 -1; 2 -1 -1];     // diagonal right
//pattern = [-1 -1 -1; -1 8 -1; -1 -1 -1];   // spot pattern
Figure 6. Template matching as method for edge detection. Top row: 3×3 patterns (horizontal, vertical, diagonal and spot); Bottom row: the edges detected via correlation from an image with text “VIP”.

From our results in Figure 6, we notice that the edges more pronounced in the resulting image are those similar to the template with which the text is correlated with.  The letters “I” and “P” are the ones least detected with edges when related with the horizontal and diagonal templates.  The best detection, however, was obtained by the spot template which, in a sense, does not outline a specific pattern and thus was able to trace approximately all the boundaries of the text image.

Special thanks to Gladys R. for sharing her thoughts and immediate realizations, and Dr. Soriano for the supplementary insights.  I would rate myself 10 in this activity for comprehending the ideas and producing the desired outputs.
[1] Soriano, 2010. Fourier Transform Model of Image Formation. Applied Physics 186.
[2] Wikipedia, 2010. Fourier Analysis.
[3] Wang, R., 2007. Two-Dimensional Fourier Transform.
[4] Weisstein, E., 2010. Convolution. From MathWorld–A Wolfram Web Resource.
[5] Peters, R. II, 2008. Image processing: Spatial convolution. Vanderbilt University School of Engineering.

4. Area estimation of images with defined edges

July 1, 2010 1 comment

Area computation has several applications in biomedical imaging, marine signaling and geomatics.  Here we explore on the use of the image processing tool  Scilab and a standard scientific algorithm for calculating areas out of raw images.

Green’s theorem is a vector identity that relates the surface integral of a boundary to its line integral.  It says that the area of a surface can be calculated by the equation:

which can be expressed in discrete form as:

When dealing with images, we can use the coordinates (x and y) or the pixel locations of the image in order to compute for the area of interest.  We begin by applying these basic methods on the simple figures below:

Figure 1. Basic polygons: circle, square and triangle.

We first make sure the images are in their binary form in order for the machine to easily interpret the values.  The area to calculate should be set to white, and the background black.  Then we use Scilab’s follow() function that returns an array of the coordinates of the detected edges or contour.

I = imread('circle.bmp');
I = im2bw(I, 0.5);

[x,y] = follow(I);
plot2d(x,y, rect=[0, 0, 250, 419]);

The plot2d() function allows us to view the edges detected in the image. Figure 2 below shows the results for the three polygons.

Figure 2. Edge detection results for the three polygons.

Applying the Green’s theorem we write our Scilab code as follows:

len = length(x);
area_calculated = 0;

for i=1:len-1
   area_partial = (x(i)*y(i+1)) - (x(i+1)*y(i));
   area_calculated = area_calculated + area_partial;

Finally we make sure that the boundary of the contour is closed such that x(len+1)=x(1) and y(len+1)=y(1)

area_partial = (x(len)*y(1)) - (x(1)*y(len));
area_calculated = (area_calculated + area_partial)/2;

The actual area of a digital image can be known through the sum of all its filled pixels.  We get the accuracy by first calculating the percent error.

area_analytical = sum(I);
percent_error = (abs(area_analytical - area_calculated)/area_analytical)*100;

accuracy = 100 - percent_error;

Accuracies may also be visualized by overlaying the plot of the polygon with its detected edges. (Click on the images for larger view.)

Figure 3. Overlayed plots for the polygons and their contours.

Here the summarized calculations for the three figures:


Pixel Count

Calculated Area


Circle 95812 95319 99.4854507
Square 62500 62001 99.2016000
Triangle 45066 44614 98.9970266

Notice the accuracy of the algorithm on well-defined contours.  We can also investigate the effectiveness of this method on other shapes such as the following:

Figure 4. Test figures: (1) a smaller square, (2) puzzle swan, (3) lowecase T (4) cat silhouette

The following are the results for the area estimation:


Pixel Count

Calculated Area


Small Square 22500 22201 98.6711111
Puzzle swan 20983 20650.5 98.4153840
Lowercase T 26517 25882.5 97.6071954
Cat silhouette 17904 18073.5 99.0532842

We notice here that error increases when the area calculated for is of lower resolution. Also, more inaccuracies arise when it becomes more difficult for the algorithm to traverse the contour.


Geomatics Application

We apply this estimation algorithm in finding the total land area of the National Science Complex of the University of the Philippines Diliman.

Figure 5. Helicopter view of the National Science Complex. Courtesy of Gil Jacinto.

We obtain a satellite map from Google Maps and highlight the area using a GIMP selection tool [2].

Figure 6. Highlighted area of the National Science Complex. Courtesy of Google Maps.

The selected area is then converted to a binary image and was later calculated for its area.

Figure 7. Binary representation of the land area.

The National Science Complex is known to have an area of 21.9 hectares (multiply by 10,000 to convert to sq. meters).   Using the Google Maps’ scale legend, we perform ratio and proportion in order to obtain actual measurements.  We find that 87px of the image represent 200m of the land area. The following table displays the pixel calculations and the actual area accuracy.


Analytical Area

Calculated Area


Map image (in sq. pixels) 45481 45047 99.045755
Land area (in hectares) 21.9 23.806051 91.296571

Our results show that the method of Green’s Theorem proves to be an accurate algorithm on practical discrete digital images, such as the NSC map.  The minimal errors is accounted on the manual method selection and the limited resolution of the map.

For this segment, I would rate myself 11 for completing the tasks and figuring out other necessary methods to find the land area estimation.

[1] Soriano, M., 2010.  Area estimation of images with defined edges.  Applied Physics 186.
[2] Hattingh, F., 2008. GIMP for dummies – How to highlight part of image.
[3] Vergara, J., 2009. Area estimation for images with defined edges.
[4] Wolfram, 2010. A Discrete Green’s Theorem. Wolfram Demonstrations Project.