Chapter 1: Introduction
1.1 Introduction
For the past few years, common computer input devices have not changed much. Communication with computers is still limited to the mouse, keyboard, trackball, webcam, light pen and so on. This is because the existing input devices are adequate for most of the functions a computer can perform. On the other hand, new applications and software are constantly introduced to the market, and they manage to deliver a wide range of functions through these same common input devices.
Vision-based interfaces are feasible and popular at this moment because a computer can communicate with the user through a webcam. Users can issue commands to the computer simply by performing actions in front of the webcam, without typing on the keyboard or clicking the mouse. Hence, users can carry out human-machine interaction (HMI) through these friendlier features. Eventually, this will enable new commands that are not possible with current computer input devices.
Lately, there has been a surge of interest in recognizing human hand gestures. Hand gesture recognition has a variety of applications, such as computer games, gaming machines, mouse replacement and machinery control (e.g. cranes and surgical machines). Moreover, controlling computers via hand gestures can make many applications more intuitive to work with than a mouse, keyboard or other input device.
The most structured sets of gestures belong to sign languages. In a sign language, each gesture has an assigned meaning (or meanings). This project focuses on American Sign Language (ASL), the language of choice for most deaf people. ASL was invented mainly to allow deaf people to communicate with hearing people. ASL consists of approximately 6000 gestures for common words, with finger spelling used to communicate proper nouns. Finger spelling is performed with one hand, using 26 gestures for the 26 letters of the alphabet. [1] Examples of signs are shown in Figure 1.
Figure 1: ASL examples
1.2 Background
There are a few ways to perform hand gesture recognition, which can be classified into three categories. The first category relies heavily on hardware, such as glove-based analysis: sensors (mechanical or optical) attached to a glove transduce finger flexions into electrical signals to determine the hand posture. Normally, the sensors used are acoustic or magnetic sensors embedded in the glove.
For example, there is a system that translates between Japanese Sign Language (JSL) and Japanese in both directions. The system works by recognizing one-handed motions. A VPL Data Glove Model II is used for acquiring hand data. It has two sensors for measuring the bending angles of two joints on each finger, one over the knuckle and the other over the middle joint. There is also a sensor attached to the back of the glove which measures three position values and three orientation values relative to a fixed magnetic source. The position data is calibrated by subtracting the neutral position data from the raw position data. [2]
The second category is the analysis of drawing gestures, which involves special input devices such as a stylus. [2] Most hand gesture recognition currently works through mechanical sensing, most often for direct manipulation of a virtual environment, but this type of sensing has a range of problems such as accuracy, reliability and electromagnetic noise. [2] Both of these categories involve external hardware.
The third category is vision-based analysis, which is based on the way human beings perceive information about their surroundings. Visual sensing has the potential to make gestural interaction more practical, and it is the most intuitive method for hand gesture recognition because it involves no external hardware: gestures can be recognized freely without anything attached to the hand. [3] All it needs is a camera, webcam, camcorder or any other image-capture device that can interface with a computer. This project focuses on vision-based analysis.
1.3 Objectives
a) Implement pattern recognition using a neural network in MATLAB.
b) The implemented system should be able to perform classification correctly.
c) The implemented application should be user-friendly enough for anyone to use.
d) The system should be able to capture a static image through the webcam and perform classification on it.
1.4 Aim
The aim of this project is to create a vision-based analysis application that performs hand gesture recognition of American Sign Language (ASL). The application should recognize a number of ASL hand gestures, such as those for the letters A, B and C and the numbers 1, 2 and 3, without error, regardless of the person's hand size and other external factors.
1.5 Goal
This project should not involve any external part or hardware other than a computer equipped with a webcam. This keeps the cost to a minimum, so that everyone can own and use the application easily.
1.6 Approach
Since the hardware for this project is limited to a computer and a webcam, only the software and programming parts need to be considered. A few software packages can perform hand gesture recognition when programmed correctly, such as MATLAB, Microsoft Visual C#, Microsoft Visual C++ and Microsoft Visual Basic, but the most common choices are MATLAB and Microsoft Visual C#. Both are very powerful tools.
This project is based on MATLAB. MATLAB was chosen over Microsoft Visual C# because it is well suited to speeding up the development process: it allows the user to work faster and concentrate on the results rather than on the design of the program. [7]
MATLAB has toolboxes which allow us to learn and apply specialized technology. It is a tool of choice for high-productivity research, analysis and development. This project uses two toolboxes: Neural Network and Image Processing. [8]
Chapter 2: Literature Review
2.1 Vision based Analysis
A few technologies already use vision-based analysis.
One example is a hand gesture recognition system that takes a 'vision-based' approach, relying only on a vision sensor (camera) to understand musical conducting actions. The conductor uses one hand, which must stay within the camera's view range, and may indicate four timing patterns at three tempos through hand motion. When the camera captures the image of the hand gesture, the system extracts the hand region, the region of interest (ROI), using intensity and color information. The system obtains the motion velocity and direction by tracking the center of gravity (COG) of the hand region, which provides the speed of any conducting time pattern. [4]
Another example is a gesture-based interface for home appliance control in a smart home. This technology is based on HMI (human-machine interface). A small advanced color camera built into the television senses when someone enters its field of vision and searches for their hand. The machine then interprets the hand's signals: waving up and down could alter the volume, and raising a finger could switch the channel. The system works by detecting whether the skin colors of the face and hand match, then running hand detection and tracking algorithms in which a cascaded hand motion recognizer distinguishes pre-defined hand motions from meaningless gestures. [5]
This hand gesture recognition project is based on human-computer interaction (HCI) technology. The computer performs hand gesture recognition on American Sign Language (ASL). The system uses the MATLAB Neural Network Toolbox to perform the gesture recognition. It works by feeding numerous hand gesture images into a neural network, which the system then trains. Once the neural network is trained, it can recognize multiple ASL hand gestures. [11]
Further explanation of how the images are fed into the network, and how the network processes them, is given later in this report.
2.2 Neural Network
A neural network, also known as an artificial neural network (ANN), is an artificial intelligence system based on biological neural networks. Neural networks are made up of simple elements operating in parallel. These elements, known as neurons, are discussed in the Simple Neuron section. The function of the network is largely determined by the connections between these elements, which work much like biological nervous systems. A neural network can be trained to perform a particular function by adjusting the values of the connections (weights) between the elements. [8]
Figure 2: Neural Network Block Diagram [8]
A neural network is adjusted and trained so that a particular input leads to a specific target output. [8] In the example in Figure 2, the network is adjusted, based on a comparison of the output and the target, until the network output matches the target.
Neural networks have been used in various fields of application such as identification, speech, vision, classification and control systems. Neural networks can be trained to perform all of these complex functions. [8]
Nowadays, neural networks can be trained to solve many difficult problems faced by human beings and computers. Examples of applications using neural networks are shown in Table 1.
Industry | Business Applications |
Automobile | Automobile warranty analysis and automatic guidance system |
Financial | Real estate appraisal, loan advising, mortgage screening, corporate bond rating, credit-line use analysis, credit card activity tracking, portfolio trading program, corporate financial analysis, and currency price prediction |
Defense | Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, and signal/image identification |
Medical | Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, and emergency-room test advisement |
Table 1: Applications of Neural Network
2.2.1 Simple Neuron
Figure 3 below shows a neuron with a single scalar input: without a bias on the left, and with a bias on the right. The MATLAB Neural Network Toolbox's built-in functions use the neuron with bias. [9]
Figure 3: Neuron with Single Scalar [8]
The scalar input p is fed through a connection that multiplies it by the scalar weight w to form the product wp, again a scalar. The weighted input wp is the argument of the transfer function f, which produces the scalar output a. The neuron on the right side includes a scalar bias b, which is simply added to the product wp. The bias acts just like a weight with a constant input of 1. [9]
The transfer function's net input n, again a scalar, is the sum of the weighted input wp and the bias b. This sum is the argument of the transfer function f, typically a step function, which takes the argument n and produces the output a. Both w and b are adjustable scalar parameters of the neuron. [9] As a result, a neural network can be trained to perform a particular task by adjusting the bias and weight parameters, or the network itself will adjust these parameters to achieve the desired result. [9]
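As a small numeric illustration of these relations (the values below are arbitrary examples, not taken from the project), a single neuron with bias and a hard-limit transfer function can be evaluated directly in MATLAB:

    % Single neuron with bias (arbitrary example values).
    p = 2.5;              % scalar input
    w = 0.8;              % scalar weight
    b = -1.5;             % scalar bias
    n = w*p + b;          % net input n = wp + b
    a = double(n >= 0);   % hard-limit transfer function: 1 if n >= 0, else 0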
2.2.2 Transfer Functions
There are three commonly used types of transfer function, shown in Figure 4.
Figure 4: Transfer Functions [8]
Since this project is based on the hard-limit transfer function, the explanation focuses on it alone. The hard-limit transfer function limits the output of the neuron to either 0, if the net input argument n is less than 0, or 1, if n is greater than or equal to 0. This function is used in the Perceptron to create neurons that make classification decisions. The Perceptron is a program that learns concepts. [8]
There are two types of training available for neural networks: supervised training and unsupervised training. The Perceptron falls under supervised training. [9]
2.2.3 Supervised Training
Supervised training is based on the system trying to predict the outcomes for known examples. It compares its predictions with the target answers and 'learns' from its mistakes by adjusting the weights and biases accordingly.
First, the input data is fed into the input-layer neurons, which pass the inputs along to the next nodes. As the inputs are passed along, each connection applies its weight, and when the inputs reach the next node the weighted values are summed. This summed value is either weakened or intensified. The process continues until the data reaches the output layer.
Then comes the classification step, where the predicted output from the network is compared with the actual (target) output. If the predicted output equals the target, no change is made to the weights in the network. But if the predicted output differs from the target, there is an error; the error is propagated back through the network and the weights are adjusted accordingly. [9]
2.2.4 Unsupervised Training
This type of neural network has no outputs or answers as part of the training process. Unsupervised training works by grouping 'similar' cases together, performing a clustering analysis.
The system starts with a clean slate and is not biased about which factors should be most important. An example of unsupervised training is the Kohonen network. [23]
In this project, supervised training was preferred because the project is about classification and pattern recognition. Supervised training can compare the hand gestures with the correct target and 'learn' by itself, whereas unsupervised training focuses more on clustering, since it works by placing 'similar' cases together. Unsupervised training is more efficient for data compression and data mining. [10]
2.3 Advantages of Neural Network
Neural networks have a variety of advantages, especially for analysts. Below are some of them:
a) A neural network system is developed through learning rather than programming. This saves the programmer considerable time, because programming is much more time-consuming and requires specifying the exact behavior of the model. It lets the programmer or analyst focus on the results rather than on designing the program.
b) Neural networks are flexible in a changing environment. Normally programmed systems are limited to the situations they were written for; when conditions change, the program no longer functions well. A neural network is excellent at adapting to constantly changing information, although it does take some time to learn a sudden drastic change.
c) Neural networks can build informative models where most conventional methods fail. A neural network can easily model data that is very difficult to model with traditional approaches such as programming logic and inferential statistics.
d) Neural network pattern recognition is a powerful and robust approach for harnessing the information in data. A neural network learns to recognize patterns from the data set presented to it. [15]
2.4 Limitations of Neural Network
Even though neural networks have a variety of benefits, every system has its limitations, and neural networks are no exception. Below are some of them:
a) A neural network cannot explain the model it has built in a useful way. Neural networks often get good results but have a hard time explaining how those results were reached. This explanation is important, especially for analysts who want to know how the model behaves. [16]
b) Neural Network won’t produce good results if the input data are not representative of the problem. This situation classified as ‘garbage in’ produce ‘garbage out’. So analyst has to spend time to understanding the problem or the outcome that expected. And, analyst must select appropriate data used to train the system and are measured in a way that reflects the behavior of the factors. [16]
c) A neural network takes time to train when a very complex data set is presented to it. Training slows down on low-end computers or machines without math coprocessors, but it is still faster than other data analysis approaches. In any case, this is not a big problem, because most of today's processors are fast enough to train such a network. [15]
2.5 Perceptron Learning Rule
The Perceptron is a neural network program that 'learns' concepts. For example, it can learn to respond to the inputs presented to it with True (1) or False (0) by repeatedly "studying" examples. This makes it suitable for classification and pattern recognition. [12]
Single perceptron’s structure is quite simple. There are few inputs, depends on input data, a bias with and output. The inputs and outputs must be in binary form which the value can only be 0 or 1. Figure 3 shows Perceptron Schematic diagram.
Figure 5: Perceptron Schematic diagram [8]
Inside the Perceptron layer, each input value (0 or 1) is multiplied by its weight, which is generally a real number between 0 and 1. The weighted values are then fed into the neuron together with the bias, which is also a real value between 0 and 1, and summed. The summed value is then fed into the hard-limiter, a function with a defined threshold as discussed earlier. For example, f(x) = 0 if x < 0.5, and f(x) = 1 if x ≥ 0.5: with the threshold of the hard-limiter set to 0.5, if the sum of the weighted inputs is less than 0.5 the limiter returns 0, and if it is greater than or equal to 0.5 the limiter returns 1. [8]
Once this value is obtained, the next step is adjusting the weights: the Perceptron learns by modifying its weights. [12]
The value obtained from the limiter function is also known as the actual output. The Perceptron adjusts its weights based on the difference between the desired output and the actual output. This can be written as:
Change in weight i = current value of input i × (desired output − actual output) [12]
The Perceptron keeps adjusting its weights until the actual output matches the desired output, or differs from it by only a minimal amount.
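A minimal MATLAB sketch of one such update, with made-up inputs, weights and target; the 0.5 threshold follows the hard-limiter example above:

    % One Perceptron weight update (illustrative values only).
    x = [1 0 1];                 % binary input vector
    w = [0.2 0.4 0.1];           % current weights
    b = 0.1;                     % bias
    d = 1;                       % desired output
    y = double(w*x' + b >= 0.5); % actual output from the hard-limiter
    w = w + x .* (d - y);        % change in weight i = input i * (desired - actual)
    b = b + (d - y);             % bias updated as a weight on a constant input of 1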
2.5.1 Perceptron Convergence Algorithms
The Perceptron is powerful because it can be trained to behave in a certain way. The error-correction learning algorithm is part of the Perceptron learning process: the Perceptron is said to have converged (a technical name for 'learned') once it behaves that way.
Figure 6 below shows a signal-flow graph of how the error-correction learning algorithm works in a single-layer Perceptron. In this case the threshold θ(n) is treated as a synaptic weight connected to a fixed input equal to −1. [16]
Figure 6: Perceptron Signal Flow Graph [17]
Firstly, define the (p+1)-by-1 input vector:
x(n) = [-1, x1(n), x2(n), ..., xp(n)]^T
After that, define the (p+1)-by-1 weight vector:
w(n) = [θ(n), w1(n), w2(n), ..., wp(n)]^T
Here are the variables and parameters used in the convergence algorithm: [23]
- θ(n) = threshold
- y(n) = actual response
- d(n) = desired response
- η = learning-rate parameter, 0 < η < 1
Let's look at the steps of the algorithm in more detail: [23]
1st Step: Initialization
Set the weight vector w(0) = 0, then perform the computations for time steps n = 1, 2, 3, ...
2nd Step: Activation
At time n, activate the Perceptron by applying the continuous-valued input vector x(n) and the desired response d(n).
3rd Step: Computation of Actual Response
Compute the actual response of the Perceptron using the following equation:
y(n) = sgn[w^T(n) x(n)]
The linear output is written in the form:
u(n) = w^T(n) x(n)
where sgn(u) = +1 if u > 0, and sgn(u) = −1 if u < 0.
4th Step: Adaptation of Weight Vector
Update the weight vector with:
w(n+1) = w(n) + η[d(n) − y(n)] x(n)
where d(n) = +1 if x(n) belongs to class C1, and d(n) = −1 if x(n) belongs to class C2.
5th Step: Increment time n by one unit, then repeat from step 2.
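A minimal MATLAB sketch of this loop on a small illustrative data set (an OR-style truth table in ±1 coding; the learning rate and epoch limit are arbitrary choices):

    % Perceptron convergence algorithm (steps 1-5 above) on a toy data set.
    X = [-1 -1 -1 -1; 0 0 1 1; 0 1 0 1]; % augmented inputs: the fixed -1 row models the threshold
    d = [-1 1 1 1];                      % desired responses: +1 for class C1, -1 for C2
    w = zeros(3,1);                      % step 1: initialization, w(0) = 0
    eta = 0.5;                           % learning-rate parameter, 0 < eta < 1
    for epoch = 1:100
        errors = 0;
        for n = 1:size(X,2)              % step 2: activation with x(n) and d(n)
            y = sign(w' * X(:,n));       % step 3: actual response y(n) = sgn(w'x)
            if y == 0, y = 1; end        % resolve sgn(0) to +1
            if y ~= d(n)
                w = w + eta*(d(n) - y)*X(:,n);   % step 4: adapt the weight vector
                errors = errors + 1;
            end
        end                              % step 5: repeat until no errors remain
        if errors == 0, break; end
    end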
2.5.2 Perceptron for Linearly Separable Problems Only
One of the limitations of the Perceptron is that it can only solve problems whose solutions can be divided by a line (or, in 3-D space and beyond, a plane or hyperplane). Let's discuss this in more detail with a basic example: the OR gate function, which reads as 'Y = X1 OR X2'. Table 2 below shows the OR gate truth table. [16]
X1 | X2 | Y |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Table 2: OR gate truth table
Assume that we run the OR gate function through a Perceptron layer, and the weights converge at 0 for the bias and 1, −1 for the inputs. Then calculate the net value, which is the weighted sum at the neuron: [16]
Σ wi·xi = (0)·x0 + (1)·x1 + (−1)·x2 = x1 − x2
Figure 7: Linear Separable Graph
Figure 7 above shows that the Perceptron correctly draws a line dividing the two groups of points. As mentioned, the divider can be something other than a line, such as a plane or hyperplane. This is the strength of the Perceptron, but it also has its limitation: the Perceptron cannot solve the XOR binary function. [21]
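To see the XOR limitation concretely, substitute XOR targets into the convergence-algorithm sketch of section 2.5.1; because no line separates the four XOR points, the error count never reaches zero:

    % XOR targets in +/-1 coding: the Perceptron loop above never converges.
    d = [-1 1 1 -1];   % replaces the OR-style targets in the earlier sketch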
A Perceptron can also solve non-linearly separable problems, provided an appropriate set of first-layer processing elements exists. This processing layer is a fixed unit that computes a set of functions whose values depend on the input pattern; it can 'change' the non-linear input data set into one that is linearly separable. This is discussed further under 'Pre-processing Layer'.
2.6 Introduction to MATLAB
MATLAB is used for:
- Math and computation
- Algorithm development
- Prototyping, modeling and simulation
- Data analysis, exploration and visualization
- Scientific and engineering graphics
- Application/software development [13]
MATLAB is an interactive system that lets us solve many technical computing problems, especially those involving vector and matrix formulations. This is because MATLAB's basic data element is an array that does not require dimensioning. This saves the time a user would take to write a program in a non-interactive language such as Fortran or C.
MATLAB has been developed and has evolved over a period of years through the work of researchers, and it is still developing. In education, MATLAB is a standard instructional tool for introductory and advanced courses in engineering, science and mathematics. In industry, MATLAB is widely used for high-productivity research, development and analysis. [13]
Another thing that makes MATLAB special is its toolboxes. Toolboxes are families of application-specific solutions that allow the user to learn and apply specialized technology. They are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Examples of available toolboxes include fuzzy logic, neural networks, signal processing, image processing, wavelets, image acquisition and many more. [13]
2.7 Computer Hardware Architecture
Chapter 3: Methodology
3.1 Image Databases
The first step in this project is the creation of an image database. This database will be used for training and testing the artificial neural network.
The image database can have different formats: for example, images can be digitized photographs, 3-D hand models or hand drawings. For this project, digitized photographs were used because they are the most realistic approach. The images were captured using a laptop with a webcam.
The image database can use any image file format, such as '.jpg', '.tif', '.bmp' and many more. The images used by this project are '.jpg' files.
There is one operation that needs to be carried out on all the images. Figure 9 below shows examples of the database images captured using the webcam; these images were converted to grayscale by the webcam function.
Figure 9: Example of Train and Test Image
These images go through a transformation T [18], which converts an image into a feature vector; this vector is then compared with the feature vectors of a training set of gestures using the Euclidean distance metric. Figure 10 shows how the pattern recognition system works.
Figure 10: Pattern Recognition System
This method was chosen because its algorithms are fast and simple. [14]
One important aspect of gesture recognition is translation invariance: the position of the hand within the image should not affect the feature vector. This is achieved by forming a histogram of the local orientations, which treats each orientation element the same, independent of location. [14]
Another aspect of gesture recognition is illumination invariance, illustrated in Figure 11. A pixel-by-pixel comparison of the same image under different lighting conditions would misclassify the two identical gestures, but the orientation histogram is robust to lighting changes. [14]
Figure 11: Illumination Variance
Therefore, orientation analysis provides robustness to illumination changes, while the histogram provides translation invariance. [14]
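As a short illustration of the Euclidean comparison mentioned above (the variable names and the 19-bin histograms are assumptions; in this project the classification is ultimately done by a Perceptron instead):

    % Euclidean distance between a test feature vector and stored training vectors.
    v = rand(1,19);                          % 19-bin orientation histogram of a test gesture
    trainSet = rand(24,19);                  % one histogram per training image
    diffs = trainSet - repmat(v, 24, 1);     % difference to every stored vector
    dists = sqrt(sum(diffs.^2, 2));          % Euclidean distance to each training gesture
    [dmin, best] = min(dists);               % index of the closest match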
3.1.1 Orientation Histogram
This program converts the images into orientation-histogram feature vectors in 8 steps using MATLAB:
1st Step
The first step is to read the image database. A 'for' loop is used to read an entire folder of hand gesture images and store them in the MATLAB workspace. These images were captured using the webcam and converted to grayscale.
2nd Step
The next step is to resize all the images stored in the MATLAB workspace to 320x240 pixels. This size offers enough detail for processing while keeping the processing speed fast.
3rd Step
The images are slightly darkened to make sure the backgrounds are completely black. A background that is not completely black produces spurious edges at later stages.
4th Step
The next task is to find the edges. Two filters are used for this step: one for the 'x' (horizontal) direction, x = [0 -1 1], and one for the 'y' (vertical) direction, y = [0 -1 1]^T (the transpose of the x filter). Figure 12 below shows the result of convolving a '5' sign with the x and y filters. [19]
Figure 12: X-Y filter
5th Step
The two resulting matrices (images) are then divided element by element. Taking the arctangent (atan) of the ratio gives the gradient orientation.
6th Step
Then the image's pixel values are reduced to a single channel using '(:,:,1)'. For example, a color pixel contains 3 values (e.g. Red = 420, Blue = 240, Green = 0); this step keeps just one value per pixel.
7th Step
Next, the image block is rearranged into columns. This can be done using the built-in MATLAB function im2col. This step is necessary to display the orientation histogram shown in Figure 13. The 'rose' function is used to create the angle histogram, where each numeric range is shown as one bin.
Figure 13: Orientation Histogram of Image
8th Step
The last step of the image processing is converting the radian values of the column matrix into degrees. This is needed in order to scan the vector for values ranging from 0° to 90°. As the orientation histogram shows, the values lie only in the first and last quadrants, because for any element x, atan(x) lies between −π/2 and π/2.
Next, the histogram is divided into 19 bins, each covering 10°.
The feature vector values of the image are saved to a text (.txt) file using MATLAB's built-in 'fprintf' function. This file is fed into the Perceptron as input values.
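The sketch below condenses the eight steps into one MATLAB loop. It is a reconstruction rather than the original code: the folder name, the darkening factor and the histogram normalization are assumptions.

    % Orientation-histogram feature extraction (steps 1-8, reconstructed sketch).
    files = dir(fullfile('gestures', '*.jpg'));       % step 1: read the image database
    fid = fopen('training.txt', 'w');
    for k = 1:numel(files)
        img = imread(fullfile('gestures', files(k).name));
        if size(img,3) == 3, img = img(:,:,1); end    % step 6: keep a single channel
        img = imresize(img, [240 320]);               % step 2: resize to 320x240 pixels
        img = double(img) * 0.8;                      % step 3: darken so the background is black
        dx = conv2(img, [0 -1 1], 'same');            % step 4: horizontal edge filter
        dy = conv2(img, [0 -1 1]', 'same');           % step 4: vertical edge filter
        theta = atan(dy ./ (dx + eps));               % step 5: divide, then atan for orientation
        col = im2col(theta, size(theta), 'distinct'); % step 7: rearrange the block into a column
        deg = col * 180/pi;                           % step 8: radians to degrees
        h = hist(deg, -90:10:90);                     % 19 bins of 10 degrees each
        fprintf(fid, '%f ', h / sum(h));              % feature vector saved for the Perceptron
        fprintf(fid, '\n');
    end
    fclose(fid);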
3.2 Perceptron
As discussed before, the Perceptron is a neural network that 'learns' concepts. The next part uses the Perceptron learning rule to train the network to perform pattern recognition. Figure 14 below shows the flow chart for the Perceptron program.
3.2.1 Perceptron Process
1st Step: Read text file from database
This text file (training.txt) is generated by combining the feature vectors of all the images, obtained earlier in the Image Processing chapter.
2nd Step: Get target text file from database
The target text file (target.txt), which contains the target vector values, is constructed by the user according to how many outputs (hand gestures) the user defines. Table 3 below shows the target vector for each hand gesture sign: each column, read from top to bottom, is the 8-element target vector for the sign named in the header.
Hand Gesture Sign | 1 | 5 | A | C | H | L | V | B |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | |
0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | |
0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Table 3: Target Vectors
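As a sketch, the target vectors of Table 3 form a flipped 8x8 identity matrix, which could be generated and saved as follows (the exact layout of target.txt is an assumption):

    % Target vectors of Table 3: column order 1, 5, A, C, H, L, V, B.
    T = fliplr(eye(8));             % each sign gets a distinct one-hot column
    dlmwrite('target.txt', T, ' '); % save as space-separated text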
3rd Step: Define the number of neurons of the Neural Network
The number of neurons can be set to any value; the higher the value, the faster the Perceptron converges. The number of Perceptron neurons was set to 100, because this value is optimal enough to perform the classification.
4th Step: Initialize pre-processing layer
Before the input data (feature vectors) move into the network's learning layer, they pass through this pre-processing layer, which transforms the decimal feature vector values into binary matrix form. Figure 15 shows the pre-processing source code in MATLAB.
% INITP generates initial weights and biases for the network.
% Initialize pre-processing layer.
[W1,b1] = initp(P,S1);
% Initialize learning layer.
[W2,b2] = initp(S1,T);
% The first layer is used to pre-process the input vectors:
A1 = simup(P,W1,b1);
Figure 15: Pre-processing source code
The network takes the input data P and converts it into A1, a 100x24 matrix; '100' is the number of neurons and '24' the total number of training images. The reason for pre-processing is to reduce the dimensionality of the raw input data, so that training and simulation run faster.
5th Step: Initialize learning layer
At this layer, the pre-processed data is ready to be fed into the learning network, the Perceptron. This learning layer trains the network to perform the recognition.
6th Step: Train Perceptron
At this stage, the pre-processed data is fed into the Perceptron layer, which contains the number of neurons defined in the 3rd step. The feature vector values are multiplied by the neurons' weights, the total is summed together with the bias, and the summed value is passed through the hard-limiter function, which produces an output.
7th Step: Plot Error Graph
The output is compared with the target vector. If there is an error, the Perceptron network re-adjusts the weight values until the error is zero or minimized, and then stops. Each pass through the input vectors is called an epoch. Figure 16 below shows the Perceptron error plot.
Figure 16: Perceptron Plotting Error Graph
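A minimal sketch of steps 5 to 7, continuing the legacy-toolbox code of Figure 15. The TRAINP call and its training-parameter vector (display frequency, maximum epochs) are assumptions about that toolbox version, not code taken from this report:

    % Train the learning layer with the Perceptron rule (legacy-toolbox sketch).
    tp = [1 1000];                    % assumed: plot every epoch, up to 1000 epochs
    [W2,b2] = trainp(W2,b2,A1,T,tp);  % adjusts W2, b2 until the plotted error reaches zero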
3.3 Testing Perceptron Network Implementation
Figure 17: Network Testing Flow Chart
1st Step: Select test set image / Get Image from Webcam
Once the Perceptron network has been completely trained, it is ready for testing. First, we select a test image that has already been converted into feature vector form, or capture an image through the webcam and process it into a feature vector. The image can be of any type of hand gesture sign; it does not have to be a trained sign, since this is just for testing.
The feature vector is then fed into the trained network.
2nd Step: Process by Perceptron Network
Now the image, in feature vector form, is fed into the network. The feature vector values pass through all the adjusted weights (neurons) inside the Perceptron network, and an output comes out.
3rd Step: Display Matched Output
The system displays the matched output in vector format. An improvement was made so that the output displays both the vector format and the meaning of the gesture sign in graphical form.
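A minimal sketch of the test pass, assuming ptest is a feature vector produced by the same orientation-histogram steps as the training images; the sign ordering is read from Table 3:

    % Feed a test feature vector through the trained two-layer network.
    A1 = simup(ptest, W1, b1);                  % pre-processing layer
    A2 = simup(A1, W2, b2);                     % trained Perceptron layer
    signs = {'B','V','L','H','C','A','5','1'};  % output-row order taken from Table 3
    idx = find(A2 == 1);                        % rows set to 1 in the output vector
    if isempty(idx)
        disp('UNKNOWN');                        % no target vector matched
    else
        disp(signs(idx));                       % one or more matched gesture signs
    end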
Chapter 4: Results
4.1 Results for Each Gesture Sign
After performing the steps described on the previous pages, it is time to examine the results. The tables on the following pages show the results obtained from the system. Only specific hand gesture signs are used for comparison, because each image varies from the others; the results for each image are therefore not identical.
Each table lists, under 'Effect Added', the operations applied to the test image, such as blurring and noise, and indicates under 'Same Database' whether the image came from the training database or a different one. These operations were performed in Adobe Photoshop, whose functions give precise control over the effect values. Examples of the functions used are:
- Sharpen: makes the image clearer by enhancing the edges within it.
- Gaussian Noise: statistical noise that produces a hazy effect on the image.
- Motion Blur: blurs the image as if it had been taken while the subject was moving.
- Lighting Effect: an adjustable light effect positioned on the image.
The system displays the result in column format: each column is a classified image vector, as set earlier in 'Table 3: Target Vectors'. An improvement was later made to display the results in graphical form as well.
Figure 18: One
Image # | Effect Added | Same Database | Classified |
1 | - | YES | CORRECTLY |
2 | Sharpen | YES | CORRECTLY |
3 | Sharpen More, 3 pixel | YES | UNKNOWN |
4 | Gaussian Noise 1.0 | YES | CORRECTLY |
5 | Gaussian Noise 1.5 | YES | WRONGLY, as '1' or 'L' |
6 | Gaussian Noise 2.0 | YES | UNKNOWN |
7 | Motion Blur, 10 pixel | YES | UNKNOWN |
8 | Noise 5% | YES | WRONGLY, as ‘5’ |
9 | Lighting effect | YES | WRONGLY, as ‘L’ |
10 | - | DIFFERENT | CORRECTLY |
11 | - | DIFFERENT | WRONGLY, as 'L' |
12 | Noise 5% | DIFFERENT | UNKNOWN |
Table 4: ‘One’ Test Results
For '1', the classification error is quite large. From the results, '1' is mostly classified as 'L', especially under noise. Once the Gaussian noise exceeds 1.0, the network cannot classify the image correctly. The same applies to sharpening: with 'Sharpen More' at 3 pixels, the system cannot classify it. For a different database the classification is not very robust either, as the system wrongly classifies image 11 as 'L'. The classification is mostly wrong because the '1' sign is similar to various signs used in training, such as 'A' and 'L'.
Figure 19: Five
Image # | Effect Added | Same Database | Classified |
1 | - | YES | CORRECTLY |
2 | Sharpen | YES | CORRECTLY |
3 | Sharpen More, 3 pixel | YES | UNKNOWN |
4 | Gaussian Noise 1.0 | YES | CORRECTLY |
5 | Gaussian Noise 1.5 | YES | UNKNOWN |
6 | Gaussian Noise 2.0 | YES | |
7 | Motion Blur, 10 pixel | YES | UNKNOWN |
8 | Noise 5% | YES | CORRECTLY |
9 | Lighting effect | YES | UNKNOWN |
10 | - | DIFFERENT | UNKNOWN |
11 | - | DIFFERENT | UNKNOWN |
12 | Noise 5% | DIFFERENT | UNKNOWN |
Table 5: ‘Five’ Test Results
For '5', the classification error is quite large. From the results, '5' is classified as unknown in most cases. Once the Gaussian noise exceeds 1.0, the network cannot classify the image correctly. The same applies to sharpening: with 'Sharpen More' at 3 pixels, the system cannot classify it. For a different database the classification is not robust at all; the system classifies every image wrongly. This happens because the spread fingers produce a lot of edges, so the system cannot classify the sign correctly.
Figure 20: H
Image # | Effect Added | Same Database | Classified |
1 | - | YES | CORRECTLY |
2 | Sharpen | YES | CORRECTLY |
3 | Sharpen More, 3 pixel | YES | UNKNOWN |
4 | Gaussian Noise 1.0 | YES | CORRECTLY |
5 | Gaussian Noise 1.5 | YES | WRONGLY, as ‘H’ or ‘A’ |
6 | Gaussian Noise 2.0 | YES | UNKNOWN |
7 | Motion Blur, 10 pixel | YES | UNKNOWN |
8 | Noise 5% | YES | WRONGLY, as ‘5’ |
9 | Lighting effect | YES | CORRECTLY |
10 | - | DIFFERENT | CORRECTLY |
11 | - | DIFFERENT | WRONGLY, as ‘H’ or ‘A’ |
12 | Noise 5% | DIFFERENT | UNKNOWN |
Table 6: ‘H’ Test Results
For 'H', the classification error is small, the best among the ASL signs tested. From the results, 'H' is classified correctly under low effect levels, such as sharpening and Gaussian noise up to 1.0. Once the Gaussian noise exceeds 1.0, the network returns two signs, 'H' or 'A'. For sharpening, with 'Sharpen More' at 3 pixels the system cannot classify it. For a different database the classification is quite robust, as long as the angle of the captured image stays the same. This is because the 'H' sign is quite different from the other signs used for training.
Figure 21: L
Image # | Effect Added | Same Database | Classified |
1 | - | YES | CORRECTLY |
2 | Sharpen | YES | CORRECTLY |
3 | Sharpen More, 3 pixel | YES | WRONGLY, as ‘1’ |
4 | Gaussian Noise 1.0 | YES | CORRECTLY |
5 | Gaussian Noise 1.5 | YES | WRONGLY, as 'A' |
6 | Gaussian Noise 2.0 | YES | UNKNOWN |
7 | Motion Blur, 10 pixel | YES | UNKNOWN |
8 | Noise 5% | YES | UNKNOWN |
9 | Lighting effect | YES | UNKNOWN |
10 | - | DIFFERENT | CORRECTLY |
11 | - | DIFFERENT | WRONGLY, as '1' |
12 | Noise 5% | DIFFERENT | WRONGLY, as ‘1’ or ‘L’ |
Table 7: ‘L’ Test Results
For 'L', the classification error is not satisfactory either. From the results, 'L' is classified correctly under low effect levels, such as sharpening and Gaussian noise up to 1.0. Once the Gaussian noise exceeds 1.0, the network classifies it as unknown or wrongly. For sharpening, with 'Sharpen More' at 3 pixels the system classifies it wrongly. For a different database the system classifies it mostly wrongly, as either '1' or 'L'. This may be because the 'L' sign is very similar to the '1' sign.
4.2 Testing with Various Untrained Signs
Figure 22: Various Sign
Figure 22 above shows three signs that were never used for training the network, only for testing. A full ASL test is unnecessary here, because these signs are classified either as unknown or wrongly. For the 'thumb' sign, the network mostly returns unknown, but sometimes 'A'. For the 'W' sign, the network classifies it as either 'V' or unknown. For the flipped '5' sign, the network classifies it as unknown.
Chapter 5: Discussion
1) From the results, we can see that the system is very robust if the 'exact' same images used in training are applied to it.
2) The system is quite robust to small amounts of blur and noise. It is also reasonably good at classifying a different database, with different image sizes and hand gestures.
3) However, the system is not reliable and robust enough. Moreover, if we take a similar image from the results and apply it to the system again, the test result may differ. This is because the initial weights and biases of the system are not the same on each run, so each time the system runs the test the result differs, although not by much.
4) Initially, the test process produced a lot of classification errors. It turned out that the background captured by the webcam was not completely black, so the background generated edges as well. The solution was to slightly darken the images with MATLAB code.
5) We also cannot set a fixed threshold for this network, because the initial weights and biases differ each time.
6) The system's tolerance to image degradation such as blur and noise is limited to certain values. This means the system can only classify good-quality images, unless it is trained with blurred images.
7) As for the number of Perceptron neurons, a larger number definitely makes the system converge faster, but it becomes more prone to producing more than a single result. For example, when testing 'A', the result was mostly 'A' and '1': the system came out with two results. With too few neurons, however, the Perceptron cannot converge to zero error, and the system cannot classify at all. This is not straightforward; the user has to spend time testing to find the optimal setting.
Chapter 6: Conclusion
The objectives and goal of this project were achieved successfully. The project implemented pattern recognition with a neural network in MATLAB, can obtain a static image as input through the webcam on the spot, and provides a user-friendly application. Importantly, no extra hardware beyond a webcam and the laptop itself is needed to perform the pattern recognition.
However, the project is not robust and reliable enough, as the results are not guaranteed to be the same each time the network is retrained. Apart from this instability, many factors need to be considered, such as the number of layers and the number of neurons, and there is no straightforward rule for these settings.
The approach developed by McConnell (the orientation histogram) may not be good enough to let the network classify each gesture correctly, especially across different databases. Further work could improve on this pattern recognition approach.
Chapter 7: Future Improvement
1) Improvements could be made to McConnell's orientation-histogram idea. Other approaches could be used to perform classification with a neural network; for example, Euclidean distance offers a straightforward alternative.
2) The next improvement is to make the system recognize more gesture signs. The current system recognizes only 8 gesture signs; with small modifications to the code, it could recognize more.
3) Another possible improvement concerns the image backgrounds. Since the backgrounds of this data set were deliberately made black, future work could develop an algorithm that ignores the background color, since it is static.
4) The last improvement is to replace the image database with live video input. Live video input could recognize gesture signs directly, without having to take a picture first.
Chapter 8: References
8.1 Books & Journals
[1] Gallaudet University Press (2004). 1,000 Signs of Life: Basic ASL for Everyday Conversation. Gallaudet University Press.
[2] Don Harris, Constantine Stephanidis, Julie A. Jacko (2003). Human-Computer Interaction: Theory and Practice. Lawrence Erlbaum Associates.
[3] Hans-Jörg Bullinger (1998). Human-Computer Interaction.
[4] Daijin Kim (2009). Automated Face Analysis: Emerging Technologies and Research. IGI Global.
[5] C. Nugent and J.C. Augusto (2006). Smart Homes and Beyond. IOS Press.
[11] Marvin Minsky and Seymour Papert (1988). Perceptrons. Expanded edition. MIT Press.
[15] Alianna J. Maren, Craig T. Harston, Robert M. Pap (1990). Handbook of Neural Computing Applications. Academic Press.
[16] Sergios Theodoridis, Konstantinos Koutroumbas (2006). Pattern Recognition. 3rd edition. Academic Press.
[23] Klimis Symeonidis (2000). Hand Gesture Recognition Using Neural Networks. University of Surrey.
8.2 Websites
[9] http://www.makhfi.com/tutorial/introduction.htm (5/5/2009)
[10] http://www.waset.org/pwaset/v38/v38-84.pdf (10/5/2009)
[13] http://www.merl.com/papers/docs/TR94-03.pdf (13/5/2009)
[14] http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html (14/5/2009)
[17] http://www.cis.hut.fi/ahonkela/dippa/node41.html (20/5/2009)
[18] http://www.math.ucsd.edu/~lindblad/102/l8.pdf (1/6/2009)
[19] http://www.mathworks.com/matlabcentral/fileexchange/8060 (4/6/2009)
[21] http://cse.stanford.edu/class/sophomore-college/projects-00/neural-%09networks/Neuron/index.html (26/6/2009)
[22] http://www.vaughns-1-pagers.com/computer/pc-block-diagram.htm (27/6/2009)