UKH Journal of Science and Engineering | Volume 5 • Number 1 • 2021 80
Adapting Viola-Jones Method for Online Hand / Glove
Taib Shamsadin Abdulsamad, Mahmud Abdulla Mohammad, Faraidoon Hassan Ahmad
Department of Computer, College of Basic Education, University of Raparin, Sulaymaniyah, Iraq
Department of Computer Science, College of Science and Technology, University of Human Development,
Sulaymaniyah, Iraq
Department of Information Technology, College of Science and Technology, University of Human Development,
Sulaymaniyah, Iraq
Department of Pharmacognosy & Pharmaceutical Chemistry, College of Pharmacy, University of Sulaimani,
Sulaymaniyah, Iraq
Department of Information Technology, University College of Goizha, Sulaymaniyah, Iraq
1. Introduction
Owing to its rapidly growing use, image processing has become a core research area within the engineering and
computer science disciplines. Digital image processing techniques enable the manipulation of digital images by
computer. Image processing has numerous applications, such as visual inspection, remotely sensed image analysis, medical
diagnosis, defense surveillance, content-based image retrieval (CBIR), image and video compression, and moving object
tracking (Acharya & Ray, 2005).
An object detector’s objective is to find or recognize all instances of one or more given object classes regardless
of scale, location, pose, view with respect to the camera, partial occlusion, and illumination conditions (Verschae &
Ruiz-del-Solar, 2015). Object detection plays a key role in applications that arise in many different
fields, including industrial automation, consumer electronics, medical imaging, military, video surveillance (Murthy et al.,
2020), food safety (Cevallos et al., 2020), autonomous vehicles, and situational awareness (Mohammad et al., 2016;
Received on: December 30, 2020
Accepted on: February 25, 2021
Published on: June 30, 2021
DOI: 10.25079/ukhjse.v5n1y2021.pp80-90
E-ISSN: 2520-7792
Copyright © 2021 Taib et al. This is an open access article with Creative Commons Attribution Non-Commercial No Derivatives License 4.0 (CC
BY-NC-ND 4.0)
Research Article
This article proposes a method for hand identification that adapts the Viola-Jones method to identify two
different objects. The main objective of this work is to address the problem of hand identification. To this end, our approach
is based on learning the two objects as one package. The proposed method consists of three parts: the first is
training for both objects, the second is detection of both objects, and the third is the identification step, which determines
whether the hand is wearing a glove and labels each detection with the corresponding state. To test the method, we
propose a new dataset that includes a variety of cases with different compositions of the hand. In total, 8 cases
were used to test the method. The method was able to detect a human hand successfully and could also
identify whether or not the hand was wearing a glove. The accuracy of detecting a hand without a glove was
about 63%, and the accuracy of detecting a hand with a glove on was about 61%. Even though the tests scored
different accuracy, as a first step towards solving this problem, it is a big achievement to even reach this level of
Keywords: Computer Vision, Image Processing, Object Detection, Viola-Jones, Hand Detection, Identification.
Muhammad, 2016). More precisely, the applications of object detection include pedestrian detection, road detection,
lane detection, obstacle detection, face detection, crop detection, and hand detection.
Hand identification is an important application that is closely connected to our health. In some
circumstances, it is crucial to monitor whether people are wearing gloves, especially
in industrial, food-related, and patient-related environments. Also, according to WHO reports, wearing gloves
and a mask are two important factors in reducing the transmission of COVID-19 (Ahmed et al., 2020;
Dey et al., 2021). Thus, in these situations, especially in medical centers, monitoring people to ensure that they are wearing
gloves is crucial. Hand identification methods support this monitoring process by identifying the hands of people who
are not wearing gloves, along with those who are wearing them.
Generally, hand detection is the process of extracting the hand region from a given scene and enclosing it in a bounding box. It
is an advanced topic that has received considerable attention from researchers working on hand gesture and posture recognition.
A number of detection methods have been used in the literature; Viola-Jones (Viola & Jones, 2001) remains one of the
fastest and most robust learning-based object detectors, with a high detection rate, and it plays an important role
in many detection and recognition fields. Viola-Jones is a well-known appearance-based face detection
method. First, the query image is represented in the form of an “Integral Image”, which makes feature computation
very fast: the integral image value at any pixel equals the sum of the pixels above and to the left of it. Viola-Jones uses
an AdaBoost classifier that iteratively builds a strong classifier from a weighted combination of simple classifiers.
A series of such classifiers is applied to every sub-region of the image, and a sub-region is classified as "Not Face" as soon as
it fails any classifier in the series. When a classifier passes an image region, the region goes on to the next classifier in the
series; the image region is classified as "Face" only if it passes all classifiers in the series (Hendra et al., 2019).
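The two mechanisms described above, the integral image and the early-rejecting series of classifiers, can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (integral_image, rect_sum, cascade_accepts) and the representation of each stage as a plain predicate are assumptions made only for illustration, and the integral image is padded with one extra row and column of zeros to simplify the lookups.

```python
def integral_image(img):
    """Build a summed-area table padded with a leading row and column of zeros.

    ii[y][x] holds the sum of img over all rows < y and all columns < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def cascade_accepts(window, stages):
    """Apply the stages in order and reject the window at the first failure.

    `stages` is a list of hypothetical predicates standing in for boosted classifiers.
    """
    for stage in stages:
        if not stage(window):
            return False  # rejected early: later stages are never evaluated
    return True  # passed every stage in the series


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))  # sum of the lower-right 2x2 block
```

Because any rectangle sum costs four table lookups regardless of its size, rectangle-based features can be evaluated at every scale in constant time, which is why the integral image makes feature computation fast.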
Da’San et al. (2015) and Hazim et al. (2016) used the Viola-Jones algorithm to detect and crop the face region
for face recognition systems. Ahmad (2015) presented a real-time ethnicity identification system in which the Viola-Jones
method was applied to extract the face area from the rest of the image. Kolsch and Turk (2004)
proposed a detection method based on Viola-Jones with three contributions: a frequency analysis-based method for
instantaneous estimation of class separability without the need for any training, detectors built for the most
promising candidates, and the finding that classification accuracy increases with more expressive feature types.
Nguyen et al. (2012) built on the Viola-Jones work to propose a new approach to hand detection that detects
the internal region of the hand using its local features, without the background. Chouvatut et al. (2015) addressed the problem
of hand detection from various orientation angles of hand positions using the Viola-Jones detector and the SAMME
classifier. Yun and Peng (2009) presented an automatic hand gesture recognition framework that uses the steps of the
Viola-Jones method for detection; in the recognition phase, Hu invariant moment feature vectors of the detected hand gesture are
extracted and a Support Vector Machine (SVM) classifier is trained for final recognition.
Kovalenko et al. (2014) proposed a real-time system for hand gesture recognition that uses the Viola-Jones detector for
hand detection and then the Continuously Adaptive Mean Shift Algorithm (CAMShift) to track the
position of the extracted hand in the image. Mao et al. (2009) combined the Viola-Jones detection algorithm with a
skin-color detection method to perform hand detection and tracking against complex backgrounds. The prominence and rapid
spread of the COVID-19 epidemic drew researchers' attention to new research fields. Wang et al. (2020)
proposed a system for a facial mask detection task and a masked face recognition task using three types of masked face
datasets: the Masked Face Detection Dataset (MFDD), the Real-world Masked Face Recognition Dataset (RMFRD),
and the Simulated Masked Face Recognition Dataset (SMFRD).
Although much work in the literature has been conducted in the area of hand detection, the hand identification problem remains
unsolved. It is a difficult problem because a hand with a glove on is very similar to a hand with a glove off.
Hence, this work aims to adapt the Viola-Jones method for hand identification.
To assess the performance of the proposed method, the paper introduces a new dataset consisting of real-world videos
illustrating several cases of hands with gloves on and hands with gloves off. The experimental results indicate that the
proposed framework is capable of identifying hands with reasonable accuracy. The remainder of the paper is organized
as follows. In Section 2, the proposed method is discussed. The definition and format of the proposed dataset are
discussed in Section 3. Experimental results are given in Section 4. Finally, conclusions and future work are discussed
in Section 5.
2. The Proposed Method
This research focuses on identifying the hand state by adapting the Viola-Jones algorithm. The
Viola-Jones algorithm was designed for single-object detection. In this work, we adapted the method to
identify two different objects. Our approach is thus based on learning both objects as one package, i.e., hands with
gloves on and hands with gloves off. The method was able to detect a human hand and, additionally,
to identify whether or not it was wearing a glove.
The proposed method consists of three parts: the first part is training for both objects, the second is detection of both
objects, and the third is the identification step, which determines whether a hand is wearing a glove and then labels
each one with the suitable state (i.e. the hand with or without a glove). Figure 1 shows the general scheme of the system.
Figure 1. Operation of Proposed Methods.
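The identification step can be sketched as follows, under assumptions that go beyond what is stated here: it supposes that detection yields two lists of candidate boxes, one for hands without gloves and one for hands with gloves, each box carrying a hypothetical confidence score, and that overlapping candidates are resolved by keeping the higher-scoring one. The function names, the score field, and the overlap threshold of 0.5 are illustrative only and are not taken from the proposed method.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def label_hands(bare, gloved, overlap=0.5):
    """Merge two detection lists and label each surviving box with its state.

    `bare` and `gloved` are lists of ((x, y, w, h), score) pairs; the scores
    are an assumption of this sketch. When two candidates overlap by at least
    `overlap`, only the higher-scoring one is kept.
    """
    candidates = [(box, score, "hand without glove") for box, score in bare]
    candidates += [(box, score, "hand with glove") for box, score in gloved]
    candidates.sort(key=lambda c: c[1], reverse=True)

    kept = []
    for box, _score, label in candidates:
        if all(iou(box, kept_box) < overlap for kept_box, _ in kept):
            kept.append((box, label))
    return kept


if __name__ == "__main__":
    bare = [((10, 10, 50, 50), 0.9)]
    gloved = [((12, 12, 50, 50), 0.6), ((200, 40, 60, 60), 0.8)]
    for box, label in label_hands(bare, gloved):
        print(box, label)
```

In this example the two candidates around (10, 10) describe the same hand, so only the higher-scoring label is retained, while the separate hand at (200, 40) keeps its own label.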
2.1. Hand detection
The dataset was prepared for both the training and the testing parts. During training for hand detection, the method
requires positive and negative samples of the Region of Interest (RoI); thus, a number of images/frames
were used in training as positive and negative samples. A positive sample showed a cropped hand and
a negative sample showed no hand at all. These positive and negative samples were then fed into Viola-Jones to build
a model for detecting a hand during the training step. As a result, an “XML file” was produced, which serves as
the model for hand detection. Figure 2 illustrates how the training part is applied for hands without
a glove using video frames from our dataset (i.e. Training Data).
Figure 2. Training steps for hand detection.
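As an illustration of how positive and negative samples might be organized before training, the sketch below writes the two plain-text lists used by OpenCV-style cascade training tools: a positive description file with one line per image ("path count x y w h ...") and a background file with one negative image path per line. The toolchain is not named in this work, so the use of this format is an assumption, and the file paths and helper names are hypothetical.

```python
def positive_line(image_path, boxes):
    """Format one positive entry: path, number of hands, then x y w h per hand."""
    parts = [image_path, str(len(boxes))]
    for x, y, w, h in boxes:
        parts += [str(x), str(y), str(w), str(h)]
    return " ".join(parts)


def build_sample_lists(positives, negatives):
    """Return (positive_lines, negative_lines) ready to be written to text files.

    `positives` is a list of (image_path, [(x, y, w, h), ...]) pairs and
    `negatives` is a list of paths to frames that contain no hand at all.
    Positive entries without any box are skipped, since they carry no RoI.
    """
    pos_lines = [positive_line(path, boxes) for path, boxes in positives if boxes]
    neg_lines = list(negatives)
    return pos_lines, neg_lines


if __name__ == "__main__":
    # Hypothetical frame names; real paths would come from the training videos.
    positives = [("frames/hand_0001.jpg", [(140, 100, 45, 45)]),
                 ("frames/hand_0002.jpg", [(30, 60, 50, 50), (200, 80, 48, 48)])]
    negatives = ["frames/background_0001.jpg", "frames/background_0002.jpg"]
    pos_lines, neg_lines = build_sample_lists(positives, negatives)
    print("\n".join(pos_lines))
    print("\n".join(neg_lines))
```

With OpenCV 3.x, lists of this kind are typically passed to the opencv_createsamples and opencv_traincascade tools, which write the trained cascade as an XML file; whether the model described here was produced in that way is not stated.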