Human Motion Analysis: A Review

Computer Vision and Image Understanding
Vol. 73, No. 3, March, pp. 428–440, 1999
Article ID cviu.1998.0744, available online at on

                                              Human Motion Analysis: A Review
                                                               J. K. Aggarwal and Q. Cai
       Computer and Vision Research Center, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712

                                                   Received October 22, 1997; accepted September 25, 1998

   Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man-machine interfaces, content-based image storage and retrieval, and video conferencing. This paper gives an overview of the various tasks involved in motion analysis of the human body. We focus on three major areas related to interpreting human motion: (1) motion analysis involving human body parts, (2) tracking a moving human from a single view or multiple camera perspectives, and (3) recognizing human activities from image sequences. Motion analysis of human body parts involves the low-level segmentation of the human body into segments connected by joints and recovers the 3D structure of the human body using its 2D projections over a sequence of images. Tracking human motion from a single view or multiple perspectives focuses on higher-level processing, in which moving humans are tracked without identifying their body parts. After successfully matching the moving human image from one frame to another in an image sequence, understanding the human movements or activities comes naturally, which leads to our discussion of recognizing human activities. © 1999 Academic Press

1077-3142/99 $30.00
Copyright © 1999 by Academic Press
All rights of reproduction in any form reserved.

1. INTRODUCTION

   Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by applications over a wide spectrum of topics. For example, segmenting the parts of the human body in an image, tracking the movement of joints over an image sequence, and recovering the underlying 3D body structure are particularly useful for analysis of athletic performance, as well as medical diagnostics. The capability to automatically monitor human activities using computers in security-sensitive areas such as airports, borders, and building lobbies is of great interest to the police and military. With the development of digital libraries, the ability to automatically interpret video sequences will save tremendous human effort in sorting and retrieving images or video sequences using content-based queries. Other applications include building man-machine user interfaces, video conferencing, etc. This paper gives an overview of recent developments in human motion analysis from image sequences using a hierarchical approach.

   In contrast to our previous review of motion estimation of a rigid body [1], this survey concentrates on motion analysis of the human body, which is a nonrigid form. Our discussion covers three areas: (1) motion analysis of the human body structure, (2) tracking human motion without using the body parts from a single view or multiple perspectives, and (3) recognizing human activities from image sequences. The relationship among these three areas is depicted in Fig. 1. Motion analysis of the human body usually involves low-level processing, such as body part segmentation, joint detection and identification, and the recovery of 3D structure from the 2D projections in an image sequence. Tracking moving individuals from a single view or multiple perspectives involves applying visual features to detect the presence of humans directly, i.e., without considering the geometric structure of the body parts. Motion information such as position and velocity, along with intensity values, is employed to establish matching between consecutive frames. Once feature correspondence between successive frames is solved, the next step is to understand the behavior of these features throughout the image sequence. Therefore, our discussion turns to recognition of human movements and activities.

FIG. 1. Relationship among the three areas of human motion analysis addressed in the paper.

   Two typical approaches to motion analysis of human body parts are reviewed, depending on whether a priori shape models are used. Figure 2 lists a number of publications in this area over the past years. In both model-based and nonmodel-based approaches, the representation of the human body evolves from stick figures to 2D contours to 3D volumes as the complexity of the model increases. The stick figure representation is based on the observation that human motion is essentially the movement of the supporting bones. The use of 2D contours is directly associated with the projection of the human figure in images. Volumetric models, such as generalized cones, elliptical cylinders, and spheres, attempt to describe the details of a human body in 3D and, therefore, require more parameters for computation. Various levels of representations could be used for graphical animation with different resolutions [2].

   With regard to the tracking of human motion without involving body parts, we differentiate the work based on whether the subject is imaged at one time instant by a single camera or from multiple perspectives using different cameras. In both configurations, the features to be tracked vary from points to 2D blobs and 3D volumes. There is always a trade-off between feature complexity and tracking efficiency. Lower-level features, such as points, are easier to extract but relatively more difficult to track than higher-level features such as blobs and 3D volumes. Most of the work in this area is listed in Fig. 3.

   To recognize human activities from an image sequence, two types of approaches are addressed: approaches based on a state-space model and those using the template matching technique. In the first case, the features used for recognition could be points, lines, and 2D blobs. Methods using template matching usually apply meshes of a subject image to identify a particular movement. Figure 4 gives an overview of the research in this area. In some of the publications, recognition is conducted using only parts of the human figure. Since these methods can be naturally extended to recognition of a whole body movement, we also include them in our discussion.

   This paper is an extension of the review in [3]. The organization of the paper is as follows: Section 2 reviews work on motion analysis of the human body structure. Section 3 covers research on the higher-level tasks of tracking a human body as a whole. Section 4 extends the discussion to recognition of human activity in image sequences based upon successfully tracking the features between consecutive frames. Finally, Section 5 concludes the paper and delineates possible directions for future research.

                                          FIG. 2. Past research on motion analysis of human body parts.
430                                                        AGGARWAL AND CAI

                                  FIG. 3. Past research on tracking of human motion without using body parts.

2. MOTION ANALYSIS OF HUMAN BODY PARTS

   This section focuses on motion analysis of human body parts, i.e., approaches which involve 2D or 3D analysis of the human body structure throughout image sequences. Conventionally, human bodies are represented as stick figures, 2D contours, or volumetric models [4]; therefore, body segments can be approximated as lines, 2D ribbons, and 3D volumes, accordingly. Figures 5, 6, and 7 show examples of the stick figure, 2D contour, and elliptical cylinder representations of the human body, respectively. Human body motion is typically addressed by the movement of the limbs and hands [5–8], such as the velocities of the hand or limb segments, or the angular velocity of various body parts.

FIG. 5. A stick-figure human model (based on Chen and Lee's work [11]).

FIG. 6. A 2D contour human model (similar to Leung and Yang's model [26]).

FIG. 7. A volumetric human model (derived from Hogg's work [15]).

   Two general strategies are used, depending upon whether information about the object shape is employed in the motion analysis, namely, model-based approaches and methods which do not rely on a priori shape models. Both methodologies follow the general framework of: (1) feature extraction, (2) feature correspondence, and (3) high-level processing. The major difference between the two methodologies is in establishing feature correspondence between consecutive frames. Methods which assume a priori shape models match the real images to a predefined model. Feature correspondence is automatically achieved once matching between the real images and the model is established. When no a priori shape models are available, however, correspondence between successive frames is based upon prediction or estimation of features related to position, velocity, shape, texture, and color. These two methodologies can also be combined at various levels to verify the matching between consecutive frames and, finally, to accomplish more complex tasks. Since we have surveyed these methods previously in [4], we will restrict our discussion to very recent work.

FIG. 4. Past work on human activity recognition.

2.1. Motion Analysis without a priori Shape Models

   Most approaches to 2D or 3D interpretation of human body structure focus on motion estimation of the joints of body segments. When no a priori shape models are assumed, heuristic assumptions are usually used to establish the correspondence of joints between successive frames. These assumptions impose constraints on feature correspondence, decrease the search space, and eventually result in a unique match.

   The simplest representation of a human body is the stick figure, which consists of line segments linked by joints. The motion of joints provides the key to motion estimation and recognition of the whole figure. This concept was initially considered by Johansson [9], who marked joints as moving light displays (MLD). Along this vein, Rashid [10] attempted to recover a connected human structure with projected MLD by assuming that points belonging to the same object have higher correlations in projected positions and velocities. Later, Webb and Aggarwal [11, 12] extended this work to 3D structure recovery of Johansson-type figures in motion. They imposed the fixed axis assumption, which assumes that the motion of each rigid object (or part of an articulated object) is constrained so that its axis of rotation remains fixed in direction. Therefore, the depth of the joints can be estimated from their 2D projections. A detailed review of [9–12] can be found in [4]. All of these approaches inevitably demand a high degree of accuracy in extracting body segments and joints. The segmentation problem is avoided by directly using MLDs, which means that these methods do not readily apply to images of humans in natural clothing.

   Another way to describe the human body is using 2D contours. In such descriptions, the human body segments are analogous to 2D ribbons or blobs. For example, Shio and Sklansky [15] focused their work on 2D translational motion of human blobs. The blobs were grouped based on the magnitude and direction of the pixel velocity, which were obtained using techniques similar to the optical flow method [16]. The velocity of each part was considered to converge to a global average value over several frames. This average velocity corresponds to the motion of the whole human body and leads to identification of the whole body via region grouping of blobs with a similar smoothed velocity. Kurakake and Nevatia [17] attempted to locate the joints in images of walking humans by establishing correspondence between extracted ribbons. They assumed small motion between two consecutive frames, and feature correspondence was conducted using various geometric constraints. Joints were finally identified as the center of the area where two ribbons overlap. Recent work by Kakadiaris et al. [18, 19] focused on body part decomposition and joint location from image sequences of the moving subject using a physics-based framework. Unlike [17], where joints are located from shapes, here the joint is revealed only when the body segments connected to it involve motion. In the beginning, the subject image is assumed to be one deformable model. As the subject moves and new postures occur, multiple new models are produced to replace the old ones, with each of them representing an emerging subpart. Joints are determined based on the relative motion and shape of two moving subparts. These methods usually require small image motion between successive frames, which will not be a major concern as the video sampling rate increases. Our main concern is still segmentation under normal circumstances. Among the addressed methods, Shio and Sklansky [15] relied on motion as the cue for segmentation, while the latter two approaches are based on intensity or texture. To the best of our knowledge, robust segmentation technologies are yet to be developed. Since this problem is the major obstacle to most of the work involving low-level processing, we will not mention it repeatedly in later discussions.

   Finally, we want to address the recent work by Rowley and Rehg [20], which focuses on the segmentation of optical flow fields of articulated objects. It is an extension to motion analysis of rigid objects using the expectation-maximization (EM) algorithm [21]. Compared to [21], the major contribution of [20] is to add kinematic motion constraints to the data at each pixel. The strength of this work lies in its combination of motion segmentation and estimation in EM computation, i.e., segmentation is accomplished in the E-step, and motion analysis in the M-step. These two steps are computed iteratively in a forward-backward manner to minimize the overall energy function of the whole image. The motion addressed in the paper is restricted to 2D affine transforms. We would expect to see its extension to 3D cases under perspective projections.

2.2. Model-Based Approaches

   In the above subsection, we examined several approaches to motion analysis that do not require a priori shape models. In this type of approach, which is necessary when no a priori shape models are available, it is typically more difficult to establish feature correspondence between consecutive frames. Therefore, most methods for the motion analysis of human body parts apply predefined models for feature correspondence and body structure recovery. Our discussion will be based on the representations of various models.

   Chen and Lee [22] recovered the 3D configuration of a moving subject according to its projected 2D image. Their model used 17 line segments and 14 joints to represent the features of the head, torso, hip, arms, and legs (shown in Fig. 5). Various constraints were imposed for the basic analysis of the gait. The method was computationally expensive, as it searched through all possible combinations of 3D configurations, given the known 2D projection, and required accurate extraction of 2D stick figures. Bharatkumar et al. [7] also used stick figures to model the lower limbs of the moving human body. They aimed at constructing a general kinematic model for gait analysis in human walking. Medial-axis transformations were applied to extract 2D stick figures of the lower limbs. The body segment angle and joint displacement were measured and smoothed from real image sequences, and then a common kinematic pattern was detected for each walking cycle. A high correlation (>0.95) was found between the real images and the model. The drawback to this model is that it is view-based and sensitive to changes of the perspective angle at which the images are captured.

   Huber's human model [23] is a refined version of the stick figure representation. Joints are connected by line segments with a certain degree of relaxation as "virtual springs." Thus, this articulated kinematic model behaves analogously to a mass-spring-damper system. Motion and stereo measurements of joints are confined to a three-dimensional space called proximity space (PS). The human head serves as the starting point for tracking all PS locations. In the end, particular gestures were recognized based on the PS states of the joints associated with the head, torso, and arms. Solving this problem requires the 3D positions of the joints in an image sequence.

   A recent publication by Iwasawa [24] focused on real-time extraction of stick figures from monocular thermal images. The height of the human image and the distance between the subject and the camera were precalibrated. Then the orientation of the upper half of the body was calculated as the principal axis of inertia of the human silhouette. Significant points, such as the top of the head and the tips of the hands and feet, were heuristically located as the extreme points farthest from the center of the silhouette. Finally, major joints such as the elbows and knees were estimated, based on the positions of the detected points through genetic learning. The drawback to this method is, again, that it is view-oriented and gesture-restricted. For example, if the human arms are placed in front of the body, there is no way to extract the finger tips using this method and, therefore, it fails to locate the elbow joints.

   Niyogi and Adelson [8] pursued another route to estimate the joint motion of human body segments. They first examined the spatial-temporal (XYT) braided pattern produced by the lower limb trajectories of a walking human and conducted gait analysis for coarse human recognition. Then the projection of head movements in the spatial-temporal domain was located, followed by the identification of other joint trajectories. These joint trajectories were then utilized to outline the contour of a walking human, based on the observation that the human body is spatially contiguous. Finally, a more accurate gait analysis was performed using the outlined 2D contour, which led to a fine-level recognition of specific humans. The major concern we have with this work is how to obtain these XYT trajectories in real image sequences without attaching specific sensors to the head and feet during image acquisition.

   In Akita's work [25], both stick figures and cone approximations were integrated and processed in a coarse-to-fine manner. A key frame sequence of stick figures indicates the approximate order of the motion and spatial relationships between the body parts. A cone model is included to provide knowledge of the rough shape of the body parts, whose 2D segments correspond to the counterparts of the stick figure model. The preliminary condition to this approach is to obtain these key frames prior to body part segmentation and motion estimation. Perales and Torres also made use of both stick figure and volumetric representations in their work [26]. They introduced a predefined
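
   As an illustration of the silhouette-based steps described for Iwasawa's method [24] (body orientation from the principal axis of the silhouette, and extreme points located far from its center), the following is a minimal sketch and not the authors' implementation. It assumes the silhouette is available as a binary mask given as a list of rows of 0/1 values. The function name, the mask format, and the use of second-order central moments to obtain the axis are illustrative assumptions; the review does not specify how these quantities were computed.

```python
import math


def silhouette_axis_and_extremes(mask):
    """Estimate centroid, principal-axis angle, and farthest pixel of a silhouette.

    mask: list of rows, each a list of 0/1 values (hypothetical input format).
    Returns ((cx, cy), theta, farthest), where theta is the angle in radians of
    the axis of largest spread, measured from the image x axis.
    """
    # Collect the (x, y) coordinates of all foreground pixels.
    pixels = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("empty silhouette")

    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    # Second-order central moments of the pixel distribution.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n

    # Orientation of the major axis of the equivalent ellipse.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

    # Candidate extreme point: the foreground pixel farthest from the centroid.
    farthest = max(pixels, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return (cx, cy), theta, farthest
```

   For an upright, elongated silhouette the returned angle is close to a quarter turn from the x axis, and the farthest pixel lies at one end of the figure. Locating several extreme points (head, hands, feet) and estimating elbows and knees from them, as the text describes, would require additional heuristics that are not sketched here.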