We present a vision system for the 3-D model-based tracking of unconstrained human movement. Using image sequences acquired simultaneously from multiple views, we recover the 3-D body pose at each time instant without the use of markers. The pose-recovery problem is formulated as a search problem: find the pose parameters of a graphical human model whose synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. The models used for this purpose are acquired from the images. We use a decomposition approach and a best-first technique to search through the high-dimensional pose parameter space. A robust variant of chamfer matching is used as a fast similarity measure between synthesized and real edge images.
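To make the similarity measure concrete, the following is a minimal sketch of chamfer matching between a set of synthesized model edge points and a real edge image. It is an illustration under stated assumptions, not the system's implementation: the function names `distance_transform` and `chamfer_score` are hypothetical, the distance map is computed by brute force rather than by a fast two-pass algorithm, and the optional truncation threshold is only one common way to make the measure robust to outliers, which may differ from the robust variant used here.

```python
import math


def distance_transform(edges):
    """Return, for every pixel, the Euclidean distance to the nearest edge pixel.

    `edges` is a list of rows; a truthy value marks an edge pixel.
    Brute force for clarity; a real system would use a fast two-pass transform.
    """
    points = [(r, c) for r, row in enumerate(edges)
              for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("edge image contains no edge pixels")
    return [[min(math.hypot(r - pr, c - pc) for pr, pc in points)
             for c in range(len(row))]
            for r, row in enumerate(edges)]


def chamfer_score(model_points, dist_map, truncate=None):
    """Average distance from each model edge point to the nearest image edge.

    Lower is better. If `truncate` is given, each distance is capped at that
    value so a few badly matched points cannot dominate the score
    (an assumed robustification, labeled as such).
    """
    total = 0.0
    for r, c in model_points:
        d = dist_map[r][c]
        if truncate is not None:
            d = min(d, truncate)
        total += d
    return total / len(model_points)


# Toy example: a vertical edge in column 2 of a 5x5 image.
edges = [[0, 0, 1, 0, 0] for _ in range(5)]
dist_map = distance_transform(edges)

aligned = [(0, 2), (1, 2)]        # model points lying on the image edge
shifted = [(0, 3), (1, 3)]        # model points one pixel to the right
with_outlier = [(0, 2), (1, 0)]   # one point two pixels away
```

Because the distance map is computed once per real edge image, scoring each candidate pose only requires a lookup per synthesized edge point, which is what makes the measure cheap enough to evaluate many times during a search over pose parameters.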

We present initial tracking results from a large new Humans-In-Action (HIA) database containing more than 2500 frames in each of four orthogonal views. The sequences show subjects involved in a variety of activities of varying complexity, ranging from simple one-person hand waving to the challenging two-person close interaction of the Argentine Tango.

The ability to recognize humans and their activities by vision is a key feature in the effort to design a machine capable of interacting intelligently and effortlessly with a human-inhabited environment. Besides this long-term goal, there are many possible applications in the nearer term, e.g. in virtual reality, "smart" surveillance systems, motion analysis in sports, choreography of dance and ballet, sign language translation and gesture-driven user interfaces. In many of these applications a non-intrusive sensory method based on vision is preferable to a method that relies on markers attached to the bodies of human subjects, which in some cases is not even feasible. Our approach to looking at humans and recognizing their activities has two major components:
1. body pose recovery and tracking
2. recognition of movement patterns
