Deep generative appearance modeling in visual tracking

Scope

Predicting the state of an object in video streams is one of the fundamental challenges of computer vision, with numerous application domains. Knowing where an object is at a given point in time can help autonomous vehicles avoid obstacles, raise an alert when elderly people fall at home, support performance analysis in professional sport, reveal the behavior of animals, or help robots actively learn new concepts. These are just a few of the scenarios in which visual tracking methods can be used extensively. Yet numerous open challenges have to be solved before a general visual tracking method capable of handling the scenarios mentioned above can be developed. Visual object tracking without prior information about the object is an ill-posed problem: for an arbitrary object, it cannot be solved by an on-line learning method alone. Humans, on the other hand, can solve complex tracking scenarios by relying on a massive amount of knowledge about the world accumulated through life-long learning. This knowledge contains information about object categories, their possible deformations, and their appearance variations, which are crucial for retaining a stable representation of the tracked object. In machine learning terms, we can say that this knowledge is contained in a generative model of the object’s appearance. The challenge that we will address in this project is the robust design of such a generative model, its training, and its application in a visual tracking scenario. We believe that a generative appearance model of the entire object is a crucial step towards grounding visual object tracking in the high-level concepts behind raw pixel values.

Work packages

The work is divided into four work packages:

WP1: development of generative deep neural network models suitable for appearance modeling of many object classes,
WP2: application of developed models to visual tracking,
WP3: training and testing data acquisition and generation,
WP4: dissemination.