Introduction
Visual tracking is one of the most rapidly evolving fields of computer vision. Every year, dozens of new tracking algorithms are presented and evaluated in journals and at conferences. When evaluating these new trackers and comparing them to the state of the art, several questions arise. Is there a standard set of sequences that we can use for the evaluation? Is there a standardized evaluation protocol? Which performance measures should we use? Unfortunately, there are currently no definite answers to these questions. Unlike some other fields of computer vision, such as object detection and classification, optical-flow computation, and automatic segmentation, where widely adopted evaluation protocols are in use, visual tracking still largely lacks such standards.
Methodology
Visual tracking evaluation suffers from an abundance of performance measures used by various authors and from a lack of consensus about which measures should be preferred. This hampers cross-paper tracker comparison and slows the advancement of the field. In our research we show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on this analysis we narrow down the set of potential measures to only two complementary ones that can be intuitively interpreted and visualized, thus pushing towards homogenization of the tracker evaluation methodology. More details are presented in the paper Visual object tracking performance measures revisited, published in IEEE Transactions on Image Processing. This paper also marks the beginning of our effort to promote good evaluation methodology, which evolved into the VOT Initiative. Within the initiative we have further developed a ranking methodology for large-scale visual tracker comparison that takes into account different aspects of tracking as well as the statistical significance of performance differences. The evaluation methodology has gone through several iterations, evolving with time and with usage scenarios.
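In the VOT methodology these two complementary measures are accuracy (the average overlap between the predicted and ground-truth regions during successful tracking) and robustness (the number of times the tracker fails and is re-initialized). The sketch below is only a minimal illustration of how they can be computed from the per-frame overlaps of a reset-based run; the function name is hypothetical, and the official VOT toolkit implementation additionally skips a short burn-in period after each re-initialization.

# Minimal sketch (assumptions noted above): per-frame overlaps from a
# reset-based run, where an overlap of 0.0 marks a tracking failure.
def accuracy_robustness(overlaps):
    failures = sum(1 for o in overlaps if o == 0.0)            # robustness: failure count
    tracked = [o for o in overlaps if o > 0.0]                 # successfully tracked frames
    accuracy = sum(tracked) / len(tracked) if tracked else 0.0 # average overlap
    return accuracy, failures

# Example: one failure, accuracy averaged over the remaining frames.
acc, rob = accuracy_robustness([0.71, 0.65, 0.0, 0.80, 0.77])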
Visual Object Tracking Challenge (VOT)
The advances in evaluation methodology are promoted through the Visual Object Tracking (VOT) Challenge that we organize. The results of the first VOT challenge were presented at a workshop at ICCV 2013 in Sydney, Australia. In total, nine challenges were organized annually up to 2021. More details about the VOT challenges can be found on the VOT Challenge webpage.
Apparent Motion Patterns (AMP)
This approach uses omnidirectional videos to generate various motion patterns in a controlled manner. More information is available here.
A Color and Depth Visual Object Tracking Dataset and Benchmark (CDTB)
The new color-and-depth visual object tracking dataset (CDTB) was recorded with several passive and active RGB-D setups and contains indoor as well as outdoor sequences, including sequences acquired in direct sunlight. The sequences were carefully recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence, enabling tracker evaluation under realistic conditions. Sequences are annotated per frame with 13 visual attributes for detailed analysis.
Dataset and Experimental Results
The benchmark results and source code of all tested RGB-D trackers are available on the VOT Challenge 2019 webpage.
The CDTB dataset can be downloaded automatically using the VOT evaluation toolkit (select the vot2019_rgbd stack).
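As an illustration, the download can be scripted. The sketch below assumes the Python VOT toolkit is installed (pip install vot-toolkit), that the vot2019_rgbd stack named above is available in it, and that the workspace path is chosen arbitrarily.

# Create a VOT workspace and fetch the CDTB sequences via the vot2019_rgbd stack,
# mirroring the toolkit's documented "vot initialize" command-line usage.
import subprocess

subprocess.run(
    ["vot", "initialize", "vot2019_rgbd", "--workspace", "./cdtb-workspace"],
    check=True,
)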
More information about the RGB-D tracking benchmark can be found in the paper.
Alan Lukežič, Uğur Kart, Jani Käpylä, Ahmed Durmush, Joni-Kristian Kämäräinen, Jiří Matas, Matej Kristan.
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark.
The IEEE International Conference on Computer Vision (ICCV) 2019
Long-term Visual Object Tracking Performance Evaluation
A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. The performance measures are designed following a long-term tracking definition to maximize their probing strength. The new measures outperform existing ones in interpretability and in distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than those in current datasets without increasing the manual annotation effort. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers to date and a comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated into the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.
More information can be found in the paper.
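As a rough illustration of the measures described above, the sketch below computes long-term tracking precision, recall, and the F-score from per-frame overlaps, prediction-certainty scores, and ground-truth visibility flags at a single certainty threshold. The names are hypothetical and the single-threshold form is a simplification; the paper sweeps over certainty thresholds and reports the maximum F-score as the primary measure.

# Minimal single-threshold sketch: overlaps are per-frame IoU values (0 when
# prediction and ground truth do not match or one is absent), certainties are
# per-frame prediction scores (None when no prediction is made), and visible
# flags mark frames in which the target is annotated as present.
def precision_recall_f(overlaps, certainties, visible, threshold):
    predicted = [c is not None and c >= threshold for c in certainties]
    # Precision: average overlap on frames where the tracker reports the target.
    prec_frames = [o for o, p in zip(overlaps, predicted) if p]
    precision = sum(prec_frames) / len(prec_frames) if prec_frames else 0.0
    # Recall: average overlap on frames where the target is actually visible.
    rec_frames = [o for o, v in zip(overlaps, visible) if v]
    recall = sum(rec_frames) / len(rec_frames) if rec_frames else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f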
Dataset and Experimental Results
The benchmark results and source code of all tested long-term trackers are available on the VOT Challenge webpage (2018 or newer). The LTB50 dataset can be downloaded automatically using the VOT evaluation toolkit (use the votlt20xy stack).
Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, and Matej Kristan.
Performance Evaluation Methodology for Long-Term Single Object Tracking.
IEEE Transactions on Cybernetics, 2020