<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Matej_kristan on ViCoS Lab</title>
    <link>/publications/by-author/matej_kristan/</link>
    <description>Recent content in Matej_kristan on ViCoS Lab</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="/publications/by-author/matej_kristan/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results</title>
      <link>/publications/kiefer20231st/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kiefer20231st/</guid>
      <description>&lt;p&gt;The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at &lt;a href=&#34;https://seadronessee.cs.uni-tuebingen.de/macvi&#34;&gt;https://seadronessee.cs.uni-tuebingen.de/macvi&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>2nd Workshop on Maritime Computer Vision (MaCVi) 2024: Challenge Results</title>
      <link>/publications/kiefer20242nd/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kiefer20242nd/</guid>
      <description>&lt;p&gt;The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection category features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at &lt;a href=&#34;https://macvi.org/workshop/macvi24&#34;&gt;https://macvi.org/workshop/macvi24&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A basic cognitive system for interactive continuous learning of visual concepts</title>
      <link>/publications/skocaj2010a-basic/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2010a-basic/</guid>
      <description>&lt;p&gt;Interactive continuous learning is an important characteristic of a cognitive agent that is supposed to operate and evolve in an ever-changing environment. In this paper we present representations and mechanisms that are necessary for continuous learning of visual concepts in dialogue with a tutor. We present an approach for modelling beliefs stemming from multiple modalities and we show how these beliefs are created by processing visual and linguistic information and how they are used for learning. We also present a system that exploits these representations and mechanisms, and demonstrate these principles in the case of learning about object colours and basic shapes in dialogue with the tutor.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A basic cognitive system for interactive learning of simple visual concepts</title>
      <link>/publications/skocaj2010a-basic-cognitive/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2010a-basic-cognitive/</guid>
      <description>&lt;p&gt;In this work we present a system and underlying representations and mechanisms for continuous learning of visual concepts in dialogue with a human tutor.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Bayes-Spectral-Entropy-Based Measure of Camera Focus Using a Discrete Cosine Transform</title>
      <link>/publications/kristan2006a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2006a/</guid>
      <description>&lt;p&gt;In this paper we present a novel measure of camera focus based on the Bayes spectral entropy of an image spectrum. In order to estimate the degree of focus, the image is divided into non-overlapping sub-images of 8×8 pixels. Next, sharpness values are calculated separately for each sub-image and their mean is taken as a measure of the overall focus. The sub-image spectra are obtained by an 8×8 discrete cosine transform (DCT). Comparisons were made against four well-known measures that were chosen as reference, on images captured with a standard visible-light camera and a thermal camera. The proposed measure outperformed the reference measures by exhibiting a wider working range and a smaller failure rate. To assess its robustness to noise, additional tests were conducted with noisy images.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Detect-and-Verify Paradigm for Low-Shot Counting  - DAVE</title>
      <link>/publications/pelhan2024a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pelhan2024a/</guid>
      <description>&lt;p&gt;Low-shot counters estimate the number of objects corresponding to a selected category, based on only a few or no exemplars annotated in the image. The current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in the total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, it outperforms the most recent detection-based counter by ~20% in detection quality and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Discriminative Single-Shot Segmentation Network for Visual Object Tracking</title>
      <link>/publications/lukezic2021a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2021a/</guid>
      <description>&lt;p&gt;Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker &amp;ndash; D3S2, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object, to simultaneously achieve robust online target segmentation. The overall tracking reliability is further increased by decoupling the object and feature scale estimation. Without per-dataset finetuning, and trained only for segmentation as the primary output, D3S2 outperforms all published trackers on the recent short-term tracking benchmark VOT2020 and performs very close to the state-of-the-art trackers on the GOT-10k, TrackingNet, OTB100 and LaSOT. D3S2 outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Distractor-Aware Memory for Visual Object Tracking with SAM2</title>
      <link>/publications/videnovic2025a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/videnovic2025a/</guid>
      <description>&lt;p&gt;Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. While already achieving top performance on many benchmarks, it was the recent release of SAM2 that placed memory-based trackers into the focus of the visual object tracking community. Nevertheless, modern trackers still struggle in the presence of distractors. We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 and an introspection-based update strategy that jointly addresses the segmentation accuracy as well as tracking robustness. The resulting tracker is denoted as SAM2.1++. We also propose a new distractor-distilled DiDi dataset to study the distractor problem better. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a solid new state-of-the-art on six of them. The code and the new dataset will be available on &lt;a href=&#34;https://github.com/jovanavidenovic/DAM4SAM&#34;&gt;https://github.com/jovanavidenovic/DAM4SAM&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A graphical model for rapid obstacle image-map estimation from unmanned surface vehicles</title>
      <link>/publications/kristan2014a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2014a/</guid>
      <description></description>
    </item>
    <item>
      <title>A hierarchical dynamic model for tracking in sports</title>
      <link>/publications/kristan2007a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2007a/</guid>
      <description>&lt;p&gt;Dynamic models play a crucial role in tracking algorithms. In particle filters, for example, proper modelling of the target dynamics can help achieve the desired tracking accuracy using only a small number of particles, thus reducing the computational complexity of the tracker. We propose a novel hierarchical model for tracking players in sports by combining a conservative and a liberal dynamic model to better describe the player&amp;rsquo;s dynamics. We show how parameters of the model can be estimated from prior knowledge about the player&amp;rsquo;s dynamics. The proposed dynamic model was compared to a widely used model and resulted in better performance in terms of position estimation and prediction.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Local-motion-based probabilistic model for visual tracking</title>
      <link>/publications/kristan2009a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009a/</guid>
      <description>&lt;p&gt;Color-based tracking is prone to failure in situations where visually similar targets are moving in close proximity or occlude each other. To deal with the ambiguities in the visual information, we propose an additional color-independent visual model based on the target&amp;rsquo;s local motion. This model is calculated from the optical flow induced by the target in consecutive images. By modifying a color-based particle filter to account for the target&amp;rsquo;s local motion, the combined color/local-motion-based tracker is constructed. We compare the combined tracker to a purely color-based tracker on a challenging dataset from hand tracking, surveillance and sports. The experiments show that the proposed local-motion model largely resolves situations when the target is occluded by, or moves in front of, a visually similar object.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Long-Term Discriminative Single Shot Segmentation Tracker</title>
      <link>/publications/dzubur2022a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/dzubur2022a/</guid>
      <description>&lt;p&gt;State-of-the-art long-term visual object tracking methods are limited to predicting the target position as an axis-aligned bounding box. Segmentation-based trackers exist; however, they do not address long-term disappearances of the target. We propose a long-term discriminative single shot segmentation tracker &amp;ndash; D3SLT, which addresses the above shortcomings. The previously developed short-term D3S tracker is upgraded with a global re-detection module, based on an image-wide discriminative correlation filter response and a Gaussian motion model. An online learned confidence estimation module is employed for robust estimation of target disappearance. An additional backtracking module enables recovery from tracking failures and further improves tracking performance. D3SLT performs close to the state-of-the-art long-term trackers on the bounding-box-based VOT-LT2021 Challenge, achieving an F-score of 0.667, while additionally outputting segmentation masks.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Low-Shot Object Counting Network With Iterative Prototype Adaptation</title>
      <link>/publications/djukic2023a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/djukic2023a/</guid>
      <description>&lt;p&gt;We consider low-shot counting of arbitrary semantic categories in the image using only a few annotated exemplars (few-shot) or no exemplars (no-shot). The standard few-shot pipeline follows extraction of appearance queries from exemplars and matching them with image features to infer the object counts. Existing methods extract queries by feature pooling, but neglect the shape information (e.g., size and aspect), which leads to reduced object localization accuracy and count estimates. We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA). Our main contribution is the new object prototype extraction module, which iteratively fuses the exemplar shape and appearance queries with image features. The module is easily adapted to the zero-shot scenario, enabling LOCA to cover the entire spectrum of low-shot counting problems. LOCA outperforms all recent state-of-the-art methods on the FSC147 benchmark by 20-30% in RMSE in the one-shot and few-shot settings and achieves state-of-the-art results in the zero-shot scenario, while demonstrating better generalization capabilities.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking</title>
      <link>/publications/lukezic2024a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2024a/</guid>
      <description>&lt;p&gt;Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, development of modern tracking architectures requires large training sets, which do not exist in transparent object tracking. We present two contributions addressing the aforementioned issues. We propose the first transparent object tracking training dataset Trans2k that consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Standard trackers trained on this dataset consistently improve by up to 16%. Our second contribution is a new distractor-aware transparent object tracker (DiTra) that treats localization accuracy and target identification as separate tasks and implements them by a novel architecture. DiTra sets a new state-of-the-art in transparent object tracking and generalizes well to opaque objects.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Novel Performance Evaluation Methodology for Single-Target Trackers</title>
      <link>/publications/kristan2016a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2016a/</guid>
      <description>&lt;p&gt;This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophisticatedly constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation</title>
      <link>/publications/pelhan2024a-novel/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pelhan2024a-novel/</guid>
      <description>&lt;p&gt;Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. Objects are localized by matching them to prototypes, which are constructed by unsupervised image-wide object appearance aggregation. Due to potentially diverse object appearances, the existing approaches often lead to overgeneralization and false positive detections. Furthermore, the best-performing methods train object localization by a surrogate loss that predicts a unit Gaussian at each object center. This loss is sensitive to annotation error and hyperparameters, and does not directly optimize the detection task, leading to suboptimal counts. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation in a unified architecture. GeCo robustly generalizes the prototypes across object appearances through a novel dense object query formulation. In addition, a novel counting loss is proposed that directly optimizes the detection task and avoids the issues of the standard surrogate loss. GeCo surpasses the leading few-shot detection-based counters by 25% in the total count MAE, achieves superior detection accuracy and sets a solid new state-of-the-art result across all low-shot counting setups. The code will be available on GitHub.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A segmentation-based approach for polyp counting in the wild</title>
      <link>/publications/zavrtanik2020a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2020a/</guid>
      <description>&lt;p&gt;We address the problem of jellyfish polyp counting in underwater images. Modern methods utilize convolutional neural networks for feature extraction and work in two stages. First, hypothetical regions are proposed at potential locations; the features of these regions are then extracted and classified according to the contained object. Such methods typically require a dense grid for region proposals, explicitly test various scales and are prone to failure in densely populated regions. We propose a segmentation-based polyp counter – SegCo. A convolutional neural network is trained to produce locally-circular segmentation masks on the polyps, which are then detected by localizing circularly symmetric areas in the segmented image. The detection stage is efficient and avoids a greedy search over positions and scales. SegCo outperforms the current state-of-the-art object detector RetinaNet and the recent specialized polyp detection method PoCo by 2% and 24% in F-score, respectively, and sets a new state-of-the-art in polyp detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A system approach to interactive learning of visual concepts</title>
      <link>/publications/skocaj2010a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2010a/</guid>
      <description>&lt;p&gt;In this work we present a system and underlying mechanisms for continuous learning of visual concepts in dialogue with a human.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A system for interactive learning in dialogue with a tutor</title>
      <link>/publications/skocaj2011a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2011a/</guid>
      <description>&lt;p&gt;In this paper we present representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor and show the implemented robot system. We present how beliefs about the world are created by processing visual and linguistic information and show how they are used for planning system behaviour with the aim of satisfying its internal drive &amp;ndash; to extend its knowledge. The system facilitates different kinds of learning initiated by the human tutor or by the system itself. We demonstrate these principles in the case of learning about object colours and basic shapes.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Template-Based Multi-Player Action Recognition of the Basketball Game</title>
      <link>/publications/perse2006a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2006a/</guid>
      <description>&lt;p&gt;In this paper we present a method for fully automatic trajectory-based analysis of a basketball game in the form of large- and small-scale modelling of the game. The large-scale game model is obtained by dividing the game into several game phases. Every game phase is then individually modelled using a mixture of Gaussian distributions. The Expectation-Maximization algorithm is used to determine the parameters of the Gaussian distributions. The small-scale modelling of the game, on the other hand, deals with specific basketball actions, which can be defined in the form of action templates that basketball experts use to pass their instructions to the players. For recognition purposes we define the basic game elements, which are the building blocks of more complex game actions. These elements are then used to semantically describe the observed basketball actions and the templates. To establish whether the observed action corresponds to the template, the similarity of the descriptions is calculated using the Levenshtein distance measure. Experiments show that the proposed method could become a powerful tool for the recognition of various basketball actions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Trajectory-Based Analysis of Coordinated Team Activity in a Basketball Game</title>
      <link>/publications/perse2009a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2009a/</guid>
      <description></description>
    </item>
    <item>
      <title>A Two-Stage Dynamic Model for Visual Tracking</title>
      <link>/publications/kristan2010a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2010a/</guid>
      <description>&lt;p&gt;We propose a new dynamic model which can be used within blob trackers to track the target&amp;rsquo;s center of gravity. A strong point of the model is that it is designed to track a variety of motions which are usually encountered in applications such as pedestrian tracking, hand tracking and sports. We call the dynamic model a two-stage dynamic model due to its particular structure, which is a composition of two models: a liberal model and a conservative model. The liberal model allows larger perturbations in the target&amp;rsquo;s dynamics and is able to account for motions in between the random-walk dynamics and the nearly-constant-velocity dynamics. On the other hand, the conservative model assumes smaller perturbations and is used to further constrain the liberal model to the target&amp;rsquo;s current dynamics. We implement the two-stage dynamic model in a two-stage probabilistic tracker based on the particle filter and apply it to two separate examples of blob tracking: (i) tracking entire persons and (ii) tracking a person&amp;rsquo;s hands. Experiments show that, in comparison to the widely used models, the proposed two-stage dynamic model allows tracking with a smaller number of particles in the particle filter (e.g., 25 particles), while achieving smaller errors in the state estimation and a smaller failure rate. The results suggest that the improved performance comes from the model&amp;rsquo;s ability to actively adapt to the target&amp;rsquo;s motion during tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A water-obstacle separation and refinement network for unmanned surface vehicles</title>
      <link>/publications/bovcon2020a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2020a/</guid>
      <description>&lt;p&gt;Obstacle detection by semantic segmentation shows great promise for autonomous navigation in unmanned surface vehicles (USV). However, existing methods suffer from poor estimation of the water edge in the presence of visual ambiguities, poor detection of small obstacles and a high false-positive rate on water reflections and wakes. We propose a new deep encoder-decoder architecture, a water-obstacle separation and refinement network (WaSR), to address these issues. Detection and water edge accuracy are improved by a novel decoder that gradually fuses inertial information from an IMU with the visual features from the encoder. In addition, a novel loss function is designed to increase the separation between water and obstacle features early on in the network. Subsequently, the capacity of the remaining layers in the decoder is better utilised, leading to a significant reduction in false positives and increased true positives. Experimental results show that WaSR outperforms the current state-of-the-art by a large margin, yielding a 14% increase in F-measure over the second-best method.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A web-service for object detection using hierarchical  models</title>
      <link>/publications/tabernik2013a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2013a/</guid>
      <description>&lt;p&gt;This paper proposes an architecture for an object detection system suitable for a web-service running distributed on a cluster of machines. We build on top of a recently proposed architecture for a distributed visual recognition system and extend it with the object detection algorithm. As sliding-window techniques are computationally unsuitable for web-services, we rely on models based on state-of-the-art hierarchical compositions for the object detection algorithm. We provide implementation details for running hierarchical models on top of a distributed platform and propose an additional hypothesis verification step to reduce the many false positives that are common in hierarchical models. For verification, we rely on a state-of-the-art descriptor extracted from the hierarchical structure and use a support vector machine for object classification. We evaluate the system on a cluster of 80 workers and show a response time of around 10 seconds at a throughput of around 60 requests per minute.&lt;/p&gt;</description>
    </item>
    <item>
      <title>About different active learning approaches for acquiring categorical knowledge</title>
      <link>/publications/skocaj2011about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2011about/</guid>
      <description>&lt;p&gt;In this paper we address the problem of acquiring categorical knowledge from the active learning perspective. We describe and implement several teacher- and learner-driven approaches that require different levels of teacher competence and consider different types of knowledge for the selection of training samples. The experimental results show that the active learning approach outperforms the passive one and that adapting the learning process to the learner&amp;rsquo;s knowledge significantly improves the learning performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Adding discriminative power to a generative hierarchical compositional model using histograms of compositions</title>
      <link>/publications/tabernik2015adding/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2015adding/</guid>
      <description>&lt;p&gt;In this paper we identify two types of problems caused by excessive feature sharing and the lack of discriminative learning in hierarchical compositional models: (a) misclassifications between similar categories and (b) phantom detections on background objects. We propose to overcome these issues by fully utilizing the discriminative features already present in the generative models of hierarchical compositions. We introduce a descriptor called the Histogram of Compositions to capture the information important for improving discriminative power and use it with a classifier to learn the distinctive features important for successful discrimination. The generative model of hierarchical compositions is combined with the discriminative descriptor by performing hypothesis verification of the detections produced by the hierarchical compositional model. We evaluate the proposed descriptor on five datasets and show that it reduces the rate of misclassifications between similar categories as well as the rate of phantom detections on backgrounds. Additionally, we compare our approach against a state-of-the-art convolutional neural network and show that it outperforms the network under significant occlusions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Adding discriminative power to hierarchical compositional models for object class detection</title>
      <link>/publications/kristan2013adding/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2013adding/</guid>
      <description>&lt;p&gt;In recent years, hierarchical compositional models have been shown to possess many appealing properties for object class detection, such as coping with a potentially large number of object categories. The reason is that they encode categories by hierarchical vocabularies of parts which are shared among the categories. On the downside, the sharing and purely reconstructive nature cause problems when categorizing visually similar categories and separating them from the background. In this paper we propose a novel approach that preserves the appealing properties of the generative hierarchical models, while at the same time improving their discrimination properties. We achieve this by introducing a network of discriminative nodes on top of the existing generative hierarchy. The discriminative nodes are sparse linear combinations of activated generative parts. We show in the experiments that the discriminative nodes consistently improve a state-of-the-art hierarchical compositional model. Results show that our approach considers only a fraction of all nodes in the vocabulary (less than 10%), which also makes the system computationally efficient.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An adaptive coupled-layer visual model for robust visual tracking</title>
      <link>/publications/cehovin2011an/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2011an/</guid>
      <description>&lt;p&gt;This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target&amp;rsquo;s global and local appearance. The local layer in this model is a set of local patches that geometrically constrain the changes in the target&amp;rsquo;s appearance. This layer probabilistically adapts to the target&amp;rsquo;s geometric deformation, while its structure is updated by removing and adding local patches. The addition of patches is constrained by the global layer, which probabilistically models the target&amp;rsquo;s global visual properties, such as color, shape and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. By this coupled constraint paradigm between the adaptation of the global and the local layer, we achieve more robust tracking through significant appearance changes. Indeed, the experimental results on challenging sequences confirm that our tracker outperforms the related state-of-the-art trackers by having a smaller failure rate as well as better accuracy.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An Analysis Of Basketball Players&#39; Movements In The Slovenian Basketball League Play-Offs Using The Sagit Tracking System</title>
      <link>/publications/erculj2008an/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/erculj2008an/</guid>
      <description></description>
    </item>
    <item>
      <title>An integrated system for interactive continuous learning of categorical knowledge</title>
      <link>/publications/skocaj2016an-integrated/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2016an-integrated/</guid>
      <description>&lt;p&gt;This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs to have several competencies and must be able to process different types of representations. In this article, we describe a collection of mechanisms that enable integration of heterogeneous competencies in a principled way. Central to our design is the creation of beliefs from visual and linguistic information, and the use of these beliefs for planning system behaviour to satisfy internal drives. The system is able to detect gaps in its knowledge and to plan and execute actions that provide information needed to fill these gaps. We propose a hierarchy of mechanisms which are capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself. We present the theory these mechanisms are built upon and an instantiation of this theory in the form of an integrated robot system. We demonstrate the operation of the system in the case of learning conceptual models of objects and their visual properties.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An integrated system for interactive continuous learning of categorical knowledge</title>
      <link>/publications/skocaj2016an/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2016an/</guid>
      <description>&lt;p&gt;This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs to have several competencies and must be able to process different types of representations. In this article we describe a collection of mechanisms that enable integration of heterogeneous competencies in a principled way. Central to our design is the creation of beliefs from visual and linguistic information, and the use of these beliefs for planning system behaviour to satisfy internal drives. The system is able to detect gaps in its knowledge and to plan and execute actions that provide information needed to fill these gaps. We propose a hierarchy of mechanisms which are capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself. We present the theory these mechanisms are built upon and an instantiation of this theory in the form of an integrated robot system. We demonstrate the operation of the system in the case of learning conceptual models of objects and their visual properties.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Analysis of multi-agent activity using Petri nets</title>
      <link>/publications/perse2009analysis/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2009analysis/</guid>
      <description>&lt;p&gt;This paper presents the use of Place/Transition Petri Nets (PNs) for the recognition and evaluation of complex multi-agent activities. The PNs were built automatically from the activity templates that are routinely used by experts to encode domain-specific knowledge. The PNs were built in such a way that they encoded the complex temporal relations between the individual activity actions. We extended the original PN formalism to handle the propagation of evidence using net tokens. The evaluation of the spatial and temporal properties of the actions was carried out using trajectory-based action detectors and probabilistic models of the action durations. The presented approach was evaluated using several examples of real basketball activities. The obtained experimental results suggest that this approach can be used to determine the type of activity that a team has performed as well as the stage at which the activity ended.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Anomalous Sound Detection by Feature-Level Anomaly Simulation</title>
      <link>/publications/zavrtanik2024anomalous/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2024anomalous/</guid>
      <description>&lt;p&gt;Recently, a growing number of works have focused on machine defect detection from anomalous audio patterns. Datasets for the machine audio domain are scarce, and recent methods that perform well on benchmarks such as DCASE2020 Task 2 rely on auxiliary information, such as annotated data from other training classes in the domain, to extract information that can be used in deep-learning classification-based anomaly detection approaches. However, in practical scenarios, annotated data from the same domain may not be readily available, so annotation-free methods that can learn appropriate audio representations from unannotated data are needed. We propose AudDSR, a simulation-based anomaly detection method that learns to detect anomalies without additional annotated data and instead focuses on a discrete feature space sampling method for an anomaly simulation process. AudDSR outperforms competing methods that do not rely on annotated data on the DCASE2020 anomalous sound detection benchmark and even matches the performance of some methods that utilize additional annotation information.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Application of the HIDRA2 deep-learning model for sea level forecasting along the Estonian coast of the Baltic Sea</title>
      <link>/publications/barzandeh2025/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/barzandeh2025/</guid>
      <description></description>
    </item>
    <item>
      <title>Approximating Distributions Through Mixtures of Gaussians</title>
      <link>/publications/kristan2007approximating/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2007approximating/</guid>
      <description></description>
    </item>
    <item>
      <title>Automatic Evaluation of Organized Basketball Activity</title>
      <link>/publications/perse2007automatic/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2007automatic/</guid>
      <description>&lt;p&gt;In this article, the trajectory-based evaluation of multi-player basketball activity is addressed. Organized basketball activity consists of a set of key elements and their temporal relations. The activity evaluation is performed by analyzing each of them individually, and the final reasoning about the activity is achieved using a Bayesian network. The network structure is obtained automatically from the activity template, which is a standard tool used by basketball experts. The experimental results suggest that our approach can successfully evaluate the quality of the observed activity.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Bayes Spectral Entropy-Based Measure of Camera Focus</title>
      <link>/publications/kristan2005bayes/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2005bayes/</guid>
      <description></description>
    </item>
    <item>
      <title>Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking</title>
      <link>/publications/cehovin2017beyond/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2017beyond/</guid>
      <description>&lt;p&gt;Object-to-camera motion produces a variety of apparent motion patterns that significantly affect performance of short-term visual trackers. Despite being crucial for designing robust trackers, their influence is poorly explored in standard benchmarks due to weakly defined, biased and overlapping attribute annotations. In this paper we propose to go beyond pre-recorded benchmarks with post-hoc annotations by presenting an approach that utilizes omnidirectional videos to generate realistic, consistently annotated, short-term tracking scenarios with exactly parameterized motion patterns. We have created an evaluation system, constructed a fully annotated dataset of omnidirectional videos and generators for typical motion patterns. We provide an in-depth analysis of major tracking paradigms which is complementary to the standard benchmarks and confirms the expressiveness of our evaluation approach.&lt;/p&gt;</description>
    </item>
    <item>
      <title>CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark</title>
      <link>/publications/lukezic2019cdtb/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2019cdtb/</guid>
      <description>&lt;p&gt;A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Cheating Depth: Enhancing 3D Surface Anomaly Detection via Depth Simulation</title>
      <link>/publications/zavrtanik2024cheating/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2024cheating/</guid>
      <description>&lt;p&gt;RGB-based surface anomaly detection methods have advanced significantly. However, certain surface anomalies remain practically invisible in RGB alone, necessitating the incorporation of 3D information. Existing approaches that employ point-cloud backbones suffer from suboptimal representations and reduced applicability due to slow processing. Re-training RGB backbones, designed for faster dense input processing, on industrial depth datasets is hindered by the limited availability of sufficiently large datasets. We make several contributions to address these challenges. (i) We propose a novel Depth-Aware Discrete Autoencoder (DADA) architecture, which enables learning a general discrete latent space that jointly models RGB and 3D data for 3D surface anomaly detection. (ii) We tackle the lack of diverse industrial depth datasets by introducing a simulation process for learning informative depth features in the depth encoder. (iii) We propose a new surface anomaly detection method, 3DSR, which outperforms all existing state-of-the-art methods on the challenging MVTec3D anomaly detection benchmark, both in terms of accuracy and processing speed. The experimental results validate the effectiveness and efficiency of our approach, highlighting the potential of utilizing depth information for improved surface anomaly detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Closed-world tracking of multiple interacting targets for indoor-sports applications</title>
      <link>/publications/kristan2009closed-world/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009closed-world/</guid>
      <description>&lt;p&gt;In this paper we present an efficient algorithm for tracking multiple players during indoor sports matches. A sports match can be considered as a semi-controlled environment for which a set of closed-world assumptions regarding the visual as well as the dynamical properties of the players and the court can be derived. These assumptions are then used in the context of particle filtering to arrive at a computationally fast, closed-world, multi-player tracker. The proposed tracker is based on multiple, single-player trackers, which are combined using a closed-world assumption about the interactions among players. With regard to the visual properties, the robustness of the tracker is achieved by deriving a novel sports-domain-specific likelihood function and employing a novel background-elimination scheme. The restrictions on the player&amp;rsquo;s dynamics are enforced by employing a novel form of local smoothing. This smoothing renders the tracking more robust and reduces the computational complexity of the tracker. We evaluated the proposed closed-world, multi-player tracker on a challenging data set. In comparison with several similar trackers that did not utilize all of the closed-world assumptions, the proposed tracker produced better estimates of position and prediction as well as reducing the number of failures.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Cognitive Systems</title>
      <link>/publications/skocaj2009cognitive/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2009cognitive/</guid>
      <description></description>
    </item>
    <item>
      <title>Comparing different learning approaches in categorical knowledge acquisition</title>
      <link>/publications/skocaj2012comparing/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2012comparing/</guid>
      <description>&lt;p&gt;In this paper we address the problem of acquiring categorical knowledge from the active learning perspective. We describe and implement several teacher and learner-driven approaches that require different levels of teacher competencies and consider different types of knowledge for selection of training samples. The experimental results show that the active learning approach outperforms the passive one and that the adaptation of the learning process to the learner’s knowledge significantly improves the learning performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Continuous Learning of Simple Visual Concepts using Incremental Kernel Density Estimation</title>
      <link>/publications/skocaj2008continuous/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2008continuous/</guid>
      <description>&lt;p&gt;In this paper we propose a method for continuous learning of simple visual concepts. The method continuously associates words describing observed scenes with automatically extracted visual features. Since in our setting every sample is labelled with multiple concept labels, and there are no negative examples, reconstructive representations of the incoming data are used. The associated features are modelled with kernel density probability distribution estimates, which are built incrementally. The proposed approach is applied to the learning of object properties and spatial relations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>CRITER 1.0: a coarse reconstruction with iterative refinement network for sparse spatio-temporal satellite data</title>
      <link>/publications/muc2025_gmd/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muc2025_gmd/</guid>
      <description>&lt;p&gt;Satellite observations of sea surface temperature (SST) are essential for accurate weather forecasting and climate modeling. However, these data often suffer from incomplete coverage due to cloud obstruction and limited satellite swath width, which requires the development of dense reconstruction algorithms. The current state of the art struggles to accurately recover high-frequency variability, particularly in SST gradients in ocean fronts, eddies, and filaments, which are crucial for downstream processing and predictive tasks. To address this challenge, we propose CRITER (Coarse Reconstruction with ITerative Refinement Network), a novel two-stage method. First, it reconstructs low-frequency SST components utilizing a Vision Transformer-based model, leveraging global spatio-temporal correlations in the available observations. Second, a UNet-type network iteratively refines the estimate by recovering high-frequency details. Extensive analysis on datasets from the Mediterranean, Adriatic, and Atlantic seas demonstrates CRITER&amp;rsquo;s superior performance over the current state of the art. Specifically, CRITER achieves up to 44 % lower reconstruction errors of the missing values and over 80 % lower reconstruction errors of the observed values compared to the state of the art.&lt;/p&gt;</description>
    </item>
    <item>
      <title>D3S - A Discriminative Single Shot Segmentation Tracker</title>
      <link>/publications/lukezic2020d3s/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2020d3s/</guid>
      <description>&lt;p&gt;Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker &amp;ndash; D3S, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object, to simultaneously achieve high robustness and online target segmentation. Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on the VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on TrackingNet. D3S outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms, while running an order of magnitude faster, close to real-time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>DAL: A Deep Depth-Aware Long-term Tracker</title>
      <link>/publications/qian2020dal/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/qian2020dal/</guid>
      <description></description>
    </item>
    <item>
      <title>Dana36: A Multi-Camera Image Dataset for Object Identification in Surveillance Scenarios</title>
      <link>/publications/pers2012dana36/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pers2012dana36/</guid>
      <description>&lt;p&gt;We present a novel dataset for the evaluation of object matching and recognition methods in surveillance scenarios. The dataset consists of more than 23,000 images, depicting 15 persons and nine vehicles. Ground truth data, namely the identity of each person or vehicle, is provided, along with the coordinates of the bounding box in the full camera image. The dataset was acquired from 36 stationary camera views using a variety of surveillance cameras with resolutions ranging from standard VGA to three megapixels. 27 cameras observed the persons and vehicles in an outdoor environment, while the remaining nine observed the same persons indoors. The activity of the persons was planned in advance: they drove the cars to the parking lot, exited the cars and walked around the building, through the main entrance, and up the stairs towards the first floor of the building. The intended use of the dataset is performance evaluation of computer vision methods that aim to (re)identify people and objects from many different viewpoints, in different environments and under variable conditions. Due to the variety of camera locations, vantage points and resolutions, the dataset provides the means to adjust the difficulty of the identification task in a controlled and documented manner. An interface for easy use of the dataset within Matlab is provided as well, and the data is complemented by baseline results using a basic color histogram-based descriptor. While the cropped images of persons and vehicles represent the primary data in our dataset, we also provide full-frame images and a set of tracklets for each object as a courtesy to the dataset users.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep-learning transformer-based sea level modeling ensemble for the Adriatic basin</title>
      <link>/publications/rus2023deep-learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2023deep-learning/</guid>
      <description>&lt;p&gt;Storm surges and coastal floods are persistent threats to civil and economic safety in the Northern Adriatic. The meteorologically induced sea level signal is, however, often difficult to forecast deterministically due to the resonant character of the Adriatic basin. A standard solution is therefore resorting to ensembles of numerical ocean models, which are numerically expensive. In recent years, deep-learning-based methods have shown significant potential as numerically cheap alternatives. This is the avenue followed in our work. We propose a new deep-learning transformer-based architecture HIDRA-T, a continuation of our recent model HIDRA2 (Rus et al., GMD 2023), which outperformed both the state-of-the-art deep-learning network design HIDRA1 and two state-of-the-art numerical ocean models (a NEMO engine and a SCHISM ocean modeling system). HIDRA-T is our latest attempt at sea level forecasting, employing novel transformer-based atmospheric and sea level encoders. Transformers are designed for sequential data, and in HIDRA-T we use self-attention blocks to extract features from the atmospheric data, first by tokenizing over the spatial dimension, then over the temporal dimension. HIDRA-T was trained on surface wind and pressure fields from the ECMWF atmospheric ensemble and on Koper tide gauge observations. On an independent and challenging test set, HIDRA-T outperforms all other models, reducing the previous best mean absolute forecast error of HIDRA2 in storm events by 2.6 %.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deformable Parts Correlation Filters for Robust Visual Tracking</title>
      <link>/publications/lukezic2017deformable/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2017deformable/</guid>
      <description>&lt;p&gt;Deformable parts models show great potential in tracking by principally addressing non-rigid object deformations and self-occlusions, but according to recent benchmarks, they often lag behind the holistic approaches. The reason is that a potentially large number of degrees of freedom has to be estimated for object localization, and simplifications of the constellation topology are often assumed to make the inference tractable. We present a new formulation of the constellation model with correlation filters that treats the geometric and visual constraints within a single convex cost function and derive a highly efficient optimization for MAP inference of a fully-connected constellation. We propose a tracker that models the object at two levels of detail. The coarse level corresponds to a root correlation filter and a novel color model for approximate object localization, while the mid-level representation is composed of the new deformable constellation of correlation filters that refine the object location. The resulting tracker is rigorously analyzed on the highly challenging OTB, VOT2014 and VOT2015 benchmarks, exhibits state-of-the-art performance and runs in real-time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Detekcija, lokalizacija in identifikacija oseb z več kamerami ter mapami značilnic</title>
      <link>/publications/mandeljc2012detekcija/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2012detekcija/</guid>
      <description>&lt;p&gt;This article presents a system for the detection, localization and identification of persons at individual time instants, without the temporal filtering present in most such systems. The main goal of the presented approach is to eliminate catastrophic errors that prevent fully automatic processing of realistically long video recordings. The system is based on the fusion of several weak cues, encoded in the form of feature maps; the fusion is performed by one or more trained classifiers.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Dimensionality Reduction for Distributed Vision Systems Using Random Projection</title>
      <link>/publications/sulic2010dimensionality/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2010dimensionality/</guid>
      <description>&lt;p&gt;Dimensionality reduction is an important issue in the context of distributed vision systems. Processing of dimensionality-reduced data requires far fewer network resources (e.g., storage space, network bandwidth) than processing of original data. In this paper we explore the performance of the random projection method for distributed smart cameras. In our tests, random projection is compared to principal component analysis in terms of recognition efficiency (i.e., object recognition). The results obtained on the COIL-20 image data set show good performance of random projection in comparison to principal component analysis, which requires distribution of a subspace and therefore consumes more resources of the network. This indicates that the random projection method can elegantly solve the problem of subspace distribution in embedded and distributed vision systems. Moreover, even without explicit orthogonalization or normalization of the random projection transformation subspace, the method achieves good object recognition efficiency.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Discriminative Correlation Filter Tracker with Channel and Spatial Reliability</title>
      <link>/publications/lukezic2018discriminative/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2018discriminative/</guid>
      <description>&lt;p&gt;Short-term tracking is an open and challenging  problem for which discriminative correlation filters (DCF) have shown  excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a learning algorithm for its efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the  filter support to the part of the object suitable for tracking. This both allows to enlarge the search region and  improves tracking of non-rectangular objects.   Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally,  with only two simple standard  feature sets, HoGs and Colornames, the novel CSR-DCF method &amp;ndash; DCF with Channel and Spatial Reliability &amp;ndash; achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs close to real-time on a CPU.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Discriminative Correlation Filter with Channel and Spatial Reliability</title>
      <link>/publications/lukezic2017discriminative/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2017discriminative/</guid>
      <description>&lt;p&gt;Short-term tracking is an open and challenging  problem for which discriminative correlation filters (DCF) have shown  excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for its efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the  filter support to the part of the object suitable for tracking. This both allows to enlarge the search region and  improves tracking of non-rectangular objects.   Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally,  with only two simple standard  features, HoGs and Colornames, the novel CSR-DCF method &amp;ndash; DCF with Channel and Spatial Reliability &amp;ndash; achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs in real-time on a CPU.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Domain-specific adaptations for region proposals</title>
      <link>/publications/tabernik2015domain-specific/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2015domain-specific/</guid>
      <description>&lt;p&gt;In this work we propose a novel approach to the detection of all traffic sign boards. We propose to employ state-of-the-art region proposals as the first step to reduce the initial search space and provide a way to use a strong classifier for fine-grained classification. We evaluate multiple region proposal methods on the domain of traffic sign detection and further propose various domain-specific adaptations to improve their performance. We show that EdgeBoxes with domain-specific learning and re-scoring based on trained shape information is able to significantly outperform the remaining methods on the German Traffic Sign Database. Furthermore, we show it achieves a higher recall with high-quality regions at a lower number of regions than the remaining methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>DRAEM -- A discriminatively trained reconstruction embedding for surface anomaly detection</title>
      <link>/publications/zavrtanik2021draem/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2021draem/</guid>
      <description>&lt;p&gt;Visual surface anomaly detection aims to detect local image regions that significantly deviate from normal appearance. Recent surface anomaly detection methods rely on generative models to accurately reconstruct the normal areas and to fail on anomalies. These methods are trained only on anomaly-free images, and often require hand-crafted post-processing steps to localize the anomalies, which prohibits optimizing the feature extraction for maximal detection capability. In addition to the reconstructive approach, we cast surface anomaly detection primarily as a discriminative problem and propose a discriminatively trained reconstruction anomaly embedding model (DRAEM). The proposed method learns a joint representation of an anomalous image and its anomaly-free reconstruction, while simultaneously learning a decision boundary between normal and anomalous examples. The method enables direct anomaly localization without the need for additional complicated post-processing of the network output and can be trained using simple and general anomaly simulations. On the challenging MVTec anomaly detection dataset, DRAEM outperforms the current state-of-the-art unsupervised methods by a large margin and even delivers detection performance close to the fully-supervised methods on the widely used DAGM surface-defect detection dataset, while substantially outperforming them in localization accuracy.&lt;/p&gt;</description>
    </item>
    <item>
      <title>DSR – A Dual Subspace Re-Projection Network for Surface Anomaly Detection</title>
      <link>/publications/zavrtanik2022dsr/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2022dsr/</guid>
      <description>&lt;p&gt;The state-of-the-art in discriminative unsupervised surface anomaly detection relies on external datasets for synthesizing anomaly-augmented training images. Such approaches are prone to failure on near-in-distribution anomalies, since these are difficult to synthesize realistically due to their similarity to anomaly-free regions. We propose an architecture based on a quantized feature space representation with dual decoders, DSR, that avoids the image-level anomaly synthesis requirement. Without making any assumptions about the visual properties of anomalies, DSR generates the anomalies at the feature level by sampling the learned quantized feature space, which allows a controlled generation of near-in-distribution anomalies. DSR achieves state-of-the-art results on the KSDD2 and MVTec anomaly detection datasets. The experiments on the challenging real-world KSDD2 dataset show that DSR significantly outperforms other unsupervised surface anomaly detection methods, improving on the previous top-performing methods by 10% AP in anomaly detection and 35% AP in anomaly localization.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Efficient Dimensionality Reduction Using Random Projection</title>
      <link>/publications/sulic2010efficient/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2010efficient/</guid>
      <description>&lt;p&gt;Dimensionality reduction techniques are especially important in the context of embedded vision systems. A promising dimensionality reduction method for use in such systems is random projection. In this paper we explore the performance of the random projection method, which can be easily used in embedded cameras. Random projection is compared to Principal Component Analysis in terms of recognition efficiency on the COIL-20 image data set. Results show surprisingly good performance of random projection in comparison to principal component analysis, even without explicit orthogonalization or normalization of the transformation subspace. These results support the use of random projection in our hierarchical feature-distribution scheme in visual-sensor networks, where random projection elegantly solves the problem of shared subspace distribution.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Efficient Feature Distribution for Object Matching in Visual-Sensor Networks</title>
      <link>/publications/sulic2011efficient/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2011efficient/</guid>
      <description>&lt;p&gt;In this paper, we propose a framework of hierarchical feature distribution for object matching in a network of visual sensors. In our approach, we hierarchically distribute the information in such a way that each individual node maintains only a small amount of information about the objects seen by the network. Nevertheless, this amount is sufficient to efficiently route queries through the network without any degradation of the matching performance. A set of requirements that have to be fulfilled by the object-matching method to be used in such a framework is defined. We provide examples of mapping four well-known, object-matching methods to a hierarchical feature-distribution scheme. The proposed approach was tested on a standard COIL-100 image database and in a basic surveillance scenario using our own distributed network simulator. The results show that the amount of data transmitted through the network can be significantly reduced in comparison to naive feature-distribution schemes such as flooding.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Efficient spring system optimization for part-based visual tracking</title>
      <link>/publications/lukezic2015efficient/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2015efficient/</guid>
      <description>&lt;p&gt;Part-based trackers typically use visual and geometric constraints to find the optimal positions of the parts in the constellation. Recently, spring systems were successfully applied to model these constraints. In this paper we propose an optimization method developed for multi-dimensional spring systems, which can be integrated in the part-based tracking model. The experimental analysis shows that our optimization method outperforms the conjugate gradient descent optimization in terms of convergence speed, accuracy and numerical stability.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Entropy Based Measure of Camera Focus</title>
      <link>/publications/kristan2004entropy/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2004entropy/</guid>
      <description>&lt;p&gt;A new measure for assessing camera focus from a recorded image is presented in this paper. The proposed measure is based on calculating entropy in the image frequency domain, and we call it frequency domain entropy, or FDE. First, an intuitive explanation of the measure is presented; next, tests for some classical properties that such a measure should meet are conducted and commented on.&lt;/p&gt;</description>
    </item>
    <item>
      <title>eWaSR — An Embedded-Compute-Ready Maritime Obstacle Detection Network</title>
      <link>/publications/tersek2023ewasr/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tersek2023ewasr/</guid>
      <description>&lt;p&gt;Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper, we analyze the current best-performing maritime obstacle detection network, WaSR. Based on the analysis, we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant, eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only a 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10× faster than the original WaSR (115 FPS vs. 11 FPS). Tests on a real embedded sensor OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Exploring levels of stereo fusion for obstacle detection in marine environment</title>
      <link>/publications/bovcon2018exploring/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2018exploring/</guid>
      <description></description>
    </item>
    <item>
      <title>Fast image-based obstacle detection from unmanned surface vehicles</title>
      <link>/publications/kristan2015fast/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2015fast/</guid>
      <description>&lt;p&gt;Obstacle detection plays an important role in unmanned surface vehicles (USV). The USVs operate in highly diverse environments in which an obstacle may be a floating piece of wood, a scuba diver, a pier, or a part of a shoreline, which presents a significant challenge to continuous detection from images taken onboard. This paper addresses the problem of online detection by constrained unsupervised segmentation. To this end, a new graphical model is proposed that affords a fast and continuous obstacle image-map estimation from a single video stream captured onboard a USV. The model accounts for the semantic structure of marine environment as observed from USV by imposing weak structural constraints. A Markov random field framework is adopted and a highly efficient algorithm for simultaneous optimization of model parameters and segmentation mask estimation is derived. Our approach does not require computationally intensive extraction of texture features and comfortably runs in real-time. The algorithm is tested on a new, challenging, dataset for segmentation and obstacle detection in marine environments, which is the largest annotated dataset of its kind. Results on this dataset show that our model outperforms the related approaches, while requiring a fraction of computational effort.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fast Spatially Regularized Correlation Filter Tracker</title>
      <link>/publications/lukezic2018fast/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2018fast/</guid>
      <description>&lt;p&gt;Discriminative correlation filters (DCF) have attracted significant attention of the tracking community. The standard formulation of the DCF affords a closed-form solution, but is not robust and is constrained to learning and detection using a relatively small search region. Spatial regularization was proposed to address learning from larger regions, but it prohibits a closed-form solution and leads to an iterative optimization with significant computational load, resulting in slow model learning and tracking. We propose to reformulate the spatially regularized filter cost function such that it offers an efficient optimization. This significantly speeds up the tracker (approximately 14 times) and results in real-time tracking at the same or better accuracy.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Filtering out nondiscriminative keypoints by geometry based keypoint constellations</title>
      <link>/publications/racki2015filtering/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/racki2015filtering/</guid>
      <description>&lt;p&gt;Keypoint-based object detection typically utilizes the nearest-neighbour matching technique in order to match discriminative and reject nondiscriminative keypoints. A detected keypoint is found to be nondiscriminative if it is similar enough to more than one model keypoint. This strategy does not always prove efficient, especially in cases where objects consist of repeating patterns, such as letters in logotypes, where potentially useful keypoints can get rejected. In this paper we propose a geometry-based approach for filtering out nondiscriminative keypoints. Our approach is not affected by repeating patterns and filters out nondiscriminative keypoints by means of pre-learned geometry constraints. We evaluate the proposed method on a challenging dataset depicting logotypes in real-world environments under strong illumination and viewpoint changes.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Formalization of different learning strategies in a continuous learning framework</title>
      <link>/publications/skocaj2009formalization/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2009formalization/</guid>
      <description>&lt;p&gt;While the ability to learn on its own is an important feature of a learning agent, another, equally important feature is the ability to interact with its environment and to learn in interaction with other cognitive agents and humans. In this paper we analyze such interactive learning and define several learning strategies requiring different levels of tutor involvement and robot autonomy. We propose a new formal model for describing the learning strategies. The formalism takes into account different levels and types of communication between the robot and the tutor and different actions that can be undertaken. We also propose appropriate performance measures and show the experimental results of the evaluation of the proposed learning strategies.&lt;/p&gt;</description>
    </item>
    <item>
      <title>FuCoLoT - A Fully-Correlational Long-Term Tracker</title>
      <link>/publications/lukezic2018fucolot/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2018fucolot/</guid>
      <description>&lt;p&gt;The Fully Correlational Long-term Tracker (FuCoLoT) exploits a novel DCF constrained filter learning method to design a detector that is able to efficiently re-detect the target in the whole image. FuCoLoT maintains several correlation filters trained on different time scales that act as the detector components. A novel mechanism based on the correlation response is used for tracking failure estimation. FuCoLoT achieves state-of-the-art results on standard short-term benchmarks and it outperforms the current best-performing tracker on the long-term UAV20L benchmark by over 19%. It has an order of magnitude smaller memory footprint than its best-performing competitors and runs at 15 fps in a single CPU thread.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fusion of non-visual modalities into the probabilistic occupancy map framework for person localization</title>
      <link>/publications/mandeljc2011fusion-of/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2011fusion-of/</guid>
      <description>&lt;p&gt;In this paper we investigate the possibilities for fusion of non-visual sensor modalities into the state-of-the-art vision-based framework for person detection and localization, the Probabilistic Occupancy Map (POM), with the aim of improving the frame-by-frame localization results in a realistic (cluttered) indoor environment. We point out the aspects that need to be considered when fusing non-visual sensor information into POM and provide a mathematical model for it. We demonstrate the proposed fusion method on the example of a multi-camera and radio-based person localization setup. The performance of both systems is evaluated, showing their strengths and weaknesses. We show that localization results may be significantly improved by fusing the information from the radio-based system into the camera-based POM framework using the proposed model.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fusion of Non-Visual Modalities Into the Probabilistic Occupancy Map Framework for Person Localization</title>
      <link>/publications/mandeljc2011fusion/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2011fusion/</guid>
      <description>&lt;p&gt;In recent years, the problem of person detection and localization has received much attention, with two strong areas of application being surveillance/security and tracking of players in sports. Different solutions based on different sensor modalities have been proposed, and recently sensor fusion has gained prominence as a paradigm for overcoming the limitations of the individual sensor modalities. We investigate the possibilities for fusion of additional, non-visual, sensor modalities into the state-of-the-art vision-based framework for person detection and localization, the Probabilistic Occupancy Map (POM), with the aim of improving the localization results in a realistic, cluttered indoor environment. We point out the aspects that need to be considered when fusing additional sensor information into POM and provide a possible mathematical model for it. Finally, we experimentally demonstrate the proposed fusion on the example of person localization in a cluttered environment. The performance of a system comprising visual cameras and POM and a radio-based localization system is experimentally evaluated, showing their strengths and weaknesses. We then improve the localization results by fusing the information from the radio-based system into POM using the proposed model.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Guided Video Object Segmentation by Tracking</title>
      <link>/publications/pelhan2023guided/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pelhan2023guided/</guid>
      <description>&lt;p&gt;The paper presents the Guided Video Object Segmentation by Tracking (gVOST) method for human-in-the-loop video object segmentation, which significantly reduces the manual annotation effort. The method is designed for interactive object segmentation in a wide range of videos with minimal user input. The user iteratively selects and annotates a small set of anchor frames with just a few clicks on the object border. The segmentation is then propagated to the intermediate frames. Experiments show that gVOST performs well on diverse and challenging videos used in visual object tracking (the VOT2020 dataset), where it achieves an IoU of 73% at only 5% of the frames annotated by the user. This shortens the annotation time by 98% compared to the brute-force approach. gVOST outperforms the state-of-the-art interactive video object segmentation methods on the VOT2020 dataset and performs comparably on the less diverse DAVIS video object segmentation dataset.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA 1.0: deep-learning-based ensemble sea level forecasting in the northern Adriatic</title>
      <link>/publications/zust2021hidra/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2021hidra/</guid>
      <description>&lt;p&gt;Interactions between atmospheric forcing, topographic constraints to air and water flow, and the resonant character of the basin make sea level modelling in the Adriatic a challenging problem. In this study we present an ensemble deep-neural-network-based sea level forecasting method HIDRA, which outperforms our set-up of the general ocean circulation model ensemble (NEMO v3.6) for all forecast lead times and at a minuscule fraction of the numerical cost (order of 2×10⁻⁶). HIDRA exhibits larger bias but lower RMSE than our set-up of NEMO over most of the residual sea level bins. It introduces a trainable atmospheric spatial encoder and employs fusion of atmospheric and sea level features into a self-contained network which enables discriminative feature learning. HIDRA architecture building blocks are experimentally analysed in detail and compared to alternative approaches. Results show the importance of sea level input for forecast lead times below 24 h and the importance of atmospheric input for longer lead times. The best performance is achieved by considering the input as the total sea level, split into disjoint sets of tidal and residual signals. This enables HIDRA to optimize the prediction fidelity with respect to atmospheric forcing while compensating for the errors in the tidal model. HIDRA is trained and analysed on a 10-year (2006–2016) time series of atmospheric surface fields from a single member of the ECMWF atmospheric ensemble. In the testing phase, both HIDRA and NEMO ensemble systems are forced by the ECMWF atmospheric ensemble. Their performance is evaluated on a 1-year (2019) hourly time series from a tide gauge in Koper (Slovenia). Spectral and continuous wavelet analysis of the forecasts at the semi-diurnal frequency (12 h)⁻¹ and at the ground-state basin seiche frequency (21.5 h)⁻¹ is performed. The energy at the basin seiche in the HIDRA forecast is close to that observed, while our set-up of NEMO underestimates it. Analyses of the January 2015 and November 2019 storm surges indicate that HIDRA has learned to mimic the timing and amplitude of basin seiches.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA-D: deep-learning model for dense sea level forecasting using sparse altimetry and tide gauge data</title>
      <link>/publications/rus2026hidrad/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2026hidrad/</guid>
      <description>&lt;p&gt;This paper introduces HIDRA-D, a novel deep-learning model for basin scale dense (gridded) sea level prediction using sparse satellite altimetry and in situ tide gauge data. Accurate sea level prediction is crucial for coastal risk management, marine operations, and sustainable development. While traditional numerical ocean models are computationally expensive, especially for probabilistic forecasts over many ensemble members, HIDRA-D offers a faster, numerically cheaper, observation-driven alternative. Unlike previous HIDRA models (HIDRA1, HIDRA2 and HIDRA3) that focused on point predictions at tide gauges, HIDRA-D provides dense, two-dimensional, gridded sea level forecasts. The core innovation lies in a new algorithm that effectively leverages sparse and unevenly distributed satellite altimetry data in combination with tide gauge observations, to learn the complex basin-scale dynamics of sea level. HIDRA-D achieves this by integrating a HIDRA3 module for point predictions at tide gauges with a novel Dense decoder module, which generates low-frequency spatial components of the sea level field in the Fourier domain, whose Fourier inverse is an hourly sea level forecast over a 3 d horizon. When comparing 3 d forecasts against satellite absolute dynamic topography (ADT) data in the Adriatic, HIDRA-D achieves a 28.0 % reduction in mean absolute error relative to the NEMO general circulation model. However, while HIDRA-D performs well in open waters, leave-one-out cross-validation at tide gauges indicates limitations in areas with complex bathymetry, such as the Neretva estuary located in a narrow bay, and in regions with sparse satellite ADT data, like the northern Adriatic. Importantly, the model shows robustness to spatially-limited tide gauge coverage, maintaining acceptable performance even when trained using data from distant stations. 
This suggests its potential for broader applicability in areas with limited in situ observations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA-T – A Transformer-Based Sea Level Forecasting Method</title>
      <link>/publications/rus2023hidra-t/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2023hidra-t/</guid>
      <description>&lt;p&gt;Sea surface height forecasting is critical for timely prediction of coastal flooding and mitigation of its impact on coastal communities. Traditional numerical ocean models are limited in terms of computational cost and accuracy, while deep learning models have shown promising results in this area. However, there is still a need for more accurate and efficient deep learning architectures for sea level and storm surge modeling. In this context, we propose a new deep-learning architecture HIDRA-T for sea level and storm tide modeling, which is based on transformers and outperforms both state-of-the-art deep-learning network designs HIDRA1 and HIDRA2 and two state-of-the-art numerical ocean models (a NEMO engine with sea level data assimilation and a SCHISM ocean modeling system), over all sea level bins and all forecast lead times. Compared to its predecessor HIDRA2, HIDRA-T employs novel transformer-based atmospheric and sea level encoders, as well as a novel feature fusion and regression block. HIDRA-T was trained on surface wind and pressure fields from the ECMWF atmospheric ensemble and on Koper tide gauge observations. Consistent superior performance over all other models is observed in the extreme tail of the sea level distribution.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA2: deep-learning ensemble sea level and storm tide forecasting in the presence of seiches – the case of the northern Adriatic</title>
      <link>/publications/rus2023hidra2-deep-learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2023hidra2-deep-learning/</guid>
      <description>&lt;p&gt;We propose a new deep-learning architecture HIDRA2 for sea level and storm tide modeling, which is extremely fast to train and apply and outperforms both our previous network design HIDRA1 and two state-of-the-art numerical ocean models (a NEMO engine with sea level data assimilation and a SCHISM ocean modeling system), over all sea level bins and all forecast lead times. The architecture of HIDRA2 employs novel atmospheric, tidal and sea surface height (SSH) feature encoders as well as a novel feature fusion and SSH regression block. HIDRA2 was trained on surface wind and pressure fields from a single member of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric ensemble and on Koper tide gauge observations. An extensive ablation study was performed to estimate the individual importance of input encoders and data streams. Compared to HIDRA1, the overall mean absolute forecast error is reduced by 13 %, while in storm events it is lower by an even larger margin of 25 %. Consistent superior performance over HIDRA1 as well as over general circulation models is observed in both tails of the sea level distribution: low tail forecasting is relevant for marine traffic scheduling to ports of the northern Adriatic, while high tail accuracy helps coastal flood response. To assign model errors to specific frequency bands covering diurnal and semi-diurnal tides and the two lowest basin seiches, spectral decomposition of sea levels during several historic storms is performed. HIDRA2 accurately predicts amplitudes and temporal phases of the Adriatic basin seiches, which is an important forecasting benefit due to the high sensitivity of the Adriatic storm tide level to the temporal lag between peak tide and peak seiche.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA2: deep-learning ensemble sea level and storm tide forecasting in the presence of seiches – the case of the northern Adriatic</title>
      <link>/publications/rus2023hidra2/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2023hidra2/</guid>
      <description>&lt;p&gt;We propose a new deep-learning architecture HIDRA2 for sea level and storm tide modeling, which is extremely fast to train and apply and outperforms both our previous network design HIDRA1 and two state-of-the-art numerical ocean models (a NEMO engine with sea level data assimilation and a SCHISM ocean modeling system), over all sea level bins and all forecast lead times. The architecture of HIDRA2 employs novel atmospheric, tidal and sea surface height (SSH) feature encoders as well as a novel feature fusion and SSH regression block. HIDRA2 was trained on surface wind and pressure fields from a single member of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric ensemble and on Koper tide gauge observations. An extensive ablation study was performed to estimate the individual importance of input encoders and data streams. Compared to HIDRA1, the overall mean absolute forecast error is reduced by 13 %, while in storm events it is lower by an even larger margin of 25 %. Consistent superior performance over HIDRA1 as well as over general circulation models is observed in both tails of the sea level distribution: low tail forecasting is relevant for marine traffic scheduling to ports of the northern Adriatic, while high tail accuracy helps coastal flood response. Power spectrum analysis indicates that HIDRA2 most accurately represents the energy density peak centered on the ground state sea surface eigenmode (seiche) and comes a close second to SCHISM in the energy band of the first excited eigenmode. To assign model errors to specific frequency bands covering diurnal and semi-diurnal tides and the two lowest basin seiches, spectral decomposition of sea levels during several historic storms is performed. 
HIDRA2 accurately predicts amplitudes and temporal phases of the Adriatic basin seiches, which is an important forecasting benefit due to the high sensitivity of the Adriatic storm tide level to the temporal lag between peak tide and peak seiche.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA3: a deep-learning model for multipoint ensemble sea level forecasting in the presence of tide gauge sensor failures</title>
      <link>/publications/rus2025hidra3/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2025hidra3/</guid>
      <description>&lt;p&gt;Accurate modeling of sea level and storm surge dynamics with several days of temporal horizons is essential for effective coastal flood responses and the protection of coastal communities and economies. The classical approach to this challenge involves computationally intensive ocean models that typically calculate sea levels relative to the geoid, which must then be correlated with local tide gauge observations of sea surface height (SSH). A recently proposed deep-learning model, HIDRA2 (HIgh-performance Deep tidal Residual estimation method using Atmospheric data, version 2), avoids numerical simulations while delivering competitive forecasts. Its forecast accuracy depends on the availability of a sufficiently long history of recorded SSH observations used in training. This makes HIDRA2 less reliable for locations with less abundant SSH training data. Furthermore, since the inference requires immediate past SSH measurements as input, forecasts cannot be made during temporary tide gauge failures. We address the aforementioned issues using a new architecture, HIDRA3, that considers observations from multiple locations, shares the geophysical encoder across the locations, and constructs a joint latent state that is decoded into forecasts at individual locations. The new architecture brings several benefits: (i) it improves training at locations with scarce historical SSH data, (ii) it enables predictions even at locations with sensor failures, and (iii) it reliably estimates prediction uncertainties. HIDRA3 is evaluated by jointly training on 11 tide gauge locations along the Adriatic. Results show that HIDRA3 outperforms HIDRA2 and the Mediterranean basin Nucleus for European Modelling of the Ocean (NEMO) setup of the Copernicus Marine Environment Monitoring Service (CMEMS) by ∼ 15 % and ∼ 13 % mean absolute error (MAE) reductions at high SSH values, creating a solid new state of the art. 
The forecasting skill does not deteriorate even in the case of simultaneous failure of multiple sensors in the basin or when predicting solely from the tide gauges far outside the Rossby radius of a failed sensor. Furthermore, HIDRA3 shows remarkable performance with substantially smaller amounts of training data compared with HIDRA2, making it appropriate for sea level forecasting in basins with high regional variability in the available tide gauge data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA3: A Robust Deep-Learning Model for Multi-Point Sea-Surface Height and Storm Surges Forecasting</title>
      <link>/publications/rus2024hidra3-a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2024hidra3-a/</guid>
      <description>&lt;p&gt;Accurate forecasting of storm surges and extreme sea levels is crucial for mitigating coastal flooding and safeguarding communities. While recent advancements have seen machine learning models surpass state-of-the-art physics-based numerical models in sea surface height (SSH) prediction, challenges persist, particularly in areas with limited SSH measurement history and instances of sensor failures. In this study, we developed HIDRA3, a novel deep-learning approach designed to address these challenges by jointly predicting SSH at multiple locations, allowing the training even in the presence of data scarcity and enabling predictions at locations with sensor failures. Compared to the state-of-the-art model HIDRA2 and the numerical model NEMO, HIDRA3 demonstrates notable improvements, achieving, on average, 5.0% lower Mean Absolute Error (MAE) and 11.3% lower MAE on extreme sea surface heights.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA3: A Robust Deep-Learning Model for Multi-Point Sea-Surface Height Forecasting</title>
      <link>/publications/rus2024hidra3/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2024hidra3/</guid>
      <description>&lt;p&gt;Accurate sea surface height (SSH) forecasting is crucial for predicting coastal flooding and protecting communities. Recently, state-of-the-art physics-based numerical models have been outperformed by machine learning models, which rely on atmospheric forecasts and the immediate past measurements obtained from the prediction location. The reliance on past measurements brings several drawbacks. While the atmospheric training data is abundantly available, some locations have only a short history of SSH measurement, which limits the training quality. Furthermore, predictions cannot be made in cases of sensor failure even at locations with abundant past training data. To address these issues, we introduce a new deep learning method HIDRA3, that jointly predicts SSH at multiple locations. This allows improved training even in the presence of data scarcity at some locations and enables making predictions at locations with failed sensors. HIDRA3 surpasses the state-of-the-art model HIDRA2 and the numerical model NEMO, on average obtaining a 5.0% lower Mean Absolute Error (MAE) and an 11.3% lower MAE on extreme sea surface heights.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Hierarchical Feature Encoding for Object Recognition in Visual Sensor Networks</title>
      <link>/publications/sulic2008hierarchical/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2008hierarchical/</guid>
      <description></description>
    </item>
    <item>
      <title>Hierarchical Spatial Model for 2D Range Data Based Room Categorization</title>
      <link>/publications/ursic2016hierarchical/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2016hierarchical/</guid>
      <description>&lt;p&gt;Next-generation service robots are expected to co-exist with humans in their homes. Such a mobile robot requires an efficient representation of space, which should be compact and expressive, for effective operation in real-world environments. In this paper we present a novel approach for 2D ground-plan-like, laser-range-data-based room categorization that builds on a compositional hierarchical representation of space, and show how an additional abstraction layer, whose parts are formed by merging partial views of the environment followed by graph extraction, can achieve improved categorization performance. A new algorithm is presented that finds a dictionary of exemplar elements from a multi-category set, based on an affinity measure defined among pairs of elements. This algorithm is used for part selection in the construction of the new layer. Room categorization experiments have been performed on a challenging publicly available dataset, which has been extended in this work. State-of-the-art results were obtained by achieving the most balanced performance over all categories.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Histogram of oriented gradients and region covariance descriptor in hierarchical feature-distribution scheme</title>
      <link>/publications/sulic2010histogram/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2010histogram/</guid>
      <description>&lt;p&gt;The hierarchical feature-distribution scheme is a recently proposed framework for distributing features in visual-sensor networks. It is intended for tasks where one needs to establish a correspondence between two objects seen by different cameras on different occasions. In visual-sensor networks, such a pair of cameras may be very distant in network terms. The hierarchical scheme therefore results in a significant reduction of network traffic compared to naive approaches that rely on flooding. In this paper we explore the performance of two state-of-the-art feature descriptors (the histogram of oriented gradients and the region covariance descriptor) in such a feature-distribution scheme. Both methods are compared in terms of network load on the COIL-100 data set. Results show that even state-of-the-art feature descriptors benefit from the hierarchical feature-distribution scheme.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Histograms of optical flow for efficient representation of body motion</title>
      <link>/publications/pers2010histograms/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pers2010histograms/</guid>
      <description></description>
    </item>
    <item>
      <title>Hypothesis verification with histogram of compositions improves object detection of hierarchical models</title>
      <link>/publications/tabernik2013hypothesis/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2013hypothesis/</guid>
      <description>&lt;p&gt;This paper focuses on applying and evaluating an additional hypothesis verification step for the detections of the learnt-hierarchy-of-parts (LHOP) method. The applied method reduces false positives, a common problem of hierarchical methods, especially in highly textured or cluttered images. We use a Histogram of Compositions (HoC) with a Support Vector Machine in the hypothesis verification step. Using the HoC descriptor keeps the additional computational cost minimal, since the HoC descriptor shares the LHOP tree structure. We evaluate the method on the ETHZ Shape Classes dataset and show that it outperforms the original baseline LHOP method by around 5 percent.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Implementacija CONDENSATION Algoritma v domeni zaprtega sveta</title>
      <link>/publications/kristan2004implementacija/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2004implementacija/</guid>
      <description>&lt;p&gt;People tracking is in general a challenging task, and over the last two decades various computer vision algorithms dealing with this problem have been proposed. Given the highly unpredictable nature of human motion, stochastic approaches such as CONDENSATION, introduced by M. Isard and A. Blake in 1998, have gained considerable popularity among researchers in this field. In this paper we present an implementation of the CONDENSATION algorithm for tracking people in sports. Since sport games usually take place in semi-controlled environments, the closed-world assumption, introduced by S. S. Intille and A. Bobick in 1995, has been adopted. We present the architecture of such a CONDENSATION-based tracking algorithm within a closed-world domain and show some results.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Improvements of the Adriatic Deep-Learning Sea Level Modeling Network HIDRA</title>
      <link>/publications/rus2022improvements/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2022improvements/</guid>
      <description></description>
    </item>
    <item>
      <title>Improving vision-based obstacle detection on USV using inertial sensor</title>
      <link>/publications/bovcon2017improving/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2017improving/</guid>
      <description>&lt;p&gt;We present a new semantic segmentation algorithm for obstacle detection in unmanned surface vehicles. The novelty lies in the graphical model that incorporates boat tilt measurements from the on-board inertial measurement unit (IMU). The IMU readings are used to estimate the location of the horizon line in the image and to automatically adjust the priors in the probabilistic semantic segmentation algorithm. We derive the necessary horizon projection equations, an efficient optimization algorithm for the proposed graphical model, and a practical IMU-camera-USV calibration. A new challenging dataset, which is the largest multi-sensor dataset of its kind, is constructed. Results show that the proposed algorithm significantly outperforms the state of the art, with a 32% improvement in water-edge detection accuracy, an over 15% reduction of the false positive rate, an over 70% reduction of the false negative rate, and an over 55% increase of the true positive rate, while running in real-time on a single core in Matlab.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Increased complexity of low-level structures improves histograms of compositions</title>
      <link>/publications/tabernik2012increased/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2012increased/</guid>
      <description>&lt;p&gt;While low-level visual features, such as the histogram of oriented gradients (HOG), have been successfully used for object detection and categorization, we have been able to improve upon their performance by introducing the histogram of compositions (HoC) in our previous work. In this paper we propose an extended version of the HoC descriptor that uses additional layers from the hierarchical model. We experimentally show that the extended HoC surpasses the performance of the original descriptor by approximately 5%, as the additional layer provides compositions of higher complexity. Furthermore, the extended HoC alone produces results competitive with the original HoC descriptor combined with HOG, and performance can be increased even further by adding HOG on top of the extended HoC.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Incremental learning with Gaussian mixture models</title>
      <link>/publications/kristan2008incremental/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2008incremental/</guid>
      <description>&lt;p&gt;In this paper we propose a new incremental estimation of Gaussian mixture models which can be used for applications of online learning. Our approach allows for adding new samples incrementally as well as removing parts of the mixture by the process of unlearning. Low complexity of the mixtures is maintained through a novel compression algorithm. In contrast to existing approaches, our approach does not require fine-tuning parameters for a specific application, does not assume specific forms of the target distributions, and imposes no temporal constraints on the observed data. The strength of the proposed approach is demonstrated with an example of online estimation of a complex distribution, an example of unlearning, and with interactive learning of basic visual concepts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Interaktiven sistem za kontinuirano učenje vizualnih konceptov</title>
      <link>/publications/skocaj2007interaktiven/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2007interaktiven/</guid>
      <description>&lt;p&gt;We present an artificial cognitive system for learning visual concepts. It comprises vision, communication and manipulation subsystems, which provide visual input, enable verbal and non-verbal communication with a tutor, and allow interaction with a given scene. The main goal is to learn associations between automatically extracted visual features and words that describe the scene in an open-ended, continuous manner. In particular, we address the problem of cross-modal learning of visual properties and spatial relations, and analyse several learning modes requiring different levels of tutor supervision.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Is my new tracker really better than yours?</title>
      <link>/publications/cehovin2014is/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2014is/</guid>
      <description>&lt;p&gt;The field of visual tracking evaluation features an abundance of performance measures, used by various authors, and largely suffers from a lack of consensus about which measures should be preferred. This hampers cross-paper tracker comparison and slows the advancement of the field. In this paper we provide a critical analysis of the popular measures and evaluate them experimentally in a large-scale tracking experiment. We also analyze various visualizations of the performance measures. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis we narrow the spectrum of measures down to only a few complementary ones, thus pushing towards homogenization of the tracker evaluation methodology.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Izvedba algoritma računalniškega vida na omrežni kameri</title>
      <link>/publications/sulic2008izvedba/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2008izvedba/</guid>
      <description></description>
    </item>
    <item>
      <title>Keep DRÆMing: Discriminative 3D anomaly detection through anomaly simulation</title>
      <link>/publications/zavrtanik2024keep/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2024keep/</guid>
      <description>&lt;p&gt;Recent surface anomaly detection methods rely on pretrained backbone networks for efficient anomaly detection. On standard RGB anomaly detection benchmarks these methods achieve excellent results, but they fail on 3D anomaly detection due to a lack of pretrained backbones suited to this domain. Additionally, there is a lack of industrial depth data that would enable training backbone networks for use in 3D anomaly detection models. Discriminative anomaly detection methods do not require pretrained networks and are trained using simulated anomalies. Simulating anomalies that fit the domain of industrial depth data is not trivial, yet it is necessary for training discriminative methods. We propose a novel 3D anomaly simulation process that follows the natural characteristics of industrial depth data and generates diverse deformations, making it suitable for training discriminative anomaly detection methods. We demonstrate its effectiveness by adapting the DRÆM method to 3D anomaly detection, thus obtaining 3DRÆM, a strong discriminative 3D anomaly detection model. The proposed approach achieves excellent results on the MVTec3D anomaly detection benchmark, where it achieves state-of-the-art results on both the 3D and RGB+3D problem setups, significantly outperforming competing methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Knowledge gap detection for interactive learning of categorical knowledge</title>
      <link>/publications/majnik2013knowledge/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/majnik2013knowledge/</guid>
      <description>&lt;p&gt;In interactive machine learning, the process of labeling training instances and introducing them to the learner may be expensive in terms of human effort and time. In this paper we present different strategies for detecting gaps in the learner&amp;rsquo;s knowledge and communicating these gaps to the teacher. These strategies are considered from the viewpoint of extrospective and introspective behavior of the learner &amp;ndash; this new perspective is also the main contribution of our paper. The experimental results indicate that the analyzed strategies are successful in reducing the number of training instances required to reach the needed recognition rate. Such facilitation may be an important step towards the broader use of interactive autonomous systems.&lt;/p&gt;</description>
    </item>
    <item>
      <title>LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark</title>
      <link>/publications/zust2023lars/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2023lars/</guid>
      <description>&lt;p&gt;The progress in maritime obstacle detection is hindered by the lack of a diverse dataset that adequately captures the complexity of general maritime environments. We present the first maritime panoptic obstacle detection benchmark LaRS, featuring scenes from Lakes, Rivers and Seas. Our major contribution is the new dataset, which boasts the largest diversity in recording locations, scene types, obstacle classes, and acquisition conditions among the related datasets. LaRS is composed of over 4000 per-pixel labeled key frames with nine preceding frames to allow utilization of the temporal texture, amounting to over 40k frames. Each key frame is annotated with 8 thing, 3 stuff classes and 19 global scene attributes. We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions. To enable objective evaluation, we have implemented an online evaluation server. The LaRS dataset, evaluation toolkit and benchmark are publicly available at: &lt;a href=&#34;https://lojzezust.github.io/lars-dataset&#34;&gt;https://lojzezust.github.io/lars-dataset&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning Maritime Obstacle Detection from Weak Annotations by Scaffolding</title>
      <link>/publications/zust2022learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2022learning/</guid>
      <description>&lt;p&gt;Coastal water autonomous boats rely on robust perception methods for obstacle detection and timely collision avoidance. The current state of the art is based on deep segmentation networks trained on large datasets. Per-pixel ground truth labeling of such datasets, however, is labor-intensive and expensive. We observe that far less information is required for practical obstacle avoidance &amp;ndash; the location of the water edge on static obstacles like the shore, and the approximate location and bounds of dynamic obstacles in the water, are sufficient to plan a reaction.&#xA;We propose a new scaffolding learning regime (SLR) that allows training obstacle detection segmentation networks only from such weak annotations, thus significantly reducing the cost of ground-truth labeling. Experiments show that maritime obstacle segmentation networks trained using SLR substantially outperform the same networks trained with dense ground truth labels. Accuracy is thus not sacrificed for labeling simplicity but is in fact improved, which is a remarkable result.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning part-based spatial models for laser-vision-based room categorization</title>
      <link>/publications/ursic2017learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2017learning/</guid>
      <description>&lt;p&gt;Room categorization, i.e., recognizing the functionality of a never before seen room, is a crucial capability for a household mobile robot. We present a new approach for room categorization that is based on 2D laser range data. The method is based on a novel spatial model consisting of mid-level parts that are built on top of a low-level part-based representation. The approach is then fused with a vision-based method for room categorization, which is also based on a spatial model consisting of mid-level visual-parts. In addition, we propose a new discriminative dictionary learning technique that is applied for part-dictionary selection in both laser-based and vision-based modalities. Finally, we present a comparative analysis between laser-based, vision-based, and laser-vision-fusion-based approaches in a uniform part-based framework that is evaluated on a large dataset with several categories of rooms from the domestic environments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning statistically relevant edge structure improves low-level visual descriptors</title>
      <link>/publications/tabernik2012learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2012learning/</guid>
      <description>&lt;p&gt;Over the recent years, low-level visual descriptors, among which the most popular is the histogram of oriented gradients (HOG), have shown excellent performance in object detection and categorization. We form a hypothesis that the low-level image descriptors can be improved by learning the statistically relevant edge structures from natural images. We validate this hypothesis by introducing a new descriptor called the histogram of compositions (HoC). HoC exploits a learnt vocabulary of parts from a state-of-the-art hierarchical compositional model. Furthermore, we show that HoC is a complementary descriptor to HOG. We experimentally compare our descriptor to the popular HOG descriptor on the task of object categorization. We have observed approximately 4% improved categorization performance of HoC over HOG at lower dimensionality of the descriptor. Furthermore, in comparison to HOG, we show a categorization improvement of approximately 11% when combining HOG with the proposed HoC.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning with Weak Annotations for Robust Maritime Obstacle Detection</title>
      <link>/publications/zust2022learning-with/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2022learning-with/</guid>
      <description>&lt;p&gt;Robust maritime obstacle detection is critical for safe navigation of autonomous boats and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. However, per-pixel ground truth labeling of such datasets is labor-intensive and expensive. We propose a new scaffolding learning regime (SLR) that leverages weak annotations consisting of water edges, the horizon location, and obstacle bounding boxes to train segmentation-based obstacle detection networks, thereby reducing the required ground truth labeling effort by a factor of twenty. SLR trains an initial model from weak annotations and then alternates between re-estimating the segmentation pseudo-labels and improving the network parameters. Experiments show that maritime obstacle segmentation networks trained using SLR on weak annotations not only match but outperform the same networks trained with dense ground truth labels, which is a remarkable result. In addition to the increased accuracy, SLR also increases domain generalization and can be used for domain adaptation with a low manual annotation load. The SLR code and pre-trained models are freely available online.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation</title>
      <link>/publications/kosmurov2026/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kosmurov2026/</guid>
      <description>&lt;p&gt;Open-vocabulary panoptic segmentation remains hindered by two coupled issues: (i) mask selection bias, where objectness heads trained on closed vocabularies suppress masks of categories not observed in training, and (ii) limited regional understanding in vision-language models such as CLIP, which were optimized for global image classification rather than localized segmentation. We introduce OVRCOAT, a simple, modular framework that tackles both. First, a CLIP-conditioned objectness adjustment (COAT) updates background/foreground probabilities, preserving high-quality masks for out-of-vocabulary objects. Second, an open-vocabulary mask-to-text refinement (OVR) strengthens CLIP&amp;rsquo;s region-level alignment to improve classification of both seen and unseen classes with markedly lower memory cost than prior fine-tuning schemes. The two components combine to jointly improve objectness estimation and mask recognition, yielding consistent panoptic gains. Despite its simplicity, OVRCOAT sets a new state of the art on ADE20K (+5.5% PQ) and delivers clear gains on Mapillary Vistas and Cityscapes (+7.1% and +3% PQ, respectively). The code is available at: this &lt;a href=&#34;https://github.com/nickormushev/OVRCOAT&#34;&gt;URL&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Mobile Robots : New Research</title>
      <link>/publications/klancar2006mobile/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/klancar2006mobile/</guid>
      <description>&lt;p&gt;In this paper a global vision scheme for estimating the positions and orientations of mobile robots is presented. It is applied to robot soccer, which is a fast, dynamic game and therefore requires an efficient and robust vision system. The vision system is generally applicable to other robot applications, such as mobile transport robots in production and warehouses, attendant robots, fast visual tracking of targets of interest, and entertainment robotics. Basic operation of the vision system is divided into two steps. In the first, the incoming image is scanned and pixels are classified into a finite number of classes. At the same time, a segmentation algorithm is used to find corresponding regions belonging to one of the classes. In the second step, all the regions are examined, and the ones that are part of the observed object are selected by means of simple logic procedures. The novelty focuses on optimizing the processing time needed to estimate possible object positions. Better results of the vision system are achieved by implementing camera calibration and a shading correction algorithm. The former corrects camera lens distortion, while the latter increases robustness to irregular illumination conditions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>MODS--A USV-Oriented Object Detection and Obstacle Segmentation Benchmark</title>
      <link>/publications/bovcon2021mods/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2021mods/</guid>
      <description>&lt;p&gt;Small-sized unmanned surface vehicles (USV) are coastal water devices with a broad range of applications such as environmental control and surveillance. A crucial capability for autonomous operation is obstacle detection for timely reaction and collision avoidance, which has been recently explored in the context of camera-based visual scene interpretation. Owing to curated datasets, substantial advances in scene interpretation have been made in the related field of unmanned ground vehicles. However, the current maritime datasets do not adequately capture the complexity of real-world USV scenes and the evaluation protocols are not standardised, which makes cross-paper comparison of different methods difficult and hinders progress. To address these issues, we introduce a new obstacle detection benchmark MODS, which considers two major perception tasks: maritime object detection and the more general maritime obstacle segmentation. We present a new diverse maritime evaluation dataset containing approximately 81k stereo images synchronized with an on-board IMU, with over 60k objects annotated. We propose a new obstacle segmentation performance evaluation protocol that reflects the detection accuracy in a way meaningful for practical USV navigation. Nineteen recent state-of-the-art object detection and obstacle segmentation methods are evaluated using the proposed protocol, creating a benchmark to facilitate development of the field. The proposed dataset, as well as evaluation routines, are made publicly available at vicos.si/resources.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multi-camera and radio fusion for person localization in a cluttered environment</title>
      <link>/publications/mandeljc2011multi-camera-and/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2011multi-camera-and/</guid>
      <description>&lt;p&gt;We investigate the problem of person localization in a cluttered environment. We evaluate the performance of an Ultra-Wideband radio localization system and a multi-camera system based on the Probabilistic Occupancy Map algorithm. After demonstrating the strengths and weaknesses of both systems, we improve the localization results by fusing both the radio and the visual information within the Probabilistic Occupancy Map framework. This is done by treating the radio modality as an additional independent sensory input that contributes to a given cell’s occupancy likelihood.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multi-camera and radio fusion for person localization in a cluttered environment</title>
      <link>/publications/mandeljc2011multi-camera/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2011multi-camera/</guid>
      <description>&lt;p&gt;We investigate the problem of person localization in a cluttered environment. We evaluate the performance of an Ultra-Wideband radio localization system and a multi-camera system based on the Probabilistic Occupancy Map algorithm. After demonstrating the strengths and weaknesses of both systems, we improve the localization results by fusing both the radio and the visual information within the Probabilistic Occupancy Map framework. This is done by treating the radio modality as an additional independent sensory input that contributes to a given cell’s occupancy likelihood.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multi-modal tracking by identification</title>
      <link>/publications/mandeljc2012multi-modal/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2012multi-modal/</guid>
      <description>&lt;p&gt;In this paper, we demonstrate, by performing quantitative evaluation, the benefit of tracking by identification over state-of-the-art identification by tracking. We evaluate four localization and tracking systems: a commercial localization system based on radio technology, a state-of-the-art computer-vision algorithm that uses multiple calibrated cameras to perform identification by tracking, and two multi-modal tracking-by-identification systems that have been developed in our laboratory. We briefly describe all four systems and the evaluation metric, and present an evaluation on a challenging indoor dataset.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multiple interacting targets tracking with application to team sports</title>
      <link>/publications/kristan2005multiple/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2005multiple/</guid>
      <description>&lt;p&gt;The interest in the field of computer-aided analysis of sport events is ever growing, and the ability to track objects during a sport event has become an elementary task for nearly every sport-analysis system. In this paper, we present a color-based probabilistic tracker suitable for tracking players on the playing field during a sport game. Since the players are tracked in their natural environment, and this environment is subject to certain rules of the game, we use the concept of closed worlds to model the scene context and thus improve the reliability of tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multivariate Online Kernel Density Estimation</title>
      <link>/publications/kristan2010multivariate/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2010multivariate/</guid>
      <description>&lt;p&gt;We propose an approach for online kernel density estimation (KDE) which enables building probability density functions from data by observing only a single data-point at a time. The method maintains a non-parametric model of the data itself and uses this model to calculate the corresponding KDE. We propose a new automatic bandwidth selection rule, which can be computed directly from the non-parametric model of the data. Low complexity of the model is maintained through a novel compression and refinement scheme. We compare the online KDE to some state-of-the-art batch KDEs on examples of estimating distributions and on an example of classification. The results show that the online KDE generally achieves comparable performance to the batch approaches, while producing models with lower complexity and allowing online updating using only a single observation at a time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multivariate Online Kernel Density Estimation with Gaussian Kernels</title>
      <link>/publications/kristan2011multivariate/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2011multivariate/</guid>
      <description>&lt;p&gt;We propose a novel approach to online estimation of probability density functions, which is based on kernel density estimation (KDE). The method maintains and updates a non-parametric model of the observed data, from which the KDE can be calculated. We propose an online bandwidth estimation approach and a compression/revitalization scheme which keeps the KDE&amp;rsquo;s complexity low. We compare the proposed online KDE to the state-of-the-art approaches on examples of estimating stationary and non-stationary distributions, and on examples of classification. The results show that the online KDE outperforms or achieves a comparable performance to the state-of-the-art and produces models with a significantly lower complexity while allowing online adaptation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>MVL Lab5: Multi-modal Indoor Person Localization Dataset</title>
      <link>/publications/mandeljc2012mvl/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2012mvl/</guid>
      <description>&lt;p&gt;This technical report describes MVL Lab5, a multi-modal indoor person localization dataset. The dataset contains a sequence of video frames obtained from four calibrated and time-synchronized video cameras and a location event data stream from a commercially-available radio-based localization system. The scenario involves five individuals walking around a realistically cluttered room. The provided calibration data and ground truth annotations enable evaluation of person detection, localization and identification approaches. These can be either purely computer-vision based, or based on fusion of video and radio information. This document is intended as the primary documentation source for the dataset, presenting its availability, acquisition procedure, and organization. The structure and format of the data are described in detail, along with documentation for the bundled Matlab code and examples of its use.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Napredne metode računalniškega vida za avtonomno navigacijo robotskega plovila</title>
      <link>/publications/dimitriev2014napredne/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/dimitriev2014napredne/</guid>
      <description>&lt;p&gt;The aim of our project is the development of computer vision algorithms for autonomous navigation of a sea vessel by means of image segmentation and stabilization, long-term tracking, inference of 3D structure from motion, and horizon detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Non-sequential Multi-view Detection, Localization and Identification of People Using Multi-modal Feature Maps</title>
      <link>/publications/mandeljc2012non-sequential/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2012non-sequential/</guid>
      <description></description>
    </item>
    <item>
      <title>Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters</title>
      <link>/publications/kart2019object/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kart2019object/</guid>
      <description>&lt;p&gt;Standard RGB-D trackers treat the target as an inherently 2D structure, which makes modelling appearance changes related even to simple out-of-plane rotation highly challenging. We address this limitation by proposing a novel long-term RGB-D tracker - Object Tracking by Reconstruction (OTR). The tracker performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs). The 3D reconstruction supports two performance-enhancing features: (i) generation of accurate spatial support for constrained DCF learning from its 2D projection and (ii) point cloud based estimation of 3D pose change for selection and storage of view-specific DCFs which are used to robustly localize the target after out-of-view rotation or heavy occlusion. Extensive evaluation of OTR on the challenging Princeton RGB-D tracking and STC Benchmarks shows it outperforms the state-of-the-art by a large margin.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Observing Human Motion Using Far-Infrared (FLIR) Camera -- Some Preliminary Studies</title>
      <link>/publications/pers2004observing/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pers2004observing/</guid>
      <description>&lt;p&gt;Far infrared imaging technology is becoming an interesting choice for many civilian uses. We explored the potential of using a far infrared camera for human motion analysis, especially from the viewpoint of possible automated image and video analysis. In this article, we present the main characteristics of far infrared imagery that should be of interest to computer vision researchers, and seek to eliminate some common misunderstandings about far infrared imagery which may influence the choice of far infrared technology over other alternatives. We provide images that illustrate the problems and advantages of using far infrared imaging technology, especially for the purpose of observing humans.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Obstacle Detection for USVs by Joint Stereo-View Semantic Segmentation</title>
      <link>/publications/bovcon2018obstacle/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2018obstacle/</guid>
      <description>&lt;p&gt;We propose a stereo-based obstacle detection approach for unmanned surface vehicles. Obstacle detection is cast as a scene semantic segmentation problem in which pixels are assigned a probability of belonging to water or non-water regions. We extend a single-view model to a stereo system by adding a constraint which prefers consistent class-label assignment to pixels in the left and right camera images corresponding to the same parts of a 3D scene. Our approach jointly fits a semantic model to both images, leading to an improved class-label posterior map from which obstacles and the water edge are extracted. In overall F-measure on the task of obstacle detection, our approach outperforms the current state-of-the-art monocular approach by 0.495, a monocular CNN by 0.798, and their stereo extensions by 0.059 and 0.515, respectively, while running in real time on a single CPU.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Obstacle Tracking for Unmanned Surface Vessels using 3D Point Cloud</title>
      <link>/publications/muhovic2019obstacle/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muhovic2019obstacle/</guid>
      <description>&lt;p&gt;We present a method for detecting and tracking waterborne obstacles from an unmanned surface vehicle (USV) for the purpose of short-term obstacle avoidance. A stereo camera system provides a point cloud of the scene in front of the vehicle. The water surface is estimated by fitting a plane to the point cloud, and outlying points are further processed to find potential obstacles. We propose a new plane fitting algorithm for water surface detection that applies a fast approximate semantic segmentation to filter the point cloud and utilizes an external IMU reading to constrain the plane orientation. A novel histogram-like depth appearance model is proposed to keep track of the identity of the detected obstacles through time and to filter out false detections, which negatively impact the vehicle&amp;rsquo;s automatic guidance system. The improved plane fitting algorithm and the temporal verification using depth fingerprints result in notable improvement on the challenging MODD2 dataset, significantly reducing the number of false positive detections. The proposed method is able to run in real time on board a small-sized USV, which was also used to acquire the MODD2 dataset.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Online Discriminative Kernel Density Estimation</title>
      <link>/publications/kristan2010online/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2010online/</guid>
      <description>&lt;p&gt;We propose a new method for online estimation of probabilistic discriminative models. The method is based on the recently proposed online Kernel Density Estimation (oKDE) framework, which produces Gaussian mixture models and allows adaptation using only a single data point at a time. The oKDE builds reconstructive models from the data, and we extend it to take into account the interclass discrimination through a new distance function between the classifiers. We arrive at an online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to the oKDE, batch state-of-the-art KDEs and a support vector machine (SVM) on a standard database. The odKDE achieves classification performance comparable to that of the best batch KDEs and the SVM, while allowing online adaptation, and produces models of lower complexity than the oKDE.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Online Discriminative Kernel Density Estimator With Gaussian Kernels</title>
      <link>/publications/kristan2013online/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2013online/</guid>
      <description>&lt;p&gt;We propose a new method for supervised online estimation of probabilistic discriminative models for classification tasks. The method estimates the class distributions from a stream of data in the form of Gaussian mixture models (GMM). The reconstructive updates of the distributions are based on the recently proposed online Kernel Density Estimator (oKDE). We keep the number of components in the model low by compressing the GMMs from time to time. We propose a new cost function that measures the loss of interclass discrimination during compression, thus guiding the compression towards simpler models that still retain discriminative properties. The resulting classifier thus independently updates the GMM of each class, but these GMMs interact during their compression through the proposed cost function. We call the proposed method the online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to the oKDE, batch state-of-the-art KDEs, and batch/incremental support vector machines (SVM) on publicly-available datasets. The odKDE achieves classification performance comparable to that of the best batch KDEs and SVM, while allowing online adaptation from large datasets, and produces models of lower complexity than the oKDE.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Online Kernel Density Estimation For Interactive Learning</title>
      <link>/publications/kristan2009online/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009online/</guid>
      <description>&lt;p&gt;In this paper we propose a Gaussian-kernel-based online kernel density estimator that can be used for online probability density estimation and online learning. Our approach generates a Gaussian mixture model of the observed data and allows online adaptation from positive as well as negative examples. The adaptation from negative examples is realized by a novel concept of unlearning in mixture models. Low complexity of the mixtures is maintained through a novel compression algorithm. In contrast to the existing approaches, our approach does not require fine-tuning of parameters for a specific application, does not assume specific forms of the target distributions, and imposes no temporal constraints on the observed data. The strength of the proposed approach is demonstrated with examples of online estimation of complex distributions, an example of unlearning, and with interactive learning of basic visual concepts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation</title>
      <link>/publications/zust2026_tits/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2026_tits/</guid>
      <description>&lt;p&gt;Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of the current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, (iii) spatially well-separated instances may be merged into a single mask causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose PanSR, a novel method for panoptic segmentation. PanSR effectively mitigates instance merging, enhances small-object detection and increases performance in crowded scenes, delivering a notable +3.4 PQ improvement over the state-of-the-art on the challenging LaRS benchmark, while reaching state-of-the-art performance on Cityscapes. &lt;a href=&#34;https://github.com/lojzezust/PanSR&#34;&gt;URL&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Part-Based Room Categorization for Household Service Robots</title>
      <link>/publications/ursic2016part-based/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2016part-based/</guid>
      <description>&lt;p&gt;A service robot that operates in a previously-unseen home environment should be able to recognize the functionality of the rooms it visits, such as a living room, a bathroom, etc. We present a novel part-based model and an approach for room categorization using data obtained from a visual sensor. Images are represented with sets of unordered parts that are obtained by object-agnostic region proposals, and encoded using state-of-the-art image descriptor extractor — a convolutional neural network (CNN). An approach is proposed that learns category-specific discriminative parts for the part-based model. The proposed approach was compared to the state-of-the-art CNN trained specifically for place recognition. Experimental results show that the proposed approach outperforms the holistic CNN by being robust to image degradation, such as occlusions, modifications of image scaling, and aspect changes. In addition, we report non-negligible annotation errors and image duplicates in a popular dataset for place categorization and discuss annotation ambiguities.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Performance Evaluation Methodology for Long-Term Single Object Tracking</title>
      <link>/publications/lukezic2020performance/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2020performance/</guid>
      <description>&lt;p&gt;A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and a comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Physics-Based Modelling of Human Motion using Kalman Filter and Collision Avoidance Algorithm</title>
      <link>/publications/perse2005physics-based/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2005physics-based/</guid>
      <description>&lt;p&gt;The paper deals with the problem of computer-vision-based multi-person motion tracking, which in many cases suffers from a lack of discriminating features of the observed persons. To solve this problem, a physics-based model of human motion is proposed, which includes the inertial forces of the persons by means of the Kalman filter, and cylindrical envelopes, which produce collision-avoiding forces when the observed persons come into close proximity. We tested the proposed method on two sequences, one from a squash match and the other from a basketball game, and found that the number of tracker mistakes decreased significantly.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Prepletanje umetne inteligence in fizike pri napovedovanju obalnih poplav</title>
      <link>/publications/licer2021prepletanje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/licer2021prepletanje/</guid>
      <description>&lt;p&gt;Climate change is driving a rise in the mean level of the global oceans through numerous mechanisms, and this also applies to the Slovenian sea. Model projections of global sea-level rise predict that by 2050 the mean sea level in the Gulf of Trieste will most likely rise by 30 to 50 centimeters, and by 40 to 100 cm by the end of the century. This means that by mid-century the frequency of coastal floods is expected to increase 10- to 20-fold, and by the end of the century floods could be up to two hundred times more frequent. Flood forecasting is extremely demanding due to the specifics of the Adriatic basin, since it involves simulating the evolution of both an atmospheric model and an ocean model. In this article we explain a forecasting approach based on a deep neural network, which matches or exceeds the forecast accuracy of the physical model.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Probabilistic tracking using optical flow to resolve color ambiguities</title>
      <link>/publications/kristan2007probabilistic/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2007probabilistic/</guid>
      <description>&lt;p&gt;Color-based tracking is prone to failure in situations where visually similar targets are moving in close proximity to each other. To deal with the ambiguities in color information we propose an additional color-independent feature based on the target&amp;rsquo;s local motion, which is calculated from the optical flow induced by the target in consecutive images. By modifying a color-based particle filter to account for the target&amp;rsquo;s local motion, a hybrid color/local-motion-based tracker is constructed. The hybrid tracker was compared to a purely color-based tracker on a challenging dataset that involved near-collisions and complete occlusions between visually similar persons. The optical flow was estimated using a robust and a nonrobust method. The experiments show that even if a nonrobust method is used to estimate the optical flow, the local-motion feature largely resolves ambiguities caused by the visual similarity between persons.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Prototipi značilk za adaptivno zaznavanje ovir na vodni površini</title>
      <link>/publications/zust2022prototipi/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2022prototipi/</guid>
      <description>&lt;p&gt;Unmanned surface vehicles (USV) rely on robust perception methods for obstacle detection. Current segmentation-based state-of-the-art methods lack the desired robustness and generalization capabilities required to adapt to new situations. To address this, we design WaSR-AD, a network with an explicit adaptation capability based on class prototypes. Initial prototypes are extracted during training and adapted during inference in an online fashion. The adapted prototypes are used to enrich the image features with additional adaptive context. Evaluation on the MODS benchmark reveals that such explicit adaptation of the prototypes significantly improves the detection performance, achieving 14% lower water segmentation error and a 3.6% F1-score increase inside the critical 15 m danger-zone area around the boat, with a negligible cost in inference time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Razlike v opravljeni poti in povprečni hitrosti gibanja med različnimi tipi košarkarjev</title>
      <link>/publications/erculj2007razlike/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/erculj2007razlike/</guid>
      <description>&lt;p&gt;In this article we address the problem of the physical load on basketball players during matches. The primary goal of the study is to determine the intensity and extent of player movement using the SAGIT measurement system, a relatively new technology based on computer-vision methods that enables automatic data acquisition from video recordings of matches. Using this system, we measured the distance covered and the average movement speed of players in three play-off matches of the Slovenian men's national championship between Union Olimpija and Geoplin Slovan in the 2004/05 season. These parameters were determined for a total of 22 players who played at least 200 seconds in a given half. Since basketball distinguishes several player types with different roles in the game, we computed the distance covered and the average movement speed for the three basic player types (guards, forwards and centers) and used one-way analysis of variance to determine the differences between them. We found that in the active part of the game (while the game clock is running) players cover on average 2227 meters per half (20 minutes), plus an additional 920 meters in the passive part. The average movement speed in the active part of the game is 1.84 m/s. Regarding individual player types, guards cover the longest distance in the active phase on average (2300 m), followed by forwards (2246 m) and centers (2118 m). The differences between player types are statistically significant at the 1% level. The same holds for average movement speed: guards move at an average speed of 1.92 m/s, forwards at 1.87 m/s and centers at 1.74 m/s.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Recognition of Multi-Agent Activities with Petri Nets</title>
      <link>/publications/perse2008recognition/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2008recognition/</guid>
      <description></description>
    </item>
    <item>
      <title>Reconstruction by inpainting for visual anomaly detection</title>
      <link>/publications/zavrtanik2021reconstruction/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2021reconstruction/</guid>
      <description>&lt;p&gt;Visual anomaly detection addresses the problem of classification or localization of regions in an image that deviate from their normal appearance. A popular approach trains an auto-encoder on anomaly-free images and performs anomaly detection by calculating the difference between the input and the reconstructed image. This approach assumes that the auto-encoder will be unable to accurately reconstruct anomalous regions. In practice, however, neural networks generalize well even to anomalies and reconstruct them sufficiently well, thus reducing the detection capabilities. Accurate reconstruction is far less likely if the anomaly pixels were not visible to the auto-encoder. We thus cast anomaly detection as a self-supervised reconstruction-by-inpainting problem. Our approach (RIAD) randomly removes partial image regions and reconstructs the image from partial inpaintings, thus addressing the drawbacks of auto-encoding methods. RIAD is extensively evaluated on several benchmarks and sets a new state-of-the-art on a recent highly challenging anomaly detection benchmark.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust and efficient vision system for group of cooperating mobile robots with application to soccer robots</title>
      <link>/publications/klancar2004robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/klancar2004robust/</guid>
      <description>&lt;p&gt;In this paper a global vision scheme for estimating the positions and orientations of mobile robots is presented. It is applied to robot soccer, a fast dynamic game that requires an efficient and robust vision system. The vision system is also generally applicable to other robot applications, such as mobile transport robots in production and warehouses, attendant robots, fast visual tracking of targets of interest, and entertainment robotics. Basic operation of the vision system is divided into two steps. In the first, the incoming image is scanned and pixels are classified into a finite number of classes. At the same time, a segmentation algorithm is used to find the corresponding regions belonging to one of the classes. In the second step, all the regions are examined, and those that are part of the observed object are selected by means of simple logic procedures. The novelty is focused on optimizing the processing time needed to estimate possible object positions. Better results of the vision system are achieved by implementing camera calibration and a shading-correction algorithm. The former corrects camera lens distortion, while the latter increases robustness to irregular illumination conditions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust Visual Tracking using an Adaptive Coupled-layer Visual Model</title>
      <link>/publications/cehovin2013robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2013robust/</guid>
      <description>&lt;p&gt;This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target&amp;rsquo;s global and local appearance by interlacing two layers. The local layer in this model is a set of local patches that geometrically constrain the changes in the target&amp;rsquo;s appearance. This layer probabilistically adapts to the target&amp;rsquo;s geometric deformation, while its structure is updated by removing and adding the local patches. The addition of these patches is constrained by the global layer that probabilistically models target&amp;rsquo;s global visual properties such as color, shape and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. By this coupled constraint paradigm between the adaptation of the global and the local layer, we achieve a more robust tracking through significant appearance changes. We experimentally compare our tracker to eleven state-of-the-art trackers. The experimental results on challenging sequences confirm that our tracker outperforms the related trackers in many cases by having smaller failure rate as well as better accuracy. Furthermore, the parameter analysis shows that our tracker is stable over a range of parameter values.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust visual tracking using template anchors</title>
      <link>/publications/cehovin2016robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2016robust/</guid>
      <description>&lt;p&gt;Deformable part models exhibit excellent performance in tracking non-rigidly deforming targets, but are usually outperformed by holistic models when the target does not deform or in the presence of uncertain visual data. The reason is that part-based models require estimation of a larger number of parameters compared to holistic models and since the updating process is self-supervised, the errors in parameter estimation are amplified with time, leading to a faster accuracy reduction than in holistic models. On the other hand, the robustness of part-based trackers is generally greater than in holistic trackers. We address the problem of self-supervised estimation of a large number of parameters by introducing controlled graduation in estimation of the free parameters. We propose decomposing the visual model into several sub-models, each describing the target at a different level of detail. The sub-models interact during target localization and, depending on the visual uncertainty, serve for cross-sub-model supervised updating. A new tracker is proposed based on this model which exhibits the qualities of part-based as well as holistic models. The tracker is tested on the highly-challenging VOT2013 and VOT2014 benchmarks, outperforming the state-of-the-art.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Room Categorization Based on a Hierarchical Representation of Space</title>
      <link>/publications/ursic2013room/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2013room/</guid>
      <description>&lt;p&gt;For successful operation in real-world environments, a mobile robot requires an effective spatial model. The model should be compact, should possess large expressive power and should scale well with respect to the number of modelled categories. In this paper we propose a new compositional hierarchical representation of space that is based on learning statistically significant observations, in terms of the frequency of occurrence of various shapes in the environment. We have focused on a two-dimensional space, since many robots perceive their surroundings in two dimensions with the use of a laser range finder or sonar. We also propose a new low-level image descriptor, by which we demonstrate the performance of our representation in the context of a room categorization problem. Using only the lower layers of the hierarchy, we obtain state-of-the-art categorization results in two different experimental scenarios. We also present a large, freely available, dataset, which is intended for room categorization experiments based on data obtained with a laser range finder.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Room Classification using a Hierarchical Representation of Space</title>
      <link>/publications/ursic2012room/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2012room/</guid>
      <description></description>
    </item>
    <item>
      <title>Sekvenčne Monte Carlo metode za sledenje oseb v računalniškem vidu</title>
      <link>/publications/kristan2005sekvencne/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2005sekvencne/</guid>
      <description>&lt;p&gt;People tracking is part of the broad domain of computer vision that has received great attention from researchers over the last twenty years. An interesting aspect of the tracking problem originates from the field of control theory and considers the tracked object as a dynamical system with a hidden state, of which only the current measurements are available and observed. The classical methods used in the past to tackle this problem employed Kalman filters and their derivatives. These generally assume a Gaussian linear dynamical and measurement model, assumptions that are usually too restrictive for the majority of natural processes. In the late 90&amp;rsquo;s, advances in sequential Monte Carlo methods across various fields of science gave rise to a family of methods that effectively deal with problems of this kind. Their main advantage over the Kalman filter is that they do not impose such restrictive assumptions and can be implemented relatively easily. In computer vision, sequential Monte Carlo methods, also known as particle filters, became extremely popular with the introduction of the Condensation algorithm. Since then, a large body of literature has been published on these methods. This thesis is dedicated to the problem of tracking people by means of sequential Monte Carlo methods, the application of which is demonstrated on a system for tracking players in team sports. We first consider the problem of tracking in the context of statistical estimation and present the main parts of the Monte Carlo solutions. The well-known Condensation algorithm, which comprises the central part of all the trackers presented here, is introduced as a sequential Monte Carlo method, and a simple algorithm to track one player is presented. By considering a team sport in the context of a closed world, a set of assumptions that depicts a typical match is derived. Following these assumptions, a more robust single-player tracker is developed and then extended to the case of multiple players. Finally, two variants of trackers for tracking multiple players in closed worlds are presented. A number of experiments are reported to evaluate the performance of the trackers, and based on the results, the most suitable multi-player tracker is chosen. We also point out some guidelines for the future development of the application for tracking multiple players.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Self-understanding and self-extension: a systems and representational approach</title>
      <link>/publications/wyatt2010self-understanding/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/wyatt2010self-understanding/</guid>
      <description>&lt;p&gt;There are many different approaches to building a system that can engage in autonomous mental development. In this paper we present an approach based on what we term &lt;em&gt;self-understanding&lt;/em&gt;, by which we mean the use of explicit representation of and reasoning about what a system does and doesn&amp;rsquo;t know, and how that understanding changes under action. We present a coherent architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, what we term &lt;em&gt;self-extension&lt;/em&gt;. The contributions include: representations of gaps and uncertainty for specific kinds of knowledge, and a motivational and planning system for setting and achieving learning goals.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sledenje objektov s kvadrokopterjem z gibljivo kamero</title>
      <link>/publications/muhovic2017sledenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muhovic2017sledenje/</guid>
      <description></description>
    </item>
    <item>
      <title>Sledenje objektov v robotskem nogometu</title>
      <link>/publications/kristan2003sledenje-objektov/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2003sledenje-objektov/</guid>
      <description></description>
    </item>
    <item>
      <title>Sledenje objektov v robotskem nogometu</title>
      <link>/publications/kristan2003sledenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2003sledenje/</guid>
      <description>&lt;p&gt;Robot soccer is a high-tech sport developed in 1995 at the Korea Advanced Institute of Science and Technology by professor Jong-Hwan Kim as a multi-purpose environment for learning and testing applications in image analysis, artificial intelligence, sensors, communications, etc. Over the last eight years, robot soccer has seen great expansion, both as consumer entertainment and as a platform for testing and developing new technologies. Today there are two international robot soccer federations, Robocup and FIRA (Federation of International Robot Association). Each federation organizes separate competitions in various categories, where the category determines how a match is staged, ranging from pure computer simulations through micro-robots to humanoid robots. At the Faculty of Electrical Engineering in Ljubljana, work on robot soccer in the MiroSot category began in 2000 under the name Robobrc. Robobrc competes in two variants of the MiroSot category, which differ only in the number of players and the dimensions of the playing field. In the first variant each team fields three players (the three-player game, or small league), and in the second five (the five-player game, or middle league). The transition from the three-player to the five-player game created the need for an efficient tracker that could distinguish a larger number of colors and track ten robots and the ball in real time. This thesis deals with the application for tracking the robots in Robobrc matches, so the introduction presents only those rules of the two game variants that are relevant to the computer vision supervision system. We then present the field of computer vision and object tracking, with a short review of the literature on tracking in sports and robot soccer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sledenje več igralcev v športnih igrah na podlagi vizualne informacije</title>
      <link>/publications/kristan2007sledenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2007sledenje/</guid>
      <description>&lt;p&gt;This paper presents a tracker for following multiple players in indoor team sports, such as handball and basketball, using visual information acquired by a camera mounted above the court. Tracking of an individual player is cast in the context of Bayesian filtering for recursive estimation of the a posteriori distribution of the target state and is based on particle-filter methods. The paper covers the two main parts of the tracker: the tracker for a single player and the mechanism for tracking multiple visually similar players. Within the latter mechanism we propose an original solution in which, at every time step, the image is partitioned into non-overlapping regions such that each contains only one player, thereby simplifying the multi-target tracking problem when visually similar targets collide. The proposed tracker was compared with a non-robust tracker lacking the mechanism for handling collisions between targets. We found that the proposed mechanism reduces the number of required operator interventions and thus enables robust and fast processing of large amounts of video data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks</title>
      <link>/publications/tabernik2020spatially-adaptive/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2020spatially-adaptive/</guid>
      <description>&lt;p&gt;Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be manually set to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact networks and a large number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAU). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus reducing the need for hand-crafted modifications. DAUs provide a seamless substitution of convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscape) and blind image de-blurring (GOPRO). Results show that DAUs efficiently allocate parameters, resulting in up to 4× more compact networks in terms of the number of parameters at similar or better performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Spatially-Adaptive Filter Units for Deep Neural Networks</title>
      <link>/publications/tabernik2018spatially-adaptive/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2018spatially-adaptive/</guid>
      <description>&lt;p&gt;Classical deep convolutional networks increase receptive field size by either gradual resolution reduction or application of hand-crafted dilated convolutions to prevent an increase in the number of parameters. In this paper we propose a novel displaced aggregation unit (DAU) that does not require hand-crafting. In contrast to classical filters with units (pixels) placed on a fixed regular grid, the displacements of the DAUs are learned, which enables filters to spatially adapt their receptive field to a given problem. We extensively demonstrate the strength of DAUs on classification and semantic segmentation tasks. Compared to ConvNets with regular filters, ConvNets with DAUs achieve comparable performance at faster convergence and up to a 3-fold reduction in parameters. Furthermore, DAUs allow us to study deep networks from novel perspectives. We study spatial distributions of DAU filters and analyze the number of parameters allocated for spatial coverage in a filter.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation</title>
      <link>/publications/bovcon2018stereo/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2018stereo/</guid>
      <description>&lt;p&gt;A new obstacle detection algorithm for unmanned surface vehicles (USVs) is presented. A state-of-the-art graphical model for semantic segmentation is extended to incorporate boat pitch and roll measurements from the on-board inertial measurement unit (IMU), and a stereo verification algorithm that consolidates tentative detections obtained from the segmentation is proposed. The IMU readings are used to estimate the location of the horizon line in the image, which automatically adjusts the priors in the probabilistic semantic segmentation model. We derive the equations for projecting the horizon into images, propose an efficient optimization algorithm for the extended graphical model, and offer a practical IMU&amp;ndash;camera&amp;ndash;USV calibration procedure. Using a USV equipped with multiple synchronized sensors, we captured a new challenging multi-modal dataset, and annotated its images with the water edge and obstacles. Experimental results show that the proposed algorithm significantly outperforms the state of the art, with nearly 30% improvement in water-edge detection accuracy, an over 21% reduction of false positive rate, an almost 60% reduction of false negative rate, and an over 65% increase of true positive rate, while its Matlab implementation runs in real-time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Superpixel Segmentation for Robust Visual Tracking</title>
      <link>/publications/cehovin2013superpixel/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2013superpixel/</guid>
      <description></description>
    </item>
    <item>
      <title>Tailgating Detection Using Histograms of Optical Flow</title>
      <link>/publications/pers2007tailgating/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pers2007tailgating/</guid>
      <description></description>
    </item>
    <item>
      <title>Temporal Context for Robust Maritime Obstacle Detection</title>
      <link>/publications/zust2022temporal/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2022temporal/</guid>
      <description>&lt;p&gt;Robust maritime obstacle detection is essential for fully autonomous unmanned surface vehicles (USVs). The currently widely adopted segmentation-based obstacle detection methods are prone to misclassification of object reflections and sun glitter as obstacles, producing many false positive detections, effectively rendering the methods impractical for USV navigation. However, water-turbulence-induced temporal appearance changes on object reflections are very distinctive from the appearance dynamics of true objects. We harness this property to design WaSR-T, a novel maritime obstacle detection network, that extracts the temporal context from a sequence of recent frames to reduce ambiguity. By learning the local temporal characteristics of object reflection on the water surface, WaSR-T substantially improves obstacle detection accuracy in the presence of reflections and glitter. Compared with existing single-frame methods, WaSR-T reduces the number of false-positive detections by 41% overall and by over 53% within the danger zone of the boat, while preserving a high recall, and achieving new state-of-the-art performance on the challenging MODS maritime obstacle detection benchmark.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Temporal Segmentation of Group Motion using Gaussian Mixture Models</title>
      <link>/publications/perse2008temporal/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2008temporal/</guid>
      <description>&lt;p&gt;This paper presents a new trajectory-based approach for probabilistic temporal segmentation of team sports. The probabilistic game model is applied to the player-trajectory data in order to segment individual game instants into one of the three game phases (offensive game, defensive game and time-outs) and a nonlinear or Gaussian smoothing kernel is used to enforce the temporal continuity of the game. The presented approach is compared to the Support Vector Machine (SVM) classifier on three basketball and three handball matches. The obtained results suggest that our approach is general and robust and as such could be applied to various team sports. It can handle unusual game situations such as player exclusions, substitution or injuries which may happen during the game.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Eighth Visual Object Tracking VOT2020 Challenge Results</title>
      <link>/publications/kristan2020the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2020the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) the VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2020 challenge focused on &amp;ldquo;real-time&amp;rdquo; short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) the VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 dataset was refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology, and the introduction of segmentation ground truth in the VOT-ST2020 challenge &amp;ndash; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The MaSTr1325 dataset for training deep USV obstacle detection models</title>
      <link>/publications/bovcon2019the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2019the/</guid>
      <description>&lt;p&gt;The progress of obstacle detection via semantic segmentation on unmanned surface vehicles (USVs) has been significantly lagging behind the developments in the related field of autonomous cars. The reason is the lack of large curated training datasets from the USV domain required for the development of data-hungry deep CNNs. This paper addresses this issue by presenting MaSTr1325, a marine semantic segmentation training dataset tailored for the development of obstacle detection methods in small-sized coastal USVs. The dataset contains 1325 diverse images captured over a two year span with a real USV, covering a range of realistic conditions encountered in a coastal surveillance task. The images are per-pixel semantically labeled. The dataset exceeds previous attempts in this domain in size, scene complexity and domain realism. In addition, a dataset augmentation protocol is proposed to address slight appearance differences between the images in the training set and those in deployment. The accompanying experimental evaluation provides a detailed analysis of popular deep architectures, annotation accuracy and influence of the training set size. MaSTr1325 will be released to the research community to facilitate progress in obstacle detection for USVs.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Ninth Visual Object Tracking VOT2021 Challenge Results</title>
      <link>/publications/kristan2021the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2021the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) the VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2021 challenge focused on &amp;ldquo;real-time&amp;rdquo; short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance and (iv) the VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and a sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Second Visual Object Tracking Segmentation VOTS2024 Challenge Results</title>
      <link>/publications/kristan2024the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2024the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking Segmentation VOTS2024 challenge is the twelfth annual tracker benchmarking activity of the VOT initiative. This challenge consolidates the new tracking setup proposed in VOTS2023, which merges short-term and long-term as well as single-target and multiple-target tracking, with segmentation masks as the only target location specification. Two sub-challenges are considered: the VOTS2024 standard challenge, focusing on classical objects, and VOTSt2024, which considers objects undergoing a topological transformation. Both challenges use the same performance evaluation methodology. Results of 28 submissions are presented and analyzed. A leaderboard with participating tracker details, the source code, the datasets, and the evaluation kit are publicly available on the website &lt;a href=&#34;https://www.votchallenge.net/vots2024/&#34;&gt;https://www.votchallenge.net/vots2024/&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Seventh Visual Object Tracking VOT2019 Challenge Results</title>
      <link>/publications/kristan2019the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2019the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on &amp;ldquo;real-time&amp;rdquo; short-term tracking in RGB, (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed, while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking and tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The sixth Visual Object Tracking VOT2018 challenge results</title>
      <link>/publications/kristan2018the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2018the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a &amp;ldquo;real-time&amp;rdquo; experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been introduced to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Tenth Visual Object Tracking VOT2022 Challenge Results</title>
      <link>/publications/kristan2022the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2022the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2022 is the tenth annual tracker benchmarking activity organized by the VOT initiative. Results of 93 entries are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2022 challenge was composed of seven sub-challenges focusing on different tracking domains: (i) the VOT-STs2022 challenge focused on short-term tracking in RGB by segmentation, (ii) the VOT-STb2022 challenge focused on short-term tracking in RGB by bounding boxes, (iii) the VOT-RTs2022 challenge focused on &amp;ldquo;real-time&amp;rdquo; short-term tracking in RGB by segmentation, (iv) the VOT-RTb2022 challenge focused on &amp;ldquo;real-time&amp;rdquo; short-term tracking in RGB by bounding boxes, (v) VOT-LT2022 focused on long-term tracking, namely coping with target disappearance and reappearance, (vi) the VOT-RGBD2022 challenge focused on short-term tracking in RGB and depth imagery, and (vii) the VOT-D2022 challenge focused on short-term tracking in depth-only imagery. New datasets were introduced in VOT-LT2022 and VOT-RGBD2022, the VOT-ST2022 dataset was refreshed, and a training dataset was introduced for VOT-LT2022. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results</title>
      <link>/publications/felsberg2015the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/felsberg2015the/</guid>
      <description></description>
    </item>
    <item>
      <title>The Visual Object Tracking VOT2013 challenge results</title>
      <link>/publications/kristan2013the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2013the/</guid>
      <description>&lt;p&gt;Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is the lack of commonly accepted annotated datasets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (&lt;a href=&#34;http://votchallenge.net&#34;&gt;http://votchallenge.net&lt;/a&gt;).&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Visual Object Tracking VOT2014 challenge results</title>
      <link>/publications/kristan2014the-visual/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2014the-visual/</guid>
      <description></description>
    </item>
    <item>
      <title>The Visual Object Tracking VOT2015 challenge results</title>
      <link>/publications/kristan2015the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2015the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014, with full annotation of targets by rotated bounding boxes and per-frame attributes, (ii) extensions of the VOT2014 evaluation methodology by the introduction of a new performance measure. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Visual Object Tracking VOT2016 challenge results</title>
      <link>/publications/kristan2016the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2016the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers having been published at major computer vision conferences and journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Visual Object Tracking VOT2017 Challenge Results</title>
      <link>/publications/kristan2017the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2017the/</guid>
      <description>&lt;p&gt;The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art methods published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies, as well as a new “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. The performance of the tested trackers typically far exceeds that of standard baselines. The source code for most of the trackers is publicly available from the VOT page. VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a real-time tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The VOT2013 challenge: overview and additional results</title>
      <link>/publications/kristan2014the/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2014the/</guid>
      <description>&lt;p&gt;Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is the lack of commonly accepted annotated datasets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) challenge and workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). In this paper we provide an overview of the VOT2013 challenge, point out its main results and document additional, previously unpublished experiments and results.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards a large-scale category detection with a distributed hierarchical compositional model</title>
      <link>/publications/tabernik2014towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2014towards/</guid>
      <description>&lt;p&gt;In this paper we evaluate a visual object detection system implemented on a distributed processing platform, presented in our previous work, with the goal of assessing the scalability of the system to large-scale category detection. While state-of-the-art detection methods based on sliding windows may not be capable of scaling to a higher number of categories, we provide initial evidence that a hierarchical compositional method called learned-hierarchy-of-parts (LHOP) may be capable of such scaling. We show, with the library trained on the MPEG-7 Shape database, that the method is capable of scaling from a system with 5 categories and an average response time of 6 seconds to a system with 70 categories and an average response time of 27 seconds.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards automated scyphistoma census in underwater imagery: a useful research and monitoring tool</title>
      <link>/publications/vodopivec2018towards-automated/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/vodopivec2018towards-automated/</guid>
      <description></description>
    </item>
    <item>
      <title>Towards automated scyphistoma census in underwater imagery: a useful research and monitoring tool</title>
      <link>/publications/vodopivec2018towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/vodopivec2018towards/</guid>
      <description>&lt;p&gt;Manual annotation and counting of entities in underwater photographs is common in many branches of marine biology. With a marked increase of jellyfish populations worldwide, understanding the dynamics of the polyp (scyphistoma) stage of their life-cycle is becoming increasingly important. In-situ studies of polyp population dynamics are scarce due to the small size of the polyps and the tedious manual work required to annotate and count large numbers of items in underwater photographs. We devised an experiment which shows a large variance between human annotators, as well as in annotations made by the same annotator. We have tackled this problem, which is present in many areas of marine biology, by developing a method for automated detection and counting. Our polyp counter (PoCo) uses a two-stage approach with a fast detector (Aggregated Channel Features) and a precise classifier consisting of a pre-trained Convolutional Neural Network and a Support Vector Machine. PoCo was tested on a year-long image dataset and performed with accuracy comparable to human annotators, but with a 70-fold reduction in time. The algorithm can be used in many marine biology applications, vastly reducing the amount of manual labor and enabling processing of much larger datasets. The source code is freely available on GitHub.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards Deep Compositional Networks</title>
      <link>/publications/tabernik2016towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2016towards/</guid>
      <description>&lt;p&gt;Hierarchical feature learning based on convolutional neural networks (CNN) has recently shown significant potential in various computer vision tasks. While allowing high-quality discriminative feature learning, the downside of CNNs is the lack of explicit structure in features, which often leads to overfitting, absence of reconstruction from partial observations and limited generative abilities. Explicit structure is inherent in hierarchical compositional models; however, these lack the ability to optimize a well-defined cost function. We propose a novel analytic model of a basic unit in a layered hierarchical model with both explicit compositional structure and a well-defined discriminative cost function. Our experiments on two datasets show that the proposed compositional model performs on a par with standard CNNs on discriminative tasks while, due to explicit modeling of the structure in the feature units, affording straightforward visualization of parts and faster inference owing to the separability of the units.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards fast and efficient methods for tracking players in sports</title>
      <link>/publications/kristan2006towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2006towards/</guid>
      <description>&lt;p&gt;An efficient algorithm for tracking a single player in a sporting match is presented in this paper. The sporting event is considered a semi-controlled environment, for which a set of closed-world assumptions regarding the visual as well as dynamical properties is derived. We show how these assumptions can be used in the context of particle filtering to arrive at a computationally fast and reliable tracker. The proposed tracker was evaluated on a demanding data set. When compared to several similar trackers that did not utilize all of the closed-world assumptions, the proposed tracker, on average, achieved better performance regarding the failure rate as well as position and prediction estimation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards hierarchical representation of space</title>
      <link>/publications/ursic2011towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2011towards/</guid>
      <description>&lt;p&gt;Various robotic systems that perform efficient navigation, localization and place recognition in their surrounding environments have already been developed. These systems possess a representation of space that is based on some engineered knowledge. There is still no system that would know about the structure of space in general, and whose knowledge would be obtained by learning. We believe that people learn about the properties of space through interaction with the environment. Therefore, since people perform very well in spatially related tasks, we expect that a robotic system that obtained such knowledge would also perform better. With this in mind, we are developing an algorithm for learning a compositional hierarchical representation of space that is based on statistically significant observations. For now, we have focused on two-dimensional space, since many robots perceive their surroundings in two dimensions with the use of a laser range finder or a sonar. In this paper we evaluate our early work on this topic on a room categorization problem. Based on the lower layers of the hierarchy, we obtained encouraging classification results with three different types of rooms.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards Probabilistic Online Discriminative Models</title>
      <link>/publications/kristan2009towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009towards/</guid>
      <description></description>
    </item>
    <item>
      <title>Tracking and Segmentation of Transparent Objects</title>
      <link>/publications/lukezic2024tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2024tracking/</guid>
      <description>&lt;p&gt;Transparent object tracking is a challenging, recently introduced problem. Existing methods predict the target location as a bounding box, which is often only a poor approximation of the actual location. A segmentation mask is a more accurate prediction, but benchmarks for evaluating the tracking and segmentation performance of transparent objects do not exist. In this paper we address this drawback by introducing a new dataset for tracking and segmentation of transparent objects. In particular, we sparsely re-annotate the existing bounding box TOTB dataset with ground-truth segmentation masks. A comprehensive analysis demonstrates that existing segmentation methods perform surprisingly well on this task, indicating good design generalization and potential for transparent object tracking tasks. In addition, we show that existing bounding box trackers can be easily transformed into segmentation trackers using modern mask refinement methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tracking by Identification Using Computer Vision and Radio</title>
      <link>/publications/mandeljc2013tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2013tracking/</guid>
      <description>&lt;p&gt;We present a novel system for detection, localization and tracking of multiple people, which fuses a multi-view computer vision approach with a radio-based localization system. The proposed fusion combines the best of both worlds: excellent computer-vision-based localization and strong identity information provided by the radio system. It is therefore able to perform tracking by identification, which makes it impervious to propagated identity switches. We present a comprehensive methodology for the evaluation of systems that perform person localization in a world coordinate system and use it to evaluate the proposed system as well as its components. Experimental results on a challenging indoor dataset, which involves multiple people walking around a realistically cluttered room, confirm that the proposed fusion of both systems significantly outperforms its individual components. Compared to the radio-based system, it achieves better localization results, while at the same time successfully preventing the propagation of identity switches that occur in pure computer-vision-based tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tracking Non-Rigid Objects by Combining Local and Global Visual Model</title>
      <link>/publications/cehovin2009tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2009tracking/</guid>
      <description>&lt;p&gt;We present an appearance-based tracker which hierarchically combines a global and a local visual model in two layers. The bottom layer contains the local part of the visual model and consists of a set of sub-trackers, each of them observing only a local aspect of the object. The top layer constrains and focuses the movement of individual sub-trackers by accounting for the global part of the model - the spatial relations between the trackers. The visual model is updated by modifying the spatial relations and by reinitializing the sub-trackers which do not follow the target. By reinitializing a single or a small number of sub-trackers, the tracker can adapt only a part of its visual model to the new appearance of the object. This makes the tracker less vulnerable to drifting. An implementation of the two-layered tracker that uses SSD template matching for the sub-trackers is presented and tested on a demanding data set of non-rigid objects.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tracking people in video data using probabilistic models</title>
      <link>/publications/kristan2008tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2008tracking/</guid>
      <description></description>
    </item>
    <item>
      <title>Traffic sign classification with batch and on-line linear support vector machines</title>
      <link>/publications/mandeljc2015traffic/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2015traffic/</guid>
      <description>&lt;p&gt;This paper presents a comprehensive benchmark of several feature types and colorspace representations on the task of traffic sign classification. We focus on linear Support Vector Machine classifiers, and test several multi-class formulations, as well as a formulation that allows on-line training and updates. Experiments on two standard traffic sign classification datasets show that despite their relative simplicity, these classifiers offer competitive performance, and ultimately allow the design of a flexible classification system in the context of automatic maintenance of a traffic signalization inventory.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Trans2k: Unlocking the Power of Deep Models for Transparent Object Tracking</title>
      <link>/publications/lukezic2022trans2k/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2022trans2k/</guid>
      <description>&lt;p&gt;Visual object tracking has focused predominantly on opaque objects, while transparent object tracking has received very little attention. Motivated by the uniqueness of transparent objects, in that their appearance is directly affected by the background, the first dedicated evaluation dataset has emerged recently. We contribute to this effort by proposing the first transparent object tracking training dataset, Trans2k, which consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Noting that transparent objects can be realistically rendered by modern renderers, we quantify domain-specific attributes and render the dataset to contain visual attributes and tracking situations not covered in existing object training datasets. We observe a consistent performance boost (up to 16%) across a diverse set of modern tracking architectures when trained using Trans2k, and show insights not previously possible due to the lack of appropriate training sets. The dataset and the rendering engine will be publicly released to unlock the power of modern learning-based trackers and foster new designs in transparent object tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Using discriminative analysis for improving hierarchical compositional models</title>
      <link>/publications/tabernik2014using/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2014using/</guid>
      <description>&lt;p&gt;In this paper we propose a method to extract discriminative information from a generative model produced by a compositional hierarchical approach. We represent discriminative information as a score computed from a weighted summation of the activation vector. We base the activation vector on the individual activations of features from a parse tree of the detection. We utilize the score to reduce false positive detections by removing generative models with poor discriminative information from the vocabulary and by thresholding detections with a low discriminative score. We evaluate our approach on the ETHZ Shape Classes database, where we show a reduction in the number of false positives and a decrease in detection time without reducing the detection rate.&lt;/p&gt;</description>
    </item>
    <item>
      <title>ViCoS Eye - a webservice for visual object categorization</title>
      <link>/publications/tabernik2013vicos/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2013vicos/</guid>
      <description>&lt;p&gt;In our paper we present an architecture for a system capable of providing back-end support for a web service by running a variety of computer vision algorithms distributed across a cluster of machines. We divide the architecture into learning, real-time processing and request handling for the web service. We implement learning in the MapReduce domain with Hadoop jobs, while we implement real-time processing as a Storm application. A website and an Android application front-end are additionally implemented as part of the web service to provide the user interface. We evaluate the system on our own cluster and show that a system running on a cluster of our size can learn the Caltech-101 dataset in 40 minutes, while real-time processing can achieve a response time of 2 seconds, which is adequate for a multitude of online applications.&lt;/p&gt;</description>
    </item>
    <item>
      <title>ViCoS Eye - Spletna storitev za kategorizacijo vizualnih objektov</title>
      <link>/publications/tabernik2013vicos-eye/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2013vicos-eye/</guid>
      <description>&lt;p&gt;In this article we present the architecture of a system for a web service that runs advanced computer vision algorithms distributed across a larger number of computers. Architecturally, the system is divided into learning, real-time stream processing and a user interface for the web service. We implement learning in the MapReduce domain using Hadoop jobs, while real-time processing is implemented as an application on the Storm system. As the front-end for the end user we additionally implement a website and an Android application. We test the system on our computer cluster and show that the images from the Caltech-101 dataset can be learned in 40 minutes, while real-time stream processing can handle an individual input request in less than two seconds.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Video segmentation of water scenes using semi supervised learning</title>
      <link>/publications/cesnik2021video/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cesnik2021video/</guid>
      <description>&lt;p&gt;Obstacle detection is a crucial component in unmanned surface vehicles to prevent collisions and unnecessary stopping due to false detections. Autonomous vessels are a relatively unexplored area in comparison to autonomous ground vehicles, so there are far fewer densely annotated datasets for training modern obstacle detectors. Since manual acquisition of ground truth segmentation data is time-consuming and expensive, a viable alternative is training with minimal supervision. We therefore evaluate unsupervised domain adaptation methods, trained on a labeled source dataset and an unlabeled target dataset. Four modern adaptation methods are tested (intra-domain adaptation, Fourier domain adaptation, instance matching and bidirectional learning) for training the semantic segmentation network WaSR, which is currently the state of the art for maritime obstacle detection. We consider the original WaSR as well as a modified version. Fourier domain adaptation applied to the modified WaSR outperforms the non-adapted original WaSR by 6.3% in F-measure.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Visual object tracking performance measures revisited</title>
      <link>/publications/cehovin2016visual/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2016visual/</guid>
      <description>&lt;p&gt;The problem of visual tracking evaluation sports a large variety of performance measures, and largely suffers from a lack of consensus about which measures should be used in experiments. This makes cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased towards particular tracking aspects. In this paper we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis, we narrow down the set of potential measures to only two complementary ones, describing accuracy and robustness, thus pushing towards homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized, and have been employed by the recent Visual Object Tracking (VOT) challenges as the foundation of their evaluation methodology.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Visual re-identification across large, distributed camera networks</title>
      <link>/publications/kenk2015visual/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kenk2015visual/</guid>
      <description>&lt;p&gt;We propose a holistic approach to the problem of re-identification in an environment of distributed smart cameras. We model the re-identification process in a distributed camera network as a distributed multi-class classifier, composed of spatially distributed binary classifiers. We treat the problem of re-identification as an open-world problem, and address novelty detection and forgetting. As there are many tradeoffs in design and operation of such a system, we propose a set of evaluation measures to be used in addition to the recognition performance. The proposed concept is illustrated and evaluated on a new many-camera surveillance dataset and SAIVT-SoftBio dataset.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Vrednotenje učinkovitosti Kalmanovega filtra pri sledenju ljudi</title>
      <link>/publications/perse2004vrednotenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2004vrednotenje/</guid>
      <description>&lt;p&gt;Kalman filtering (KF) is a standard technique for estimating the position and uncertainty of a moving object based on noisy measurements and knowledge of the object's dynamics. In this paper we apply the Kalman filter algorithm to estimate the motion parameters (position and speed) of a moving person from a video stream. To assess the efficiency of KF tracking, various experiments with and without KF were performed. The results showed that modeling a person's motion and measurement noise using the KF algorithm can considerably improve tracking performance in cases of human interactions and occlusions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>WaSR -- A Water Segmentation and Refinement Maritime Obstacle Detection Network</title>
      <link>/publications/bovcon2021wasr/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2021wasr/</guid>
      <description>&lt;p&gt;Obstacle detection using semantic segmentation has become an established approach in autonomous vehicles. However, existing segmentation methods, primarily developed for ground vehicles, are inadequate in an aquatic environment, as they produce many false positive (FP) detections in the presence of water reflections and wakes. We propose a novel deep encoder-decoder architecture, a water segmentation and refinement (WaSR) network, specifically designed for the marine environment to address these issues. A deep encoder based on ResNet101 with atrous convolutions enables the extraction of rich visual features, while a novel decoder gradually fuses them with inertial information from the inertial measurement unit (IMU). The inertial information greatly improves the segmentation accuracy of the water component in the presence of visual ambiguities, such as fog on the horizon. Furthermore, a novel loss function for semantic separation is proposed to enforce the separation of different semantic components and increase the robustness of the segmentation. We investigate different loss variants and observe a significant reduction in false positives and an increase in true positives (TP). Experimental results show that WaSR outperforms the current state of the art by approximately 4% in F1 score on a challenging USV dataset. WaSR shows remarkable generalization capabilities and outperforms the state of the art by over 24% in F1 score on a strict domain generalization experiment.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Wide-angle camera distortions and non-uniform illumination in mobile robot tracking</title>
      <link>/publications/klancar2004wide-angle/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/klancar2004wide-angle/</guid>
      <description>&lt;p&gt;In this paper some fundamentals and solutions to accompanying problems in vision system design for mobile robot tracking are presented. The main topics are correction of camera lens distortion and compensation of non-uniform illumination. Both correction methods contribute to vision system performance if implemented in the appropriate manner. Their applicability is demonstrated by applying them to vision for robot soccer. The lens correction method successfully corrects the distortion caused by the camera lens, thus achieving a more accurate and precise estimation of object position. The illumination compensation improves robustness to irregular and non-uniform illumination that is nearly always present in real conditions.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
