<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Article on ViCoS Lab</title>
    <link>/publications/by-type/article/</link>
    <description>Recent content in Article on ViCoS Lab</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="/publications/by-type/article/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Bayes-Spectral-Entropy-Based Measure of Camera Focus Using a Discrete Cosine Transform</title>
      <link>/publications/kristan2006a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2006a/</guid>
      <description>&lt;p&gt;In this paper we present a novel measure of camera focus based on the Bayes spectral entropy of an image spectrum. In order to estimate the degree of focus, the image is divided into non-overlapping sub-images of 8×8 pixels. Next, sharpness values are calculated separately for each sub-image and their mean is taken as a measure of the overall focus. The sub-image spectra are obtained by an 8×8 discrete cosine transform (DCT). Comparisons were made against four well-known reference measures, on images captured with a standard visible-light camera and a thermal camera. The proposed measure outperformed the reference measures by exhibiting a wider working range and a smaller failure rate. To assess its robustness to noise, additional tests were conducted with noisy images.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Discriminative Single-Shot Segmentation Network for Visual Object Tracking</title>
      <link>/publications/lukezic2021a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2021a/</guid>
      <description>&lt;p&gt;Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but they are restricted to bounding-box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker &amp;ndash; D3S2, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, and the other assuming a rigid object, to simultaneously achieve robust online target segmentation. The overall tracking reliability is further increased by decoupling the object and feature scale estimation. Without per-dataset finetuning, and trained only for segmentation as the primary output, D3S2 outperforms all published trackers on the recent short-term tracking benchmark VOT2020 and performs very close to the state-of-the-art trackers on GOT-10k, TrackingNet, OTB100 and LaSOT. D3S2 outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A framework for visual-context-aware object detection in still images</title>
      <link>/publications/perko2010a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perko2010a/</guid>
      <description>&lt;p&gt;Visual context provides cues about an object&amp;rsquo;s presence, position and size within the observed scene, which should be used to increase the performance of object detection techniques. However, in computer vision, object detectors typically ignore this information. We therefore present a framework for visual-context-aware object detection. Methods for extracting visual contextual information from still images are proposed, which are then used to calculate a prior for object detection. The concept relies on a sparse coding of contextual features derived from geometry and texture. In addition, bottom-up saliency and object co-occurrences are exploited to define auxiliary visual context. To integrate the individual contextual cues with a local appearance-based object detector, a fully probabilistic framework is established. In contrast to other methods, our integration is based on modeling the underlying conditional probabilities between the different cues, which is done via kernel density estimation. This integration is a crucial part of the framework, as demonstrated in the detailed evaluation. Our method is evaluated on a novel, demanding image dataset and compared to a state-of-the-art method for context-aware object detection. An in-depth analysis discusses the contributions of the individual contextual cues and the limitations of visual context for object detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Local-motion-based probabilistic model for visual tracking</title>
      <link>/publications/kristan2009a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009a/</guid>
      <description>&lt;p&gt;Color-based tracking is prone to failure in situations where visually similar targets move in close proximity or occlude each other. To deal with the ambiguities in the visual information, we propose an additional color-independent visual model based on the target&amp;rsquo;s local motion. This model is calculated from the optical flow induced by the target in consecutive images. By modifying a color-based particle filter to account for the target&amp;rsquo;s local motion, the combined color/local-motion-based tracker is constructed. We compare the combined tracker to a purely color-based tracker on a challenging dataset from hand tracking, surveillance and sports. The experiments show that the proposed local-motion model largely resolves situations in which the target is occluded by, or moves in front of, a visually similar object.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A modular toolkit for visual tracking performance evaluation</title>
      <link>/publications/cehovin2020a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2020a/</guid>
      <description>&lt;p&gt;We present a modular software package for conducting single-target visual object tracking experiments and analyzing results. Our software supports many of the common usage patterns in visual tracking evaluation out of the box, but is also modular and allows various extensions. Users are able to integrate existing implementations of visual tracking algorithms with little additional effort using a standardized and flexible communication protocol. The software has been the technical backbone of the VOT Challenge initiative for many years and has grown and evolved with the competitions that it supported. We present its current state and the capabilities of the package and conclude with some plans for future development.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking</title>
      <link>/publications/lukezic2024a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2024a/</guid>
      <description>&lt;p&gt;Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, development of modern tracking architectures requires large training sets, which do not exist in transparent object tracking. We present two contributions addressing the aforementioned issues. We propose the first transparent object tracking training dataset Trans2k that consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Standard trackers trained on this dataset consistently improve by up to 16%. Our second contribution is a new distractor-aware transparent object tracker (DiTra) that treats localization accuracy and target identification as separate tasks and implements them by a novel architecture. DiTra sets a new state-of-the-art in transparent object tracking and generalizes well to opaque objects.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Novel Performance Evaluation Methodology for Single-Target Trackers</title>
      <link>/publications/kristan2016a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2016a/</guid>
      <description>&lt;p&gt;This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most carefully constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art, since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A segmentation-based approach for polyp counting in the wild</title>
      <link>/publications/zavrtanik2020a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2020a/</guid>
      <description>&lt;p&gt;We address the problem of jellyfish polyp counting in underwater images. Modern methods utilize convolutional neural networks for feature extraction and work in two stages. First, hypothetical regions are proposed at potential locations, then the features of the regions are extracted and classified according to the contained object. Such methods typically require a dense grid for region proposals, explicitly test various scales and are prone to failure in densely populated regions. We propose a segmentation-based polyp counter – SegCo. A convolutional neural network is trained to produce locally-circular segmentation masks on the polyps, which are then detected by localizing circularly symmetric areas in the segmented image. The detection stage is efficient and avoids a greedy search over positions and scales. SegCo outperforms the current state-of-the-art object detector RetinaNet and the recent specialized polyp detection method PoCo by 2% and 24% in F-score, respectively, and sets a new state-of-the-art in polyp detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Trajectory-Based Analysis of Coordinated Team Activity in a Basketball Game</title>
      <link>/publications/perse2009a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2009a/</guid>
      <description></description>
    </item>
    <item>
      <title>A Two-Stage Dynamic Model for Visual Tracking</title>
      <link>/publications/kristan2010a/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2010a/</guid>
      <description>&lt;p&gt;We propose a new dynamic model which can be used within blob trackers to track the target&amp;rsquo;s center of gravity. A strong point of the model is that it is designed to track a variety of motions which are usually encountered in applications such as pedestrian tracking, hand tracking and sports. We call the dynamic model a two-stage dynamic model due to its particular structure, which is a composition of two models: a liberal model and a conservative model. The liberal model allows larger perturbations in the target&amp;rsquo;s dynamics and is able to account for motions in between the random-walk dynamics and the nearly-constant-velocity dynamics. On the other hand, the conservative model assumes smaller perturbations and is used to further constrain the liberal model to the target&amp;rsquo;s current dynamics. We implement the two-stage dynamic model in a two-stage probabilistic tracker based on the particle filter and apply it to two separate examples of blob tracking: (i) tracking entire persons and (ii) tracking a person&amp;rsquo;s hands. Experiments show that, in comparison to the widely used models, the proposed two-stage dynamic model allows tracking with a smaller number of particles in the particle filter (e.g., 25 particles), while achieving smaller errors in the state estimation and a smaller failure rate. The results suggest that the improved performance comes from the model&amp;rsquo;s ability to actively adapt to the target&amp;rsquo;s motion during tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Adding discriminative power to a generative hierarchical compositional model using histograms of compositions</title>
      <link>/publications/tabernik2015adding/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2015adding/</guid>
      <description>&lt;p&gt;In this paper we identify two types of problems caused by excessive feature sharing and the lack of discriminative learning in hierarchical compositional models: (a) misclassifications between similar categories and (b) phantom detections on background objects. We propose to overcome these issues by fully utilizing the discriminative features already present in the generative models of hierarchical compositions. We introduce a descriptor called the Histogram of Compositions to capture the information important for improving discriminative power, and use it with a classifier to learn distinctive features important for successful discrimination. The generative model of hierarchical compositions is combined with the discriminative descriptor by performing hypothesis verification of the detections produced by the hierarchical compositional model. We evaluate the proposed descriptor on five datasets and show that it improves the misclassification rate between similar categories as well as the rate of phantom detections on backgrounds. Additionally, we compare our approach against a state-of-the-art convolutional neural network and show that it outperforms the network under significant occlusions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Aktivno učenje in vzajemnost med učiteljem in učencem</title>
      <link>/publications/majnik2013aktivno/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/majnik2013aktivno/</guid>
      <description>&lt;p&gt;The basic goal of active learning is to achieve the desired performance of a given learning algorithm with as few training examples as possible. This goal is motivated by the fact that labeling training examples is usually expensive in terms of the time and mental effort of a human annotator. However, active learning has shortcomings, such as disregarding the teacher&amp;rsquo;s level of familiarity with the problem and lacking a mechanism for ensuring the comprehensibility of the actively selected training examples. In this article we propose a new approach to active learning, so-called &amp;ldquo;mutual active learning&amp;rdquo;, which helps an artificial intelligent learner pose questions to its teacher that are as clear and comprehensible as possible. This kind of learning proves more reliable and successful compared to basic active learning.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An Analysis Of Basketball Players&#39; Movements In The Slovenian Basketball League Play-Offs Using The Sagit Tracking System</title>
      <link>/publications/erculj2008an/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/erculj2008an/</guid>
      <description></description>
    </item>
    <item>
      <title>An integrated system for interactive continuous learning of categorical knowledge</title>
      <link>/publications/skocaj2016an-integrated/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2016an-integrated/</guid>
      <description>&lt;p&gt;This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs to have several competencies and must be able to process different types of representations. In this article, we describe a collection of mechanisms that enable integration of heterogeneous competencies in a principled way. Central to our design is the creation of beliefs from visual and linguistic information, and the use of these beliefs for planning system behaviour to satisfy internal drives. The system is able to detect gaps in its knowledge and to plan and execute actions that provide information needed to fill these gaps. We propose a hierarchy of mechanisms which are capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself. We present the theory these mechanisms are built upon and an instantiation of this theory in the form of an integrated robot system. We demonstrate the operation of the system in the case of learning conceptual models of objects and their visual properties.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An integrated system for interactive continuous learning of categorical knowledge</title>
      <link>/publications/skocaj2016an/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2016an/</guid>
      <description>&lt;p&gt;This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs to have several competencies and must be able to process different types of representations. In this article we describe a collection of mechanisms that enable integration of heterogeneous competencies in a principled way. Central to our design is the creation of beliefs from visual and linguistic information, and the use of these beliefs for planning system behaviour to satisfy internal drives. The system is able to detect gaps in its knowledge and to plan and execute actions that provide information needed to fill these gaps. We propose a hierarchy of mechanisms which are capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself. We present the theory these mechanisms are built upon and an instantiation of this theory in the form of an integrated robot system. We demonstrate the operation of the system in the case of learning conceptual models of objects and their visual properties.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Analysis of multi-agent activity using Petri nets</title>
      <link>/publications/perse2009analysis/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perse2009analysis/</guid>
      <description>&lt;p&gt;This paper presents the use of Place/Transition Petri Nets (PNs) for the recognition and evaluation of complex multi-agent activities. The PNs were built automatically from the activity templates that are routinely used by experts to encode domain-specific knowledge. The PNs were built in such a way that they encoded the complex temporal relations between the individual activity actions. We extended the original PN formalism to handle the propagation of evidence using net tokens. The evaluation of the spatial and temporal properties of the actions was carried out using trajectory-based action detectors and probabilistic models of the action durations. The presented approach was evaluated using several examples of real basketball activities. The obtained experimental results suggest that this approach can be used to determine the type of activity that a team has performed as well as the stage at which the activity ended.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Application of the HIDRA2 deep-learning model for sea level forecasting along the Estonian coast of the Baltic Sea</title>
      <link>/publications/barzandeh2025/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/barzandeh2025/</guid>
      <description></description>
    </item>
    <item>
      <title>Automated detection and segmentation of cracks in concrete surfaces using joined segmentation and classification deep neural network</title>
      <link>/publications/tabernik2023automated/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2023automated/</guid>
      <description>&lt;p&gt;Automated quality control of pavement and concrete surfaces is essential for maintaining structural integrity and consistency in the construction and infrastructure industries. This paper presents a novel deep learning model designed for automated quality control of these surfaces during both construction and maintenance phases. The model employs per-pixel segmentation and per-image classification, integrating both local and broader context information. Additionally, we utilize the classification results to improve segmentation during both training and inference stages. We evaluated the proposed model on a publicly available dataset containing more than 7,000 images of pavement and concrete cracks. The model achieved a Dice score of 81% and an intersection-over-union of 71%, surpassing publicly available state-of-the-art methods by at least 6-7 percentage points. An ablation study confirms that leveraging classification information enhances overall segmentation performance. Furthermore, our model is computationally efficient, processing over 30 FPS for 512x512 images, making it suitable for real-time applications on medium-resolution images. Upon acceptance, both the code and the corrected dataset ground truths will be made publicly available.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices</title>
      <link>/publications/rolih2025btc/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rolih2025btc/</guid>
      <description>&lt;p&gt;Remote sensing change detection aims to localize semantic changes between images of the same location captured at different times. In the past few years, newer methods have attributed enhanced performance to the addition of new and complex components to existing architectures. Most, however, fail to measure the performance contribution of fundamental design choices such as backbone selection, pre-training strategies, and training configurations. We claim that such fundamental design choices often improve performance even more significantly than the addition of new architectural components. We therefore systematically revisit the design space of change detection models and analyse the full potential of a well-optimised baseline. We identify a set of fundamental design choices that benefit both new and existing architectures. Leveraging this insight, we demonstrate that when carefully designed, even an architecturally simple model can match or surpass state-of-the-art performance on six challenging change detection datasets. Our best practices generalise beyond our architecture and also offer performance improvements when applied to related methods, indicating that the space of fundamental design choices has been underexplored. Our guidelines and architecture provide a strong foundation for future methods, emphasizing that optimizing core components is just as important as architectural novelty in advancing change detection performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Beyond monthly composites: maximizing information retention in satellite image time series for forest stand classification</title>
      <link>/publications/racic2025beyond/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/racic2025beyond/</guid>
      <description>&lt;p&gt;This study investigates the effectiveness of data pre-processing and classifier selection in forest stand classification using Satellite Image Time Series (SITS). We compare the performance of Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) on monthly composites and dense time series. While the monthly RF achieves an average accuracy of 74.1%, the use of LightGBM results in lower performance on monthly composites. Our approach, which utilizes synthetic bands generated based on the available Sentinel-2 SITS, improved RF performance by 13.2 percentage points, exceeding the improvement observed when using 10-day composites. This highlights the loss of information that occurs when using composites. LightGBM improved the results by an additional 1.9 percentage points. However, without additional pre-processing, LightGBM can use the raw SITS and outperform these results with an F1 score of 0.906. The generated map was further improved by using margin values to highlight uncertainties and mask areas of uncertainty. Overall, while monthly composites provide a good starting point, the best results are obtained with raw SITS, which allows efficient processing for larger regions without additional pre-processing.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Center Direction Network for Grasping Point Localization on Cloths</title>
      <link>/publications/tabernik2024center/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2024center/</guid>
      <description>&lt;p&gt;Object grasping is a fundamental challenge in robotics and computer vision, critical for advancing robotic manipulation capabilities. Deformable objects, like fabrics and cloths, pose additional challenges due to their non-rigid nature. In this work, we introduce CeDiRNet-3DoF, a deep-learning model for grasp point detection, with a particular focus on cloth objects. CeDiRNet-3DoF employs center direction regression alongside a localization network, attaining first place in the perception task of ICRA 2023&amp;rsquo;s Cloth Manipulation Challenge. Recognizing the lack of standardized benchmarks in the literature that hinder effective method comparison, we present the ViCoS Towel Dataset. This extensive benchmark dataset comprises 8,000 real and 12,000 synthetic images, serving as a robust resource for training and evaluating contemporary data-driven deep-learning approaches. Extensive evaluation revealed CeDiRNet-3DoF&amp;rsquo;s robustness in real-world performance, outperforming state-of-the-art methods, including the latest transformer-based models. Our work bridges a crucial gap, offering a robust solution and benchmark for cloth grasping in computer vision and robotics. Code and dataset are available at: &lt;a href=&#34;https://github.com/vicoslab/CeDiRNet-3DoF&#34;&gt;https://github.com/vicoslab/CeDiRNet-3DoF&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Closed-world tracking of multiple interacting targets for indoor-sports applications</title>
      <link>/publications/kristan2009closed-world/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009closed-world/</guid>
      <description>&lt;p&gt;In this paper we present an efficient algorithm for tracking multiple players during indoor sports matches. A sports match can be considered as a semi-controlled environment for which a set of closed-world assumptions regarding the visual as well as the dynamical properties of the players and the court can be derived. These assumptions are then used in the context of particle filtering to arrive at a computationally fast, closed-world, multi-player tracker. The proposed tracker is based on multiple, single-player trackers, which are combined using a closed-world assumption about the interactions among players. With regard to the visual properties, the robustness of the tracker is achieved by deriving a novel sports-domain-specific likelihood function and employing a novel background-elimination scheme. The restrictions on the player&amp;rsquo;s dynamics are enforced by employing a novel form of local smoothing. This smoothing renders the tracking more robust and reduces the computational complexity of the tracker. We evaluated the proposed closed-world, multi-player tracker on a challenging data set. In comparison with several similar trackers that did not utilize all of the closed-world assumptions, the proposed tracker produced better estimates of position and prediction as well as reducing the number of failures.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Combining Reconstructive and Discriminative Subspace Methods for Robust Classification and Regression by Subsampling</title>
      <link>/publications/fidler2006combining/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/fidler2006combining/</guid>
      <description>&lt;p&gt;Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in visual data. Discriminative methods such as LDA and CCA, on the other hand, are better suited for classification and regression tasks but are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods, which enables one to work on subsets of pixels in images to efficiently detect and reject outliers. The proposed approach is therefore capable of robust classification/regression with a high breakdown point. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms the standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Correcting decalibration of stereo cameras in self-driving vehicles</title>
      <link>/publications/muhovic2020correcting/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muhovic2020correcting/</guid>
      <description></description>
    </item>
    <item>
      <title>CRITER 1.0: a coarse reconstruction with iterative refinement network for sparse spatio-temporal satellite data</title>
      <link>/publications/muc2025_gmd/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muc2025_gmd/</guid>
      <description>&lt;p&gt;Satellite observations of sea surface temperature (SST) are essential for accurate weather forecasting and climate modeling. However, these data often suffer from incomplete coverage due to cloud obstruction and limited satellite swath width, which requires the development of dense reconstruction algorithms. The current state of the art struggles to accurately recover high-frequency variability, particularly in SST gradients in ocean fronts, eddies, and filaments, which are crucial for downstream processing and predictive tasks. To address this challenge, we propose CRITER (Coarse Reconstruction with ITerative Refinement Network), a novel two-stage method. First, it reconstructs low-frequency SST components utilizing a Vision Transformer-based model, leveraging global spatio-temporal correlations in the available observations. Second, a UNet-type network iteratively refines the estimate by recovering high-frequency details. Extensive analysis on datasets from the Mediterranean, Adriatic, and Atlantic seas demonstrates CRITER&amp;rsquo;s superior performance over the current state of the art. Specifically, CRITER achieves up to 44 % lower reconstruction errors of the missing values and over 80 % lower reconstruction errors of the observed values compared to the state of the art.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep Learning for Large-Scale Traffic-Sign Detection and Recognition</title>
      <link>/publications/tabernik2019deep/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2019deep/</guid>
      <description>&lt;p&gt;Automatic detection and recognition of traffic signs plays a crucial role in the management of the traffic-sign inventory. It provides an accurate and timely way to manage the traffic-sign inventory with minimal human effort. In the computer vision community the recognition and detection of traffic signs is a well-researched problem. A vast majority of existing approaches perform well on the traffic signs needed for advanced driver-assistance and autonomous systems. However, these represent a relatively small fraction of all traffic signs (around 50 categories out of several hundred), and performance on the remaining set of traffic signs, which are required to eliminate manual labor in traffic-sign inventory management, remains an open question. In this paper, we address the issue of detecting and recognizing a large number of traffic-sign categories suitable for automating traffic-sign inventory management. We adopt a convolutional neural network (CNN) approach, the Mask R-CNN, to address the full pipeline of detection and recognition with automatic end-to-end learning. We propose several improvements that are evaluated on the detection of traffic signs and result in improved overall performance. This approach is applied to the detection of 200 traffic-sign categories represented in our novel dataset. Results are reported on highly challenging traffic-sign categories that have not yet been considered in previous works. We provide a comprehensive analysis of the deep learning method for the detection of traffic signs with large intra-category appearance variation and show error rates below 3% with the proposed approach, which is sufficient for deployment in practical applications of traffic-sign inventory management.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep reinforcement learning for map-less goal-driven robot navigation</title>
      <link>/publications/dobrevski2021deep/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/dobrevski2021deep/</guid>
      <description>&lt;p&gt;Mobile robots that operate in real-world environments need to be able to safely navigate their surroundings. Obstacle avoidance and path planning are crucial capabilities for achieving autonomy of such systems. However, for new or dynamic environments, navigation methods that rely on an explicit map of the environment can be impractical or even impossible to use. We present a new local navigation method for steering the robot to global goals without relying on an explicit map of the environment. The proposed navigation model is trained in a deep reinforcement learning framework based on the Advantage Actor–Critic method and is able to translate robot observations directly into movement commands. We evaluate and compare the proposed navigation method with standard map-based approaches on several navigation scenarios in simulation and demonstrate that our method is able to navigate the robot even without a map, or when the map becomes corrupted, while the standard approaches fail. We also show that our method can be directly transferred to a real robot.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deformable Parts Correlation Filters for Robust Visual Tracking</title>
      <link>/publications/lukezic2017deformable/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2017deformable/</guid>
      <description>&lt;p&gt;Deformable parts models show great potential in tracking by principally addressing non-rigid object deformations and self-occlusions, but according to recent benchmarks, they often lag behind the holistic approaches. The reason is that a potentially large number of degrees of freedom has to be estimated for object localization, and simplifications of the constellation topology are often assumed to make the inference tractable. We present a new formulation of the constellation model with correlation filters that treats the geometric and visual constraints within a single convex cost function and derive a highly efficient optimization for MAP inference of a fully-connected constellation. We propose a tracker that models the object at two levels of detail. The coarse level corresponds to a root correlation filter and a novel color model for approximate object localization, while the mid-level representation is composed of the new deformable constellation of correlation filters that refine the object location. The resulting tracker is rigorously analyzed on the highly challenging OTB, VOT2014 and VOT2015 benchmarks, exhibits state-of-the-art performance and runs in real time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Dense Center-Direction Regression for Object Counting and Localization with Point Supervision</title>
      <link>/publications/tabernik2024dense/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2024dense/</guid>
      <description>&lt;p&gt;Object counting and localization problems are commonly addressed with point-supervised learning, which allows the use of less labor-intensive point annotations. However, learning based on point annotations poses challenges due to the high imbalance between the sets of annotated and unannotated pixels, which is often treated with Gaussian smoothing of point annotations and focal loss. However, these approaches still focus on the pixels in the immediate vicinity of the point annotations and exploit the rest of the data only indirectly. In this work, we propose a novel approach termed CeDiRNet for point-supervised learning that uses a dense regression of directions pointing towards the nearest object centers, i.e. center-directions. This provides greater support for each center point, arising from many surrounding pixels pointing towards the object center. We propose a formulation of center-directions that allows the problem to be split into the domain-specific dense regression of center-directions and the final localization task based on a small, lightweight, and domain-agnostic localization network that can be trained with synthetic data completely independent of the target domain. We demonstrate the performance of the proposed method on six different datasets for object counting and localization, and show that it outperforms the existing state-of-the-art methods. Keywords: Point-Supervision, Object Counting, Object Localization, Center-Point Prediction, Center-Direction Regression, CeDiRNet&lt;/p&gt;</description>
    </item>
    <item>
      <title>Detection of surface defects on pharmaceutical solid oral dosage forms with convolutional neural networks</title>
      <link>/publications/racki2021detection/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/racki2021detection/</guid>
      <description>&lt;p&gt;Deep-learning-based approaches have proven to outperform other approaches in various computer vision tasks, making application-focused machine learning a promising area of research in automated visual inspection. In this work, we apply deep learning to the challenging real-world problem domain of automated visual inspection of pharmaceutical products. We focus on investigating whether compact network architectures, adhering to performance, resource, and accuracy requirements, are suitable for usage in the pharmaceutical visual inspection domain. We propose a compact and efficient convolutional neural network architecture design for segmentation and scoring of surface defects, which we evaluate on challenging real-world datasets from the pharmaceutical product-inspection domain. In comparison with other related segmentation approaches, we achieve state-of-the-art performance in terms of defect detection as well as real-time computational efficiency. Compared to the nearest best-performing architecture we achieve state-of-the-art performance with merely 3% of the parameter count, an approximately 8-fold increase in inference speed, and increased surface defect detection performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Discriminative Correlation Filter Tracker with Channel and Spatial Reliability</title>
      <link>/publications/lukezic2018discriminative/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2018discriminative/</guid>
      <description>&lt;p&gt;Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a learning algorithm for their efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This both allows enlarging the search region and improves tracking of non-rectangular objects. Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally, with only two simple standard feature sets, HOG and Colornames, the novel CSR-DCF method &amp;ndash; DCF with Channel and Spatial Reliability &amp;ndash; achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs close to real-time on a CPU.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Effects of rule changes on physical demands and shot characteristics of elite-standard men’s squash and implications for training</title>
      <link>/publications/murray2016effects/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/murray2016effects/</guid>
      <description></description>
    </item>
    <item>
      <title>Efficient Feature Distribution for Object Matching in Visual-Sensor Networks</title>
      <link>/publications/sulic2011efficient/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/sulic2011efficient/</guid>
      <description>&lt;p&gt;In this paper, we propose a framework of hierarchical feature distribution for object matching in a network of visual sensors. In our approach, we hierarchically distribute the information in such a way that each individual node maintains only a small amount of information about the objects seen by the network. Nevertheless, this amount is sufficient to efficiently route queries through the network without any degradation of the matching performance. A set of requirements that have to be fulfilled by the object-matching method to be used in such a framework is defined. We provide examples of mapping four well-known, object-matching methods to a hierarchical feature-distribution scheme. The proposed approach was tested on a standard COIL-100 image database and in a basic surveillance scenario using our own distributed network simulator. The results show that the amount of data transmitted through the network can be significantly reduced in comparison to naive feature-distribution schemes such as flooding.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Empirical evaluation of feature selection methods in classification</title>
      <link>/publications/cehovin2010empirical/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2010empirical/</guid>
      <description></description>
    </item>
    <item>
      <title>eWaSR — An Embedded-Compute-Ready Maritime Obstacle Detection Network</title>
      <link>/publications/tersek2023ewasr/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tersek2023ewasr/</guid>
      <description>&lt;p&gt;Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper, we analyze the current best-performing maritime obstacle detection network, WaSR. Based on the analysis, we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant, eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only a 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10× faster than the original WaSR (115 FPS vs. 11 FPS). Tests on a real embedded sensor OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fast image-based obstacle detection from unmanned surface vehicles</title>
      <link>/publications/kristan2015fast/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2015fast/</guid>
      <description>&lt;p&gt;Obstacle detection plays an important role in unmanned surface vehicles (USVs). USVs operate in highly diverse environments in which an obstacle may be a floating piece of wood, a scuba diver, a pier, or a part of a shoreline, which presents a significant challenge to continuous detection from images taken onboard. This paper addresses the problem of online detection by constrained unsupervised segmentation. To this end, a new graphical model is proposed that affords a fast and continuous obstacle image-map estimation from a single video stream captured onboard a USV. The model accounts for the semantic structure of the marine environment as observed from the USV by imposing weak structural constraints. A Markov random field framework is adopted and a highly efficient algorithm for simultaneous optimization of model parameters and segmentation mask estimation is derived. Our approach does not require computationally intensive extraction of texture features and comfortably runs in real time. The algorithm is tested on a new, challenging dataset for segmentation and obstacle detection in marine environments, which is the largest annotated dataset of its kind. Results on this dataset show that our model outperforms the related approaches, while requiring a fraction of the computational effort.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA 1.0: deep-learning-based ensemble sea level forecasting in the northern Adriatic</title>
      <link>/publications/zust2021hidra/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2021hidra/</guid>
      <description>&lt;p&gt;Interactions between atmospheric forcing, topographic constraints to air and water flow, and the resonant character of the basin make sea level modelling in the Adriatic a challenging problem. In this study we present an ensemble deep-neural-network-based sea level forecasting method HIDRA, which outperforms our set-up of the general ocean circulation model ensemble (NEMO v3.6) for all forecast lead times and at a minuscule fraction of the numerical cost (order of 2×10⁻⁶). HIDRA exhibits larger bias but lower RMSE than our set-up of NEMO over most of the residual sea level bins. It introduces a trainable atmospheric spatial encoder and employs fusion of atmospheric and sea level features into a self-contained network which enables discriminative feature learning. HIDRA architecture building blocks are experimentally analysed in detail and compared to alternative approaches. Results show the importance of sea level input for forecast lead times below 24 h and the importance of atmospheric input for longer lead times. The best performance is achieved by considering the input as the total sea level, split into disjoint sets of tidal and residual signals. This enables HIDRA to optimize the prediction fidelity with respect to atmospheric forcing while compensating for the errors in the tidal model. HIDRA is trained and analysed on a 10-year (2006–2016) time series of atmospheric surface fields from a single member of the ECMWF atmospheric ensemble. In the testing phase, both HIDRA and NEMO ensemble systems are forced by the ECMWF atmospheric ensemble. Their performance is evaluated on a 1-year (2019) hourly time series from a tide gauge in Koper (Slovenia). Spectral and continuous wavelet analysis of the forecasts at the semi-diurnal frequency (12 h)⁻¹ and at the ground-state basin seiche frequency (21.5 h)⁻¹ is performed. The energy at the basin seiche in the HIDRA forecast is close to that observed, while our set-up of NEMO underestimates it. Analyses of the January 2015 and November 2019 storm surges indicate that HIDRA has learned to mimic the timing and amplitude of basin seiches.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA-D: deep-learning model for dense sea level forecasting using sparse altimetry and tide gauge data</title>
      <link>/publications/rus2026hidrad/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2026hidrad/</guid>
      <description>&lt;p&gt;This paper introduces HIDRA-D, a novel deep-learning model for basin-scale dense (gridded) sea level prediction using sparse satellite altimetry and in situ tide gauge data. Accurate sea level prediction is crucial for coastal risk management, marine operations, and sustainable development. While traditional numerical ocean models are computationally expensive, especially for probabilistic forecasts over many ensemble members, HIDRA-D offers a faster, numerically cheaper, observation-driven alternative. Unlike previous HIDRA models (HIDRA1, HIDRA2 and HIDRA3) that focused on point predictions at tide gauges, HIDRA-D provides dense, two-dimensional, gridded sea level forecasts. The core innovation lies in a new algorithm that effectively leverages sparse and unevenly distributed satellite altimetry data in combination with tide gauge observations to learn the complex basin-scale dynamics of sea level. HIDRA-D achieves this by integrating a HIDRA3 module for point predictions at tide gauges with a novel Dense decoder module, which generates low-frequency spatial components of the sea level field in the Fourier domain, whose Fourier inverse is an hourly sea level forecast over a 3 d horizon. When comparing 3 d forecasts against satellite absolute dynamic topography (ADT) data in the Adriatic, HIDRA-D achieves a 28.0 % reduction in mean absolute error relative to the NEMO general circulation model. However, while HIDRA-D performs well in open waters, leave-one-out cross-validation at tide gauges indicates limitations in areas with complex bathymetry, such as the Neretva estuary located in a narrow bay, and in regions with sparse satellite ADT data, like the northern Adriatic. Importantly, the model shows robustness to spatially limited tide gauge coverage, maintaining acceptable performance even when trained using data from distant stations. This suggests its potential for broader applicability in areas with limited in situ observations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA2: deep-learning ensemble sea level and storm tide forecasting in the presence of seiches – the case of the northern Adriatic</title>
      <link>/publications/rus2023hidra2/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2023hidra2/</guid>
      <description>&lt;p&gt;We propose a new deep-learning architecture HIDRA2 for sea level and storm tide modeling, which is extremely fast to train and apply and outperforms both our previous network design HIDRA1 and two state-of-the-art numerical ocean models (a NEMO engine with sea level data assimilation and a SCHISM ocean modeling system), over all sea level bins and all forecast lead times. The architecture of HIDRA2 employs novel atmospheric, tidal and sea surface height (SSH) feature encoders as well as a novel feature fusion and SSH regression block. HIDRA2 was trained on surface wind and pressure fields from a single member of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric ensemble and on Koper tide gauge observations. An extensive ablation study was performed to estimate the individual importance of input encoders and data streams. Compared to HIDRA1, the overall mean absolute forecast error is reduced by 13 %, while in storm events it is lower by an even larger margin of 25 %. Consistent superior performance over HIDRA1 as well as over general circulation models is observed in both tails of the sea level distribution: low tail forecasting is relevant for marine traffic scheduling to ports of the northern Adriatic, while high tail accuracy helps coastal flood response. Power spectrum analysis indicates that HIDRA2 most accurately represents the energy density peak centered on the ground state sea surface eigenmode (seiche) and comes a close second to SCHISM in the energy band of the first excited eigenmode. To assign model errors to specific frequency bands covering diurnal and semi-diurnal tides and the two lowest basin seiches, spectral decomposition of sea levels during several historic storms is performed. HIDRA2 accurately predicts amplitudes and temporal phases of the Adriatic basin seiches, which is an important forecasting benefit due to the high sensitivity of the Adriatic storm tide level to the temporal lag between peak tide and peak seiche.&lt;/p&gt;</description>
    </item>
    <item>
      <title>HIDRA3: a deep-learning model for multipoint ensemble sea level forecasting in the presence of tide gauge sensor failures</title>
      <link>/publications/rus2025hidra3/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/rus2025hidra3/</guid>
      <description>&lt;p&gt;Accurate modeling of sea level and storm surge dynamics with several days of temporal horizons is essential for effective coastal flood responses and the protection of coastal communities and economies. The classical approach to this challenge involves computationally intensive ocean models that typically calculate sea levels relative to the geoid, which must then be correlated with local tide gauge observations of sea surface height (SSH). A recently proposed deep-learning model, HIDRA2 (HIgh-performance Deep tidal Residual estimation method using Atmospheric data, version 2), avoids numerical simulations while delivering competitive forecasts. Its forecast accuracy depends on the availability of a sufficiently long history of recorded SSH observations used in training. This makes HIDRA2 less reliable for locations with less abundant SSH training data. Furthermore, since the inference requires immediate past SSH measurements as input, forecasts cannot be made during temporary tide gauge failures. We address the aforementioned issues using a new architecture, HIDRA3, that considers observations from multiple locations, shares the geophysical encoder across the locations, and constructs a joint latent state that is decoded into forecasts at individual locations. The new architecture brings several benefits: (i) it improves training at locations with scarce historical SSH data, (ii) it enables predictions even at locations with sensor failures, and (iii) it reliably estimates prediction uncertainties. HIDRA3 is evaluated by jointly training on 11 tide gauge locations along the Adriatic. Results show that HIDRA3 outperforms HIDRA2 and the Mediterranean basin Nucleus for European Modelling of the Ocean (NEMO) setup of the Copernicus Marine Environment Monitoring Service (CMEMS) by ∼ 15 % and ∼ 13 % mean absolute error (MAE) reductions at high SSH values, creating a solid new state of the art. The forecasting skill does not deteriorate even in the case of simultaneous failure of multiple sensors in the basin or when predicting solely from the tide gauges far outside the Rossby radius of a failed sensor. Furthermore, HIDRA3 shows remarkable performance with substantially smaller amounts of training data compared with HIDRA2, making it appropriate for sea level forecasting in basins with high regional variability in the available tide gauge data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Histograms of optical flow for efficient representation of body motion</title>
      <link>/publications/pers2010histograms/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/pers2010histograms/</guid>
      <description></description>
    </item>
    <item>
      <title>Incremental and robust learning of subspace representations</title>
      <link>/publications/skocaj2008incremental/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2008incremental/</guid>
      <description>&lt;p&gt;Learning is a fundamental capability of any cognitive system. To enable efficient operation of a cognitive agent in a real-world environment, visual learning has to be a continuous and robust process. In this article, we present a method for subspace learning, which takes these considerations into account. We present an incremental method, which sequentially updates the principal subspace considering weighted influence of individual images as well as individual pixels within an image. We further extend this approach to enable determination of consistencies in the input data and imputation of the inconsistent values using the previously acquired knowledge, resulting in a novel method for incremental, weighted, and robust subspace learning. We demonstrate the effectiveness of the proposed concept in several experiments on learning of object and background representations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Joint calibration of a multimodal sensor system for autonomous vehicles</title>
      <link>/publications/muhovic2023joint/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muhovic2023joint/</guid>
      <description>&lt;p&gt;Multimodal sensor systems require precise calibration if they are to be used in the field. Due to the difficulty of obtaining the corresponding features from different modalities, the calibration of such systems is an open problem. We present a systematic approach for calibrating a set of cameras with different modalities (RGB, thermal, polarization, and dual-spectrum near infrared) with regard to a LiDAR sensor using a planar calibration target. Firstly, a method for calibrating a single camera with regard to the LiDAR sensor is proposed. The method is usable with any modality, as long as the calibration pattern is detected. A methodology for establishing a parallax-aware pixel mapping between different camera modalities is then presented. Such a mapping can then be used to transfer annotations, features, and results between highly differing camera modalities to facilitate feature extraction and deep detection and segmentation methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Karhunen-Loeve Transform of a Set of Rotated Templates</title>
      <link>/publications/jogan2003karhunen-loeve/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/jogan2003karhunen-loeve/</guid>
      <description>&lt;p&gt;We propose a novel method for efficiently calculating the eigenvectors of uniformly rotated images of a set of templates. As we show, the images can be optimally approximated by a linear series of eigenvectors which can be calculated without actually decomposing the sample covariance matrix.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Keep DRÆMing: Discriminative 3D anomaly detection through anomaly simulation</title>
      <link>/publications/zavrtanik2024keep/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2024keep/</guid>
      <description>&lt;p&gt;Recent surface anomaly detection methods rely on pretrained backbone networks for efficient anomaly detection. On standard RGB anomaly detection benchmarks these methods achieve excellent results but fail on 3D anomaly detection due to a lack of pretrained backbones that suit this domain. Additionally, there is a lack of industrial depth data that would enable the backbone network training that could be used in 3D anomaly detection models. Discriminative anomaly detection methods do not require pretrained networks and are trained using simulated anomalies. The process of simulating anomalies that fit the domain of industrial depth data is not trivial and is necessary for training discriminative methods. We propose a novel 3D anomaly simulation process that follows the natural characteristics of industrial depth data and generates diverse deformations, making it suitable for training discriminative anomaly detection methods. We demonstrate its effectiveness by adapting the DRÆM method to work on 3D anomaly detection, thus obtaining 3DRÆM, a strong discriminative 3D anomaly detection model. The proposed approach attains state-of-the-art results on the MVTec3D anomaly detection benchmark in both the 3D and RGB+3D problem setups, significantly outperforming competing methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning part-based spatial models for laser-vision-based room categorization</title>
      <link>/publications/ursic2017learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2017learning/</guid>
      <description>&lt;p&gt;Room categorization, i.e., recognizing the functionality of a never-before-seen room, is a crucial capability for a household mobile robot. We present a new approach for room categorization that is based on 2D laser range data. The method is based on a novel spatial model consisting of mid-level parts that are built on top of a low-level part-based representation. The approach is then fused with a vision-based method for room categorization, which is also based on a spatial model consisting of mid-level visual parts. In addition, we propose a new discriminative dictionary learning technique that is applied for part-dictionary selection in both laser-based and vision-based modalities. Finally, we present a comparative analysis between laser-based, vision-based, and laser-vision-fusion-based approaches in a uniform part-based framework that is evaluated on a large dataset with several categories of rooms from domestic environments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning with Weak Annotations for Robust Maritime Obstacle Detection</title>
      <link>/publications/zust2022learning-with/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zust2022learning-with/</guid>
      <description>&lt;p&gt;Robust maritime obstacle detection is critical for safe navigation of autonomous boats and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. However, per-pixel ground truth labeling of such datasets is labor-intensive and expensive. We propose a new scaffolding learning regime (SLR) that leverages weak annotations consisting of water edges, the horizon location, and obstacle bounding boxes to train segmentation-based obstacle detection networks, thereby reducing the required ground truth labeling effort by a factor of twenty. SLR trains an initial model from weak annotations and then alternates between re-estimating the segmentation pseudo-labels and improving the network parameters. Experiments show that maritime obstacle segmentation networks trained using SLR on weak annotations not only match but outperform the same networks trained with dense ground truth labels, which is a remarkable result. In addition to the increased accuracy, SLR also increases domain generalization and can be used for domain adaptation with a low manual annotation load. The SLR code and pre-trained models are freely available online.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Low-Cost Open-Source Robotic Platform for Education</title>
      <link>/publications/cehovin2023low-cost/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2023low-cost/</guid>
      <description>&lt;p&gt;This article describes an open-source robotic manipulator platform aimed at different levels of STEM education and popularization. It presents the hardware that was used to make a suitable low-cost low-weight manipulator and an evaluation of its capabilities, as well as the software components that were developed to make the platform accessible at different levels of education and in various usage scenarios. Finally, the results of a comprehensive user evaluation study spanning over several years are presented. The system was tested in several different educational scenarios, ranging from a summer school for primary-school students to a university-level course. The results of the study show that the introduction of the system into the educational process improves the motivation as well as the acquired knowledge of the participants.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Mixed supervision for surface-defect detection: from weakly to fully supervised learning</title>
      <link>/publications/bozic2021mixed/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bozic2021mixed/</guid>
      <description>&lt;p&gt;Deep-learning methods have recently started being employed for addressing surface-defect detection problems in industrial quality control. However, with a large amount of data needed for learning, often requiring high-precision labels, many industrial problems cannot be easily solved, or the cost of the solutions would significantly increase due to the annotation requirements. In this work, we relax the heavy requirements of fully supervised learning methods and reduce the need for highly detailed annotations. By proposing a deep-learning architecture, we explore the use of annotations of different details ranging from weak (image-level) labels through mixed supervision to full (pixel-level) annotations on the task of surface-defect detection. The proposed end-to-end architecture is composed of two sub-networks yielding defect segmentation and classification results. The proposed method is evaluated on several datasets for industrial quality inspection: KolektorSDD, DAGM and Severstal Steel Defect. We also present a new dataset termed KolektorSDD2 with over 3000 images containing several types of defects, obtained while addressing a real-world industrial problem. We demonstrate state-of-the-art results on all four datasets. The proposed method outperforms all related approaches in fully supervised settings and also outperforms weakly-supervised methods when only image-level labels are available. We also show that mixed supervision, with only a handful of fully annotated samples added to weakly labelled training images, can result in performance comparable to the fully supervised model&amp;rsquo;s performance but at a significantly lower annotation cost.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Modeling binding and cross-modal learning in Markov logic networks</title>
      <link>/publications/vrecko2012modeling/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/vrecko2012modeling/</guid>
      <description>&lt;p&gt;Binding - the ability to combine two or more modal representations of the same entity into a single shared representation - is vital for every cognitive system operating in a complex environment. In order to successfully adapt to changes in a dynamic environment the binding mechanism has to be supplemented with cross-modal learning. In this paper we define the problems of high-level binding and cross-modal learning. By these definitions we model a binding mechanism in a Markov logic network and define its role in a cognitive architecture. We evaluate a prototype binding system off-line, using three different inference methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>MODS--A USV-Oriented Object Detection and Obstacle Segmentation Benchmark</title>
      <link>/publications/bovcon2021mods/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2021mods/</guid>
      <description>&lt;p&gt;Small-sized unmanned surface vehicles (USV) are coastal water devices with a broad range of applications such as environmental control and surveillance. A crucial capability for autonomous operation is obstacle detection for timely reaction and collision avoidance, which has been recently explored in the context of camera-based visual scene interpretation. Owing to curated datasets, substantial advances in scene interpretation have been made in the related field of unmanned ground vehicles. However, the current maritime datasets do not adequately capture the complexity of real-world USV scenes, and the evaluation protocols are not standardised, which makes cross-paper comparison of different methods difficult and hinders progress. To address these issues, we introduce a new obstacle detection benchmark MODS, which considers two major perception tasks: maritime object detection and the more general maritime obstacle segmentation. We present a new diverse maritime evaluation dataset containing approximately 81k stereo images synchronized with an on-board IMU, with over 60k objects annotated. We propose a new obstacle segmentation performance evaluation protocol that reflects the detection accuracy in a way meaningful for practical USV navigation. Nineteen recent state-of-the-art object detection and obstacle segmentation methods are evaluated using the proposed protocol, creating a benchmark to facilitate development of the field. The proposed dataset, as well as evaluation routines, are made publicly available at vicos.si/resources.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multi-Year Time Series Transfer Learning: Application of Early Crop Classification</title>
      <link>/publications/racic2024multi-year/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/racic2024multi-year/</guid>
      <description>&lt;p&gt;Crop classification is an important task in remote sensing with many applications, such as estimating yields, detecting crop diseases and pests, and ensuring food security. In this study, we combined knowledge from remote sensing, machine learning, and agriculture to investigate the application of transfer learning with a transformer model for variable-length satellite image time series (SITS). The objective was to produce a map of agricultural land, reduce required interventions, and limit in-field visits. Specifically, we aimed to provide reliable agricultural land class predictions in a timely manner and quantify the necessary amount of reference parcels to achieve these outcomes. Our dataset consisted of Sentinel-2 satellite imagery and reference crop labels for Slovenia spanning the years 2019, 2020, and 2021. We evaluated adaptability through fine-tuning in a real-world scenario of early crop classification with limited up-to-date reference data. The base model trained on a different year achieved an average F1 score of 82.5% for the target year without any reference data from the target year. To increase accuracy with a new model trained from scratch, an average of 48,000 samples is required in the target year. Using transfer learning, the pre-trained models can be efficiently adapted to an unknown year, requiring less than 0.3% (1500) of the samples from the dataset. Building on this, we show that transfer learning can outperform the baseline in the context of early classification with only 9% of the data after 210 days in the year.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multivariate Online Kernel Density Estimation with Gaussian Kernels</title>
      <link>/publications/kristan2011multivariate/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2011multivariate/</guid>
      <description>&lt;p&gt;We propose a novel approach to online estimation of probability density functions, which is based on kernel density estimation (KDE). The method maintains and updates a non-parametric model of the observed data, from which the KDE can be calculated. We propose an online bandwidth estimation approach and a compression/revitalization scheme that keeps the complexity of the KDE low. We compare the proposed online KDE to state-of-the-art approaches on examples of estimating stationary and non-stationary distributions, and on examples of classification. The results show that the online KDE outperforms or achieves comparable performance to the state of the art and produces models with a significantly lower complexity while allowing online adaptation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Obstacle Tracking for Unmanned Surface Vessels using 3D Point Cloud</title>
      <link>/publications/muhovic2019obstacle/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/muhovic2019obstacle/</guid>
      <description>&lt;p&gt;We present a method for detecting and tracking waterborne obstacles from an unmanned surface vehicle (USV) for the purpose of short-term obstacle avoidance. A stereo camera system provides a point cloud of the scene in front of the vehicle. The water surface is estimated by fitting a plane to the point cloud, and outlying points are further processed to find potential obstacles. We propose a new plane fitting algorithm for water surface detection that applies a fast approximate semantic segmentation to filter the point cloud and utilizes an external IMU reading to constrain the plane orientation. A novel histogram-like depth appearance model is proposed to keep track of the identity of the detected obstacles through time and to filter out false detections, which negatively impact the vehicle&amp;rsquo;s automatic guidance system. The improved plane fitting algorithm and the temporal verification using depth fingerprints result in a notable improvement on the challenging MODD2 dataset, significantly reducing the amount of false positive detections. The proposed method is able to run in real time on board a small-sized USV, which was used to acquire the MODD2 dataset as well.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Online Discriminative Kernel Density Estimator With Gaussian Kernels</title>
      <link>/publications/kristan2013online/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2013online/</guid>
      <description>&lt;p&gt;We propose a new method for supervised online estimation of probabilistic discriminative models for classification tasks. The method estimates the class distributions from a stream of data in the form of Gaussian mixture models (GMM). The reconstructive updates of the distributions are based on the recently proposed online Kernel Density Estimator (oKDE). We keep the number of components in the model low by compressing the GMMs from time to time. We propose a new cost function that measures the loss of interclass discrimination during compression, thus guiding the compression towards simpler models that still retain discriminative properties. The resulting classifier thus independently updates the GMM of each class, but these GMMs interact during their compression through the proposed cost function. We call the proposed method the online discriminative Kernel Density Estimator (odKDE). We compare the odKDE to the oKDE, batch state-of-the-art KDEs and batch/incremental support vector machines (SVM) on publicly available datasets. The odKDE achieves classification performance comparable to that of the best batch KDEs and SVM, while allowing online adaptation from large datasets, and produces models of lower complexity than the oKDE.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Online Kernel Density Estimation For Interactive Learning</title>
      <link>/publications/kristan2009online/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2009online/</guid>
      <description>&lt;p&gt;In this paper we propose a Gaussian-kernel-based online kernel density estimator that can be used for applications of online probability density estimation and online learning. Our approach generates a Gaussian mixture model of the observed data and allows online adaptation from positive examples as well as from negative examples. The adaptation from negative examples is realized by a novel concept of unlearning in mixture models. Low complexity of the mixtures is maintained through a novel compression algorithm. In contrast to the existing approaches, our approach does not require fine-tuning parameters for a specific application; we do not assume specific forms of the target distributions, and no temporal constraints are assumed on the observed data. The strength of the proposed approach is demonstrated with examples of online estimation of complex distributions, an example of unlearning, and with interactive learning of basic visual concepts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Performance Evaluation Methodology for Long-Term Single Object Tracking</title>
      <link>/publications/lukezic2020performance/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/lukezic2020performance/</guid>
      <description>&lt;p&gt;A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and a comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies, as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Prepletanje umetne inteligence in fizike pri napovedovanju obalnih poplav</title>
      <link>/publications/licer2021prepletanje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/licer2021prepletanje/</guid>
      <description>&lt;p&gt;Climate change is, through numerous mechanisms, causing a rise in the mean level of the global oceans, and this also holds for the Slovenian sea. Model projections of global sea-level rise predict that by 2050 the mean sea level in the Gulf of Trieste will most likely rise by 30 to 50 centimetres, and by 40 to 100 cm by the end of the century. This means that by mid-century the frequency of coastal floods is expected to increase 10- to 20-fold, and by the end of the century floods could be up to two hundred times more frequent. Forecasting these floods is extremely demanding due to the specifics of the Adriatic basin, as it involves simulating the evolution of both an atmospheric model and a sea model. In this article we explain a forecasting approach based on a deep neural network that matches or exceeds the forecast accuracy of the physical model.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Reconstruction by inpainting for visual anomaly detection</title>
      <link>/publications/zavrtanik2021reconstruction/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/zavrtanik2021reconstruction/</guid>
      <description>&lt;p&gt;Visual anomaly detection addresses the problem of classification or localization of regions in an image that deviate from their normal appearance. A popular approach trains an auto-encoder on anomaly-free images and performs anomaly detection by calculating the difference between the input and the reconstructed image. This approach assumes that the auto-encoder will be unable to accurately reconstruct anomalous regions. But in practice neural networks generalize well even to anomalies and reconstruct them sufficiently well, thus reducing the detection capabilities. Accurate reconstruction is far less likely if the anomaly pixels were not visible to the auto-encoder. We thus cast anomaly detection as a self-supervised reconstruction-by-inpainting problem. Our approach (RIAD) randomly removes partial image regions and reconstructs the image from partial inpaintings, thus addressing the drawbacks of auto-encoding methods. RIAD is extensively evaluated on several benchmarks and sets a new state of the art on a recent highly challenging anomaly detection benchmark.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust and efficient vision system for group of cooperating mobile robots with application to soccer robots</title>
      <link>/publications/klancar2004robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/klancar2004robust/</guid>
      <description>&lt;p&gt;In this paper a global vision scheme for estimating the positions and orientations of mobile robots is presented. It is applied to robot soccer, a fast dynamic game that requires an efficient and robust vision system. The vision system is also generally applicable to other robot applications, such as mobile transport robots in production and warehouses, attendant robots, fast visual tracking of targets of interest, and entertainment robotics. The basic operation of the vision system is divided into two steps. In the first, the incoming image is scanned and pixels are classified into a finite number of classes. At the same time, a segmentation algorithm is used to find corresponding regions belonging to one of the classes. In the second step, all the regions are examined, and those that are part of the observed object are selected by means of simple logic procedures. The novelty is focused on optimization of the processing time needed to estimate the possible object positions. Better results of the vision system are achieved by implementing camera calibration and a shading-correction algorithm. The former corrects camera lens distortion, while the latter increases robustness to irregular illumination conditions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust Localization using an Omnidirectional Appearance-based Subspace Model of Environment</title>
      <link>/publications/jogan2003robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/jogan2003robust/</guid>
      <description>&lt;p&gt;Appearance-based visual learning and recognition techniques that are based on models derived from a training set of 2D images are being widely used in computer vision applications. In robotics, they have received most attention in visual servoing and navigation. In this paper we discuss a framework for visual self-localization of mobile robots using a parametric model built from panoramic snapshots of the environment. In particular, we propose solutions to the problems related to robustness against occlusions and invariance to the rotation of the sensor. Our principal contribution is an ``eigenspace of spinning-images&amp;rsquo;&amp;rsquo;, i.e., a model of the environment which successfully exploits some of the specific properties of panoramic images in order to efficiently calculate the optimal subspace in terms of principal components analysis (PCA) of a set of training snapshots without actually decomposing the covariance matrix. By integrating a robust recover-and-select algorithm for the computation of image parameters we achieve reliable localization even in the case when the input images are partly occluded or noisy. In this way, the robot is capable of localizing itself in realistic environments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust Visual Tracking using an Adaptive Coupled-layer Visual Model</title>
      <link>/publications/cehovin2013robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2013robust/</guid>
      <description>&lt;p&gt;This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target&amp;rsquo;s global and local appearance by interlacing two layers. The local layer in this model is a set of local patches that geometrically constrain the changes in the target&amp;rsquo;s appearance. This layer probabilistically adapts to the target&amp;rsquo;s geometric deformation, while its structure is updated by removing and adding local patches. The addition of these patches is constrained by the global layer, which probabilistically models the target&amp;rsquo;s global visual properties such as color, shape and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. Through this coupled-constraint paradigm between the adaptation of the global and local layers, we achieve more robust tracking through significant appearance changes. We experimentally compare our tracker to eleven state-of-the-art trackers. The experimental results on challenging sequences confirm that our tracker outperforms the related trackers in many cases, with a smaller failure rate as well as better accuracy. Furthermore, the parameter analysis shows that our tracker is stable over a range of parameter values.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robustno vizualno učenje na podlagi podprostorov</title>
      <link>/publications/skocaj2004robustno/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2004robustno/</guid>
      <description>&lt;p&gt;Visual learning, i.e., learning from visual data, must be a robust and continuous process. Not all available visual data are equally important; in the case of occlusions and other undesired disturbances in the field of view, some may even be misleading. The human visual system treats visual data selectively and builds efficient representations of observed objects and scenes even under poor conditions. It can then update these representations with newly acquired information and thus adapt them to changes. In this article we present several methods that introduce similar principles into machine visual learning and recognition. Visual learning is realized through appearance-based modeling of objects and scenes. Model construction is based on principal component analysis (PCA), which in its standard form has shortcomings that prevent the application of the aforementioned principles. To overcome these shortcomings, we have proposed several extensions of the standard PCA method, i.e., methods for incremental, weighted, and robust learning. We also evaluated the proposed methods on various image domains. The results demonstrate the applicability of the methods for visual learning and recognition in a variety of cases.&lt;/p&gt;</description>
    </item>
    <item>
      <title>ROC analysis of classifiers in machine learning : a survey</title>
      <link>/publications/majnik2013roc/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/majnik2013roc/</guid>
      <description>&lt;p&gt;The use of ROC (Receiver Operating Characteristics) analysis as a tool for evaluating the performance of classification models in machine learning has been increasing in the last decade. Among the most notable advances in this area are the extension of two-class ROC analysis to the multi-class case as well as the employment of ROC analysis in cost-sensitive learning. Methods now exist which take instance-varying costs into account. The purpose of our paper is to present a survey of this field with the aim of gathering important achievements in one place. In the paper, we present application areas of ROC analysis in machine learning, describe its problems and challenges, and provide a summarized list of alternative approaches to ROC analysis. In addition to the presented theory, we also provide a couple of examples intended to illustrate the described approaches.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Room Categorization Based on a Hierarchical Representation of Space</title>
      <link>/publications/ursic2013room/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ursic2013room/</guid>
      <description>&lt;p&gt;For successful operation in real-world environments, a mobile robot requires an effective spatial model. The model should be compact, should possess large expressive power and should scale well with respect to the number of modelled categories. In this paper we propose a new compositional hierarchical representation of space that is based on learning statistically significant observations, in terms of the frequency of occurrence of various shapes in the environment. We have focused on a two-dimensional space, since many robots perceive their surroundings in two dimensions with the use of a laser range finder or sonar. We also propose a new low-level image descriptor, by which we demonstrate the performance of our representation in the context of a room categorization problem. Using only the lower layers of the hierarchy, we obtain state-of-the-art categorization results in two different experimental scenarios. We also present a large, freely available, dataset, which is intended for room categorization experiments based on data obtained with a laser range finder.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Segmentation-Based Deep-Learning Approach for Surface-Defect Detection</title>
      <link>/publications/tabernik2020segmentation-based/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2020segmentation-based/</guid>
      <description>&lt;p&gt;Automated surface-anomaly detection using machine learning has become an interesting and promising area of research, with a very high and direct impact on the application domain of visual inspection. Deep-learning methods have become the most suitable approaches for this task. They allow the inspection system to learn to detect the surface anomaly by simply showing it a number of exemplar images. This paper presents a segmentation-based deep-learning architecture that is designed for the detection and segmentation of surface anomalies and is demonstrated on a specific domain of surface-crack detection. The design of the architecture enables the model to be trained using a small number of samples, which is an important requirement for practical applications. The proposed model is compared with the related deep-learning methods, including the state-of-the-art commercial software, showing that the proposed approach outperforms the related methods on the specific domain of surface-crack detection. The large number of experiments also sheds light on the required precision of the annotation, the number of required training samples and the required computational cost. Experiments are performed on a newly created dataset based on a real-world quality control case and demonstrate that the proposed approach is able to learn on a small number of defective surfaces, using only approximately 25-30 defective training samples, instead of hundreds or thousands, which is usually the case in deep-learning applications. This makes the deep-learning method practical for use in industry, where the number of available defective samples is limited. The dataset is also made publicly available to encourage the development and evaluation of new methods for surface-defect detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Selecting features for object detection using an AdaBoost-compatible evaluation function</title>
      <link>/publications/furst2008selecting/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/furst2008selecting/</guid>
      <description>&lt;p&gt;This paper addresses the problem of selecting features in a visual object detection setup where a detection algorithm is applied to an input image represented by a set of features. The set of features to be employed in the test stage is prepared in two training-stage steps. In the first step, a feature extraction algorithm produces a (possibly large) initial set of features. In the second step, on which this paper focuses, the initial set is reduced using a selection procedure. The proposed selection procedure is based on a novel evaluation function that measures the utility of individual features for a certain detection task. Owing to its design, the evaluation function can be seamlessly embedded into an AdaBoost selection framework. The developed selection procedure is integrated with state-of-the-art feature extraction and object detection methods. The presented system was tested on five challenging detection setups. In three of them, a fairly high detection accuracy was achieved with as few as six features selected out of several hundred initial candidates.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Self-understanding and self-extension: a systems and representational approach</title>
      <link>/publications/wyatt2010self-understanding/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/wyatt2010self-understanding/</guid>
      <description>&lt;p&gt;There are many different approaches to building a system that can engage in autonomous mental development. In this paper we present an approach based on what we term &lt;em&gt;self-understanding&lt;/em&gt;, by which we mean the use of explicit representation of and reasoning about what a system does and doesn&amp;rsquo;t know, and how that understanding changes under action. We present a coherent architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, what we term &lt;em&gt;self-extension&lt;/em&gt;. The contributions include: representations of gaps and uncertainty for specific kinds of knowledge, and a motivational and planning system for setting and achieving learning goals.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sledenje več igralcev v športnih igrah na podlagi vizualne informacije</title>
      <link>/publications/kristan2007sledenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2007sledenje/</guid>
      <description>&lt;p&gt;This article presents a tracker for following multiple players in indoor sports such as handball and basketball, based on visual information acquired with a camera mounted above the court. Tracking of an individual player is cast in the context of Bayesian filtering for recursive estimation of the posterior distribution of the target state and is based on particle-filter methods. The article covers the two main parts of the tracker: the tracker for following an individual player and the mechanism for tracking multiple visually similar players. Within the latter mechanism we propose an original solution in which, at each time step, the image is partitioned into non-overlapping regions such that each contains only one player, thereby simplifying the multi-target tracking problem when visually similar targets collide. We compared the proposed tracker with a non-robust tracker that lacked a mechanism for handling situations in which targets collide. We found that the proposed mechanism reduces the number of required operator interventions and thus enables robust and fast processing of large amounts of video data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks</title>
      <link>/publications/tabernik2020spatially-adaptive/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/tabernik2020spatially-adaptive/</guid>
      <description>&lt;p&gt;Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be manually set to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact networks with a large number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAU). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus reducing the need for hand-crafted modifications. DAUs provide a seamless substitution for convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs efficiently allocate parameters, resulting in networks up to 4× more compact in terms of the number of parameters at similar or better performance.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation</title>
      <link>/publications/bovcon2018stereo/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2018stereo/</guid>
      <description>&lt;p&gt;A new obstacle detection algorithm for unmanned surface vehicles (USVs) is presented. A state-of-the-art graphical model for semantic segmentation is extended to incorporate boat pitch and roll measurements from the on-board inertial measurement unit (IMU), and a stereo verification algorithm that consolidates tentative detections obtained from the segmentation is proposed. The IMU readings are used to estimate the location of the horizon line in the image, which automatically adjusts the priors in the probabilistic semantic segmentation model. We derive the equations for projecting the horizon into images, propose an efficient optimization algorithm for the extended graphical model, and offer a practical IMU–camera–USV calibration procedure. Using a USV equipped with multiple synchronized sensors, we captured a new challenging multi-modal dataset, and annotated its images with the water edge and obstacles. Experimental results show that the proposed algorithm significantly outperforms the state of the art, with nearly 30% improvement in water-edge detection accuracy, an over 21% reduction of the false positive rate, an almost 60% reduction of the false negative rate, and an over 65% increase of the true positive rate, while its Matlab implementation runs in real time.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards automated scyphistoma census in underwater imagery: a useful research and monitoring tool</title>
      <link>/publications/vodopivec2018towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/vodopivec2018towards/</guid>
      <description>&lt;p&gt;Manual annotation and counting of entities in underwater photographs is common in many branches of marine biology. With a marked increase of jellyfish populations worldwide, understanding the dynamics of the polyp (scyphistoma) stage of their life-cycle is becoming increasingly important. In-situ studies of polyp population dynamics are scarce due to the small size of the polyps and the tedious manual work required to annotate and count large numbers of items in underwater photographs. We devised an experiment which shows a large variance between human annotators, as well as in annotations made by the same annotator. We have tackled this problem, which is present in many areas of marine biology, by developing a method for automated detection and counting. Our polyp counter (PoCo) uses a two-stage approach with a fast detector (Aggregated Channel Features) and a precise classifier consisting of a pre-trained Convolutional Neural Network and a Support Vector Machine. PoCo was tested on a year-long image dataset and performed with accuracy comparable to human annotators but with a 70-fold reduction in time. The algorithm can be used in many marine biology applications, vastly reducing the amount of manual labor and enabling the processing of much larger datasets. The source code is freely available on GitHub.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards commoditized smart-camera design</title>
      <link>/publications/murovec2013towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/murovec2013towards/</guid>
      <description>&lt;p&gt;We propose a set of design principles for a cost-effective embedded smart camera. Our aim is to alleviate the shortcomings of the existing designs, such as excessive reliance on battery power and wireless networking, over-emphasized focus on specific use cases, and use of specialized technologies. In our opinion, these shortcomings prevent widespread commercialization and adoption of embedded smart cameras, especially in the context of visual-sensor networks. The proposed principles lead to a distinctively different design, which relies on commoditized, standardized and widely-available components, tools and knowledge. As an example of using these principles in practice, we present a smart camera, which is inexpensive, easy to build and support, capable of high-speed communication and enables rapid transfer of computer-vision algorithms to the embedded world.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards the deep learning recognition of cultivated terraces based on Lidar data: The case of Slovenia</title>
      <link>/publications/ciglic2024towards/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/ciglic2024towards/</guid>
      <description></description>
    </item>
    <item>
      <title>Tracking by Identification Using Computer Vision and Radio</title>
      <link>/publications/mandeljc2013tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/mandeljc2013tracking/</guid>
      <description>&lt;p&gt;We present a novel system for detection, localization and tracking of multiple people, which fuses a multi-view computer vision approach with a radio-based localization system. The proposed fusion combines the best of both worlds: excellent computer-vision-based localization and strong identity information provided by the radio system. It is therefore able to perform tracking by identification, which makes it impervious to propagated identity switches. We present a comprehensive methodology for the evaluation of systems that perform person localization in a world coordinate system and use it to evaluate the proposed system as well as its components. Experimental results on a challenging indoor dataset, which involves multiple people walking around a realistically cluttered room, confirm that the proposed fusion of both systems significantly outperforms its individual components. Compared to the radio-based system, it achieves better localization results, while at the same time it successfully prevents the propagation of identity switches that occur in pure computer-vision-based tracking.&lt;/p&gt;</description>
    </item>
    <item>
      <title>TraX: The visual Tracking eXchange Protocol and Library</title>
      <link>/publications/cehovin2017trax/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2017trax/</guid>
      <description>&lt;p&gt;In this paper we address the problem of developing on-line visual tracking algorithms. We present a specialized communication protocol that serves as a bridge between a tracker implementation and the application that uses it. It decouples the development of algorithms from the development of applications, encouraging re-usability. The primary use case is algorithm evaluation, where the protocol facilitates more complex evaluation scenarios than those in use today, thus pushing the field of visual tracking forward. We present a reference implementation of the protocol that makes it easy to use in several popular programming languages, and discuss where the protocol is already used as well as some usage scenarios that we envision for the future.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Unsupervised Learning of a Hierarchy of Topological Maps using Omnidirectional Images</title>
      <link>/publications/stimec2008unsupervised/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/stimec2008unsupervised/</guid>
      <description>&lt;p&gt;This paper presents a novel appearance-based method for path-based map learning by a mobile robot equipped with an omnidirectional camera. In particular, we focus on the unsupervised construction of topological maps, which provide an abstraction of the environment in terms of visual aspects. An unsupervised clustering algorithm is used to represent the images in multiple subspaces, thus forming a sensory-grounded representation of the environment&amp;rsquo;s appearance. By introducing transitional fields between clusters we are able to obtain a partitioning of the image set into distinctive visual aspects. By abstracting the low-level sensory data we are able to efficiently reconstruct the overall topological layout of the covered path. After the high-level topology is estimated, we repeat the procedure on the level of visual aspects to obtain local topological maps. We demonstrate how the resulting representation can be used for modeling indoor and outdoor environments, how it successfully detects previously visited locations, and how it can be used for the estimation of the current visual aspect and the retrieval of the relative position within the current visual aspect.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Visual object tracking performance measures revisited</title>
      <link>/publications/cehovin2016visual/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/cehovin2016visual/</guid>
      <description>&lt;p&gt;The field of visual tracking evaluation sports a large variety of performance measures and suffers from a lack of consensus about which measures should be used in experiments. This makes cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased towards particular tracking aspects. In this paper we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis we narrow down the set of potential measures to only two complementary ones, describing accuracy and robustness, thus pushing towards a homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized, and have been employed by the recent Visual Object Tracking (VOT) challenges as the foundation of their evaluation methodology.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Visual re-identification across large, distributed camera networks</title>
      <link>/publications/kenk2015visual/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kenk2015visual/</guid>
      <description>&lt;p&gt;We propose a holistic approach to the problem of re-identification in an environment of distributed smart cameras. We model the re-identification process in a distributed camera network as a distributed multi-class classifier, composed of spatially distributed binary classifiers. We treat the problem of re-identification as an open-world problem, and address novelty detection and forgetting. As there are many tradeoffs in design and operation of such a system, we propose a set of evaluation measures to be used in addition to the recognition performance. The proposed concept is illustrated and evaluated on a new many-camera surveillance dataset and SAIVT-SoftBio dataset.&lt;/p&gt;</description>
    </item>
    <item>
      <title>WaSR -- A Water Segmentation and Refinement Maritime Obstacle Detection Network</title>
      <link>/publications/bovcon2021wasr/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/bovcon2021wasr/</guid>
      <description>&lt;p&gt;Obstacle detection using semantic segmentation has become an established approach in autonomous vehicles. However, existing segmentation methods, primarily developed for ground vehicles, are inadequate in an aquatic environment as they produce many false positive (FP) detections in the presence of water reflections and wakes. We propose a novel deep encoder-decoder architecture, a water segmentation and refinement (WaSR) network, specifically designed for the marine environment to address these issues. A deep encoder based on ResNet101 with atrous convolutions enables the extraction of rich visual features, while a novel decoder gradually fuses them with inertial information from the inertial measurement unit (IMU). The inertial information greatly improves the segmentation accuracy of the water component in the presence of visual ambiguities, such as fog on the horizon. Furthermore, a novel loss function for semantic separation is proposed to enforce the separation of different semantic components to increase the robustness of the segmentation. We investigate different loss variants and observe a significant reduction in false positives and an increase in true positives (TP). Experimental results show that WaSR outperforms the current state-of-the-art by approximately 4% in F1-score on a challenging USV dataset. WaSR shows remarkable generalization capabilities and outperforms the state of the art by over 24% in F1 score on a strict domain generalization experiment.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Weighted and robust learning of subspace representations</title>
      <link>/publications/skocaj2007weighted/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2007weighted/</guid>
      <description>&lt;p&gt;A reliable system for visual learning and recognition should enable a selective treatment of individual parts of input data and should successfully deal with noise and occlusions. These requirements are not satisfactorily met when visual learning is approached by appearance-based modeling of objects and scenes using the traditional PCA approach. In this paper we extend the standard PCA approach to overcome these shortcomings. We first present a weighted version of PCA which, unlike the standard approach, considers individual pixels and images selectively, depending on the corresponding weights. Then we propose a robust PCA method for obtaining a consistent subspace representation in the presence of outlying pixels in the training images. The method is based on the EM algorithm for the estimation of principal subspaces in the presence of missing data. We demonstrate the efficiency of the proposed methods in a number of experiments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Wide-angle camera distortions and non-uniform illumination in mobile robot tracking</title>
      <link>/publications/klancar2004wide-angle/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/klancar2004wide-angle/</guid>
      <description>&lt;p&gt;In this paper some fundamentals and solutions to accompanying problems in vision system design for mobile robot tracking are presented. The main topics are correction of camera lens distortion and compensation of non-uniform illumination. Both correction methods contribute to vision system performance if implemented in the appropriate manner. Their applicability is demonstrated by applying them to vision for robot soccer. The lens correction method successfully corrects the distortion caused by the camera lens, thus achieving a more accurate and precise estimation of object position. The illumination compensation improves robustness to irregular and non-uniform illumination that is nearly always present in real conditions.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
